Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language


Journal Title: Informatics - Year 2019, Vol 6, Issue 2

Abstract

Semantic similarity is a long-standing problem in natural language processing (NLP). It is a topic of great interest because understanding it can offer insight into how human beings comprehend meaning and make associations between words. However, when the problem is viewed from the standpoint of machine understanding, particularly for under-resourced languages, it poses a different challenge altogether. In this paper, semantic similarity is explored in Bangla, a less-resourced language. To improve the situation in such languages, the most rudimentary method (path-based) and a state-of-the-art method (Word2Vec) for semantic similarity calculation were augmented using cross-lingual resources in English, and the results improved markedly. Two semantic similarity approaches were explored in Bangla, namely the path-based and distributional models, and their cross-lingual counterparts were synthesized in light of the English WordNet and corpora. The proposed methods were evaluated on a dataset comprising 162 Bangla word pairs, which were annotated by five expert raters. The correlation scores obtained between the four metrics and the human evaluation scores demonstrate the marked enhancement that the cross-lingual approach brings to semantic similarity calculation for Bangla.
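As a rough illustration of the cross-lingual path-based idea described in the abstract, the following minimal Python sketch translates word pairs into English, scores them on an English hypernym graph, and correlates the scores with human ratings. It is not the authors' implementation. The toy graph, the romanized Bangla entries, the translation table, and the human ratings are all made-up placeholders. The score 1 / (1 + shortest path length), taking the maximum over translation candidates, and the use of Pearson correlation are assumptions, since the abstract does not specify them.

```python
from collections import deque
from itertools import product
from math import sqrt

# Hypothetical toy hypernym graph (child -> parent); NOT real WordNet data.
TOY_HYPERNYMS = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "animal",
    "cat": "feline",
    "feline": "animal",
}

# Hypothetical romanized Bangla -> English translation candidates.
TOY_TRANSLATIONS = {
    "kukur": ["dog"],
    "biral": ["cat"],
    "nekre": ["wolf"],
}

# Build an undirected adjacency map from the child -> parent links.
ADJ = {}
for child, parent in TOY_HYPERNYMS.items():
    ADJ.setdefault(child, set()).add(parent)
    ADJ.setdefault(parent, set()).add(child)


def shortest_path_length(a, b):
    """Number of edges on the shortest path between a and b, or None."""
    if a not in ADJ or b not in ADJ:
        return None
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in ADJ[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """Path-based score 1 / (1 + d); 0.0 when no path exists (assumption)."""
    d = shortest_path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def cross_lingual_path_similarity(w1, w2):
    """Score a Bangla pair via its English translations (max is an assumption)."""
    c1 = TOY_TRANSLATIONS.get(w1, [])
    c2 = TOY_TRANSLATIONS.get(w2, [])
    return max((path_similarity(a, b) for a, b in product(c1, c2)), default=0.0)


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    pairs = [("kukur", "nekre"), ("kukur", "biral"), ("biral", "nekre")]
    human = [3.5, 2.0, 1.5]  # made-up ratings, for illustration only
    system = [cross_lingual_path_similarity(a, b) for a, b in pairs]
    print(system, pearson(system, human))
```

In practice the toy graph would be replaced by the English WordNet and the translation table by a bilingual dictionary or translation system. The same correlation step would apply to a distributional metric, for example the cosine similarity between word vectors.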

Authors and Affiliations

Rajat Pandit, Saptarshi Sengupta, Sudip Kumar Naskar, Niladri Sekhar Dash and Mohini Mohan Sardar

Keywords

semantic similarity; Word2Vec; translation; low-resource languages; WordNet

EP ID EP44180
DOI https://doi.org/10.3390/informatics6020019

How To Cite

Rajat Pandit, Saptarshi Sengupta, Sudip Kumar Naskar, Niladri Sekhar Dash and Mohini Mohan Sardar (2019). Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language. Informatics, 6(2), -. https://europub.co.uk/articles/-A-44180