Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language

Journal Title: Informatics - Year 2019, Vol 6, Issue 2

Abstract

Semantic similarity is a long-standing problem in natural language processing (NLP). It is of great interest because understanding it offers insight into how human beings comprehend meaning and make associations between words. From the viewpoint of machine understanding, however, and particularly for under-resourced languages, it poses a different problem altogether. This paper explores semantic similarity in Bangla, a less-resourced language. To improve the situation for such languages, the most rudimentary method (path-based) and the latest state-of-the-art method (Word2Vec) for semantic similarity calculation were augmented with cross-lingual resources in English. Two approaches were explored in Bangla, namely the path-based model and the distributional model, and their cross-lingual counterparts were built using the English WordNet and English corpora. The proposed methods were evaluated on a dataset comprising 162 Bangla word pairs, which were annotated by five expert raters. The correlation scores between the four metrics and the human evaluation scores demonstrate the marked improvement that the cross-lingual approach brings to semantic similarity calculation for Bangla.
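The cross-lingual path-based idea summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the tiny hypernym graph stands in for the English WordNet, the transliterated lexicon stands in for a Bangla-to-English translation resource, and the score 1 / (1 + shortest path length) is one common path-based formulation that may differ from the authors' exact measure. The `pearson` helper shows how metric scores could be compared against human ratings.

```python
from collections import deque
from math import sqrt

# Hypothetical toy hypernym hierarchy standing in for the English WordNet.
EDGES = {
    "entity": ["animal", "artifact"],
    "animal": ["dog", "cat"],
    "artifact": ["car"],
}

# Hypothetical Bangla-to-English lexicon (transliterated keys, illustration only).
LEXICON = {"kukur": "dog", "biral": "cat", "gari": "car"}


def build_graph(edges):
    """Turn parent -> children lists into an undirected adjacency map."""
    graph = {}
    for parent, children in edges.items():
        for child in children:
            graph.setdefault(parent, set()).add(child)
            graph.setdefault(child, set()).add(parent)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Assumed path-based score: 1 / (1 + shortest path length)."""
    dist = shortest_path_length(graph, a, b)
    return None if dist is None else 1.0 / (1 + dist)


def cross_lingual_similarity(graph, word1, word2, lexicon):
    """Translate both words to English, then score them in the English graph."""
    e1, e2 = lexicon.get(word1), lexicon.get(word2)
    if e1 is None or e2 is None:
        return None
    return path_similarity(graph, e1, e2)


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


GRAPH = build_graph(EDGES)
```

In this sketch, `cross_lingual_similarity(GRAPH, "kukur", "biral", LEXICON)` scores the pair through their English translations, and `pearson` would be applied to a list of such scores and the corresponding averaged human ratings.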

Authors and Affiliations

Rajat Pandit, Saptarshi Sengupta, Sudip Kumar Naskar, Niladri Sekhar Dash and Mohini Mohan Sardar

Download PDF file
  • EP ID EP44180
  • DOI https://doi.org/10.3390/informatics6020019

How To Cite

Rajat Pandit, Saptarshi Sengupta, Sudip Kumar Naskar, Niladri Sekhar Dash and Mohini Mohan Sardar (2019). Improving Semantic Similarity with Cross-Lingual Resources: A Study in Bangla—A Low Resourced Language. Informatics, 6(2), -. https://europub.co.uk/articles/-A-44180