Feature-based Similarity Method for Aligning the Malay and English News Document

Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2013, Vol 11, Issue 4

Abstract

A corpus-based translation approach can be used to obtain reliable translation knowledge, complementing dictionaries and machine translation. However, such corpora are scarce, especially for low-resource languages. Much work has been reported on aligning multilingual documents, especially among European languages, but little of it focuses on languages with fewer linguistic resources. One challenge is to align the available multilingual documents in order to create comparable corpora for such languages. This article describes an alignment method that uses statistical features of the documents: their titles, the text of their contents, and the named entities present in each document. The method focuses on English and Malay news documents, where Malay is considered a low-resource language. Source and target documents were compared in pairs. Accuracy, precision, and recall were used to evaluate the results, with three relevance scales (Same story, Shared aspect, and Unrelated) used to assess the alignment pairs. The results indicate that the method performed well in aligning the news documents, with an accuracy of 96% and an average precision of 81%.
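
The abstract does not give the scoring formula, so the following is only a minimal sketch of how a feature-based similarity over titles, content text, and named entities might be combined. The cosine measure, the weights, the dictionary field names, and the assumption that both documents have already been tokenized and mapped into a common vocabulary (for example through a bilingual dictionary) are all illustrative choices, not details taken from the paper.

```python
import math
from collections import Counter


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bags of tokens (0.0 if either is empty)."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def doc_similarity(src, tgt, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of per-feature similarities.

    `src` and `tgt` are dicts with hypothetical keys "title", "body" and
    "entities", each holding a list of tokens in a shared vocabulary.
    The weights are placeholders, not values reported by the authors.
    """
    w_title, w_body, w_ne = weights
    return (
        w_title * cosine(src["title"], tgt["title"])
        + w_body * cosine(src["body"], tgt["body"])
        + w_ne * cosine(src["entities"], tgt["entities"])
    )


def best_alignment(src, candidates):
    """Return (index, score) of the candidate most similar to `src`."""
    scores = [doc_similarity(src, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Under these assumptions, each source document would be paired with its highest-scoring target document, and the resulting pairs could then be judged against the Same story, Shared aspect, and Unrelated scales.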

Authors and Affiliations

Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma

  • EP ID EP650311
  • DOI 10.24297/ijct.v11i4.3125

How To Cite

Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma (2013). Feature-based Similarity Method for Aligning the Malay and English News Document. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 11(4), 2410-2421. https://europub.co.uk/articles/-A-650311