Feature-based Similarity Method for Aligning the Malay and English News Document
Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2013, Vol 11, Issue 4
Abstract
A corpus-based translation approach can provide reliable translation knowledge, complementing dictionaries and machine translation. However, such corpora are scarce, especially for low-resource languages. Much work has been reported on aligning multilingual documents, particularly among European languages, but little of it focuses on languages with fewer linguistic resources. One challenge is aligning the available multilingual documents to create a comparable corpus for these languages. This article describes an alignment method that uses statistical features of the documents: their titles, the text of their contents, and the named entities present in each document. The method focuses on English and Malay news documents, with Malay considered a low-resource language. Source and target documents are compared pairwise. Accuracy, precision, and recall were used to evaluate the results, with three relevance scales (Same story, Shared aspect, and Unrelated) used to assess the alignment pairs. The results indicate that the method performed well in aligning the news documents, with an accuracy of 96% and an average precision of 81%.
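The idea of scoring document pairs on title, content, and named-entity features can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the feature weights, or how the language gap is bridged. The sketch assumes Jaccard overlap per feature, a hypothetical weight tuple, and that each source document has already been mapped into the target language's vocabulary (for example by dictionary lookup). The names `pair_score` and `align` are illustrative only.

```python
import re


def tokens(text):
    """Lowercased word tokens as a set (a naive stand-in for real preprocessing)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_score(src, tgt, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of title, content, and named-entity overlap.

    The weights are hypothetical; the abstract does not report any.
    Each document is a dict with 'title', 'content', and 'entities' keys.
    """
    w_title, w_content, w_entities = weights
    title_sim = jaccard(tokens(src["title"]), tokens(tgt["title"]))
    content_sim = jaccard(tokens(src["content"]), tokens(tgt["content"]))
    entity_sim = jaccard(
        {e.lower() for e in src["entities"]},
        {e.lower() for e in tgt["entities"]},
    )
    return w_title * title_sim + w_content * content_sim + w_entities * entity_sim


def align(sources, targets):
    """Pair each source index with the index of its highest-scoring target."""
    pairs = []
    for i, src in enumerate(sources):
        best = max(range(len(targets)), key=lambda j: pair_score(src, targets[j]))
        pairs.append((i, best))
    return pairs
```

Named entities are a useful feature in this setting because names of people, places, and organisations often appear in the same form in both languages, so they can overlap even where ordinary vocabulary does not.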
Authors and Affiliations
Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma