Feature-based Similarity Method for Aligning the Malay and English News Document
Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2013, Vol 11, Issue 4
Abstract
Corpus-based translation can provide reliable translation knowledge in addition to dictionaries or machine translation. However, such corpora are scarce, especially for low-resource languages. Much work has been reported on aligning multilingual documents, especially among European languages, but less of it has focused on languages with few linguistic resources. One of the challenges is aligning the available multilingual documents to create comparable corpora for such languages. This article describes an alignment method that uses statistical features of the documents, such as their titles, the text of their contents, and the named entities present in each document. The method focuses on English and Malay news documents, where Malay is considered a low-resource language. Source and target documents are compared in pairs. Accuracy, precision, and recall were used to evaluate the results, with three relevance scales (Same story, Shared aspect, and Unrelated) used to assess the alignment pairs. The results indicate that the method performed well in aligning the news documents, with an accuracy of 96% and an average precision of 81%.
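The pairwise comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure, the feature weights, or the decision threshold, so everything below is an assumption made for illustration: cosine similarity over term counts, the weights `(0.3, 0.4, 0.3)`, the threshold `0.2`, and the hypothetical helpers `tokens`, `cosine`, `doc_similarity`, and `align`. The sketch also assumes the Malay documents have already been mapped into English terms (for example with a dictionary), since the abstract does not say how the language gap is bridged.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two lists of terms, using raw term counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def doc_similarity(src, tgt, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of title, content, and named-entity similarity.

    The weights are illustrative assumptions, not values from the article.
    Each document is a dict with "title", "text", and "entities" keys.
    """
    w_title, w_text, w_ne = weights
    title_sim = cosine(tokens(src["title"]), tokens(tgt["title"]))
    text_sim = cosine(tokens(src["text"]), tokens(tgt["text"]))
    # Named entities are compared as whole, case-folded strings.
    ne_sim = cosine([e.lower() for e in src["entities"]],
                    [e.lower() for e in tgt["entities"]])
    return w_title * title_sim + w_text * text_sim + w_ne * ne_sim


def align(sources, targets, threshold=0.2):
    """Pair each source document with its best-scoring target document.

    Returns (source_index, target_index, score) tuples; pairs whose best
    score falls below the assumed threshold are dropped.
    """
    pairs = []
    if not targets:
        return pairs
    for i, src in enumerate(sources):
        best_score, best_j = max(
            (doc_similarity(src, tgt), j) for j, tgt in enumerate(targets)
        )
        if best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs
```

Calling `align(sources, targets)` on two lists of such dictionaries yields the candidate alignment pairs, which could then be judged against relevance labels such as the three scales named in the abstract.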
Authors and Affiliations
Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma
REGION BASED CLUSTERING FOR DATA COLLECTION IN WSN
The lower cost and easier installation of WSNs compared with their wired counterparts push industry and academia to pay more attention to this promising technology. Large scale networks of small energy-constrained sensor nodes...
Information Security Awareness Behavior : A Conceptual Model for Cloud
Cloud computing has changed the whole picture that distributed computing, such as Grid computing and client-server computing, used to present. Although Cloud offers great benefits, it also introduces a myriad of security thr...
Developing a Genetic Fuzzy System Model for Cost-Benefit Analysis.
Cost-benefit analysis is a systematic approach for calculating and analyzing the cost of a project. Soft computing approaches are also applicable to cost-benefit analysis. In this paper Mamdani fuzzy system has...
Skin Color detection Using Stepwise Neural Network and Color Mapping Co-occurrence Matrix
Skin color has been proven to be a useful and robust cue for face detection, human tracking, image content filtering, pornographic filtering, etc. Most skin classification research is focused on using pixel-based...
A Comparative Analysis of Feed-Forward and Generalized Regression Neural Networks for Face Recognition Using Principal Component Analysis
In this paper we give a comparative analysis of performance of feed forward neural network and generalized regression neural network based face recognition. We use different inner epoch for different input pattern accor...