Feature-based Similarity Method for Aligning the Malay and English News Document

Journal Title: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY - Year 2013, Vol 11, Issue 4

Abstract

Corpus-based translation can supply reliable translation knowledge that complements dictionaries and machine translation. However, such corpora are scarce, especially for low-resource languages. Much work has been reported on aligning multilingual documents, particularly among European languages, but less of it focuses on languages with few linguistic resources. One challenge is aligning the available multilingual documents to create comparable corpora for these languages. This article describes an alignment method that uses statistical features of the documents: their titles, the text of their contents, and the named entities present in each document. The method focuses on English and Malay news documents, where Malay is considered a low-resource language. Source and target documents were compared in pairs. Accuracy, precision, and recall were used to evaluate the results, with three relevance scales (Same story, Shared aspect, and Unrelated) used to assess the aligned pairs. The results indicate that the method performed well in aligning the news documents, with an accuracy of 96% and an average precision of 81%.
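
The abstract does not give the scoring formula, so the following is only a minimal sketch of how title, content, and named-entity features might be combined to pair documents. The function names, the weights, the threshold, and the use of cosine and Jaccard similarity are all assumptions for illustration, not the authors' method. It also assumes both documents have already been tokenized and mapped into a common vocabulary (for example, through a bilingual dictionary).

```python
from collections import Counter
from math import sqrt


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bags of tokens (term-frequency vectors)."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a_items, b_items):
    """Jaccard overlap between two sets, used here for named entities."""
    a, b = set(a_items), set(b_items)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_score(src, tgt, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of the three feature similarities.

    The weights are hypothetical; the source does not state them.
    """
    w_title, w_content, w_entities = weights
    return (
        w_title * cosine(src["title"], tgt["title"])
        + w_content * cosine(src["content"], tgt["content"])
        + w_entities * jaccard(src["entities"], tgt["entities"])
    )


def align(sources, targets, threshold=0.5):
    """Pair each source document with its best-scoring target.

    Pairs scoring below the (assumed) threshold are dropped.
    Returns a list of (source_index, target_index, score) tuples.
    """
    pairs = []
    for i, src in enumerate(sources):
        scored = [(j, pair_score(src, tgt)) for j, tgt in enumerate(targets)]
        if not scored:
            continue
        j, score = max(scored, key=lambda item: item[1])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs
```

Each document here is a dictionary with `title` and `content` token lists and an `entities` list. In this sketch, a pair is kept only if its combined score reaches the threshold, so unrelated documents are left unaligned.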

Authors and Affiliations

Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma

  • EP ID EP650311
  • DOI 10.24297/ijct.v11i4.3125

How To Cite

Nurul Amelina Nasharuddin, Muhamad Taufik Abdullah, Azreen Azman, Rabiah Abdul Kadir, Enrique Herrera-Viedma (2013). Feature-based Similarity Method for Aligning the Malay and English News Document. INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY, 11(4), 2410-2421. https://europub.co.uk/articles/-A-650311