A Novel Approach for English to South Dravidian Language Statistical Machine Translation System

Journal Title: International Journal on Computer Science and Engineering - Year 2010, Vol 2, Issue 8

Abstract

Development of a full-fledged bilingual machine translation (MT) system for any two natural languages with limited electronic resources and tools is a challenging and demanding task. This paper presents the development of a statistical machine translation (SMT) system for English to South Dravidian languages such as Malayalam and Kannada that incorporates syntactic and morphological information. SMT is a data-oriented statistical framework for translating text from one natural language to another based on knowledge extracted from a bilingual corpus. Although there have been efforts toward building such an English to South Dravidian translation system, no efficient system is available so far. The first and most important step in SMT is creating a well-aligned parallel corpus for training the system. Experimental research shows that the existing methodology for bilingual parallel corpus creation is not efficient for English to South Dravidian SMT. To increase the performance of the translation system, we have introduced a new approach to creating the parallel corpus. The main ideas, which we have implemented and which proved very effective for English to South Dravidian SMT, are: (i) reordering the English source sentence according to Dravidian syntax, (ii) applying root-suffix separation to both English and Dravidian words, and (iii) using morphological information, which substantially reduces the corpus size required for training the system. Because full-fledged parsing and morphological tools are unavailable for Malayalam and Kannada, sentence synthesis was done both manually and with the existing morphological analyzer created by Amrita University. The experiments show that our system performs significantly well and achieves very competitive accuracy on small bilingual corpora.
The proposed ideas can be directly applied to other South Dravidian languages such as Tamil and Telugu with some minor changes.
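The first two preprocessing ideas, source reordering and root-suffix separation, can be pictured with a minimal Python sketch. It assumes the English sentence has already been split into chunks with hypothetical role labels (S subject, V verb group, O object, PP prepositional phrase), which a real system would obtain from a parser. The abstract does not give the actual reordering rules or suffix lists. The S, PP, O, V target order, the preposition-to-postposition move and the three-entry suffix list below are therefore illustrative assumptions, not the authors' method. Only the English side is shown, although the paper applies root-suffix separation to the Dravidian side as well.

```python
from typing import List, Tuple

# A chunk is (role, words). Roles are hypothetical labels for this sketch:
# "S" subject, "V" verb group, "O" object, "PP" prepositional phrase.
Chunk = Tuple[str, List[str]]

# Assumed target order for a simple Dravidian (SOV) clause; not from the paper.
ROLE_ORDER = {"S": 0, "PP": 1, "O": 2, "V": 3}

# Illustrative English suffixes only, checked in this order.
SUFFIXES = ("ing", "ed", "s")


def reorder_to_sov(chunks: List[Chunk]) -> List[Chunk]:
    """Reorder SVO chunks into S-PP-O-V order and make prepositions postpositions."""
    reordered: List[Chunk] = []
    # sorted() is stable, so chunks sharing a role keep their original order.
    for role, words in sorted(chunks, key=lambda c: ROLE_ORDER[c[0]]):
        if role == "PP":
            # Move the preposition to the end: "in the library" -> "the library in"
            words = words[1:] + words[:1]
        reordered.append((role, words))
    return reordered


def split_suffix(word: str) -> List[str]:
    """Naively split one known suffix off a word, keeping a root of 3+ letters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def preprocess(chunks: List[Chunk]) -> str:
    """Reorder the sentence, then apply root-suffix separation to every word."""
    tokens: List[str] = []
    for _, words in reorder_to_sov(chunks):
        for word in words:
            tokens.extend(split_suffix(word))
    return " ".join(tokens)


if __name__ == "__main__":
    sentence = [
        ("S", ["Rama"]),
        ("V", ["reads"]),
        ("O", ["books"]),
        ("PP", ["in", "the", "library"]),
    ]
    print(preprocess(sentence))  # Rama the library in book +s read +s
```

For "Rama reads books in the library", `preprocess` yields `Rama the library in book +s read +s`. This mirrors the subject, locative, object, verb order of a simple Malayalam or Kannada clause. It also exposes the plural and tense suffixes as separate tokens that a word aligner could pair with Dravidian suffixes. The naive stripper would mis-split words such as "this", so a practical system would rely on a morphological analyzer instead.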

Authors and Affiliations

Unnikrishnan P , Antony P J , Dr. Soman K P

How To Cite

Unnikrishnan P, Antony P J, Dr. Soman K P (2010). A Novel Approach for English to South Dravidian Language Statistical Machine Translation System. International Journal on Computer Science and Engineering, 2(8), 2749-2759. https://europub.co.uk/articles/-A-108073