A Novel Approach for English to South Dravidian Language Statistical Machine Translation System
Journal Title: International Journal on Computer Science and Engineering - Year 2010, Vol 2, Issue 8
Abstract
Developing a full-fledged bilingual machine translation (MT) system for any two natural languages with limited electronic resources and tools is a challenging and demanding task. This paper presents the development of a statistical machine translation (SMT) system for English to South Dravidian languages such as Malayalam and Kannada that incorporates syntactic and morphological information. SMT is a data-oriented statistical framework for translating text from one natural language to another based on knowledge extracted from a bilingual corpus. Although there have been efforts towards building such an English to South Dravidian translation system, no efficient translation system exists so far. The first and most important step in SMT is creating a well-aligned parallel corpus for training the system. Experimental research shows that the existing methodology for bilingual parallel corpus creation is not efficient for English to South Dravidian languages in an SMT system. To improve the performance of the translation system, we introduce a new approach to creating the parallel corpus. The main ideas, which we have implemented and proven very effective for English to South Dravidian SMT, are: (i) reordering the English source sentence according to Dravidian syntax, (ii) applying root-suffix separation to both English and Dravidian words, and (iii) using morphological information, which substantially reduces the corpus size required for training the system. Because full-fledged parsing and morphological tools are unavailable for Malayalam and Kannada, sentence synthesis was done both manually and with the existing morph analyzer created by Amrita University. Our experiments show that the systems perform significantly well and achieve very competitive accuracy for small bilingual corpora.
The proposed ideas can be used directly for other South Dravidian languages such as Tamil and Telugu with some minor changes.
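As a rough illustration of ideas (i) and (ii) above, the following is a minimal Python sketch of source-side preprocessing, not the authors' implementation. It assumes the English sentence has already been chunked into subject, verb, and object (the paper's actual parsing step is not described here), reorders the chunks into the verb-final order typical of Dravidian languages, and splits a few common English suffixes from their roots. The function names, the `+suffix` marking, and the tiny suffix list are all hypothetical stand-ins for a real parser and morphological analyzer.

```python
# Toy English suffix list; a hypothetical stand-in for a real morphological analyzer.
SUFFIXES = ("ing", "ed", "s")


def split_root_suffix(word):
    """Split a word into [root, '+suffix'] if a known suffix matches, else [word]."""
    for suffix in SUFFIXES:
        # Require a root of at least three characters to avoid splitting words like "is".
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[:-len(suffix)], "+" + suffix]
    return [word]


def reorder_svo_to_sov(subject, verb, obj):
    """Reorder subject-verb-object chunks into subject-object-verb order."""
    return subject + obj + verb


def preprocess(subject, verb, obj):
    """Reorder pre-chunked English words, then apply root-suffix separation."""
    tokens = []
    for word in reorder_svo_to_sov(subject, verb, obj):
        tokens.extend(split_root_suffix(word.lower()))
    return tokens


if __name__ == "__main__":
    # "Ravi reads books" becomes a verb-final, suffix-separated token sequence.
    print(preprocess(["Ravi"], ["reads"], ["books"]))
```

Separating suffixes in this way lets the aligner see a shared root such as `read` across inflected forms, which is one plausible reason a smaller training corpus could suffice.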
Authors and Affiliations
Unnikrishnan P, Antony P J, Dr. Soman K P
Controller Design Based on ISE Minimization and Dominant Pole Retention Method
A computer based method to reduce the complexity of the higher order controller, based on the minimization of integral square error (ISE) and Dominant Pole Retention method pertaining to unit step input is presented in...
Data Mining Application to Attract Students in HEI
In the last two decades, the number of Higher Education Institutions (HEI) has grown by leaps and bounds. This causes cut-throat competition among these institutions while attracting students to get admission in these institut...
Mobile Wireless Enhanced Routing Protocol in Adhoc Networks
An ad hoc network consists of peer-to-peer communicating nodes that are highly mobile. As such, an ad hoc network lacks infrastructure and the topology of the network changes dynamically. The task of routing data from a source...
Analyzing security of Authenticated Routing Protocol (ARAN)
Ad hoc networks allow nodes to communicate beyond their direct wireless transmission range by introducing cooperation in mobile computers (nodes). Many proposed routing protocols for ad hoc networks operate in an ad hoc fashio...
OUTDOOR PROPAGATION MODELS A LITERATURE REVIEW
The major focus of this review is based on earlier & present day developments encompassing the field of radio transmission & propagation. It covers a wide area of radio communication in a more subtle & elasti...