Data Categorization and Model Weighting Approach for Language Model Adaptation in Statistical Machine Translation
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 1
Abstract
A language model encapsulates semantic, syntactic and pragmatic information about a specific task. Intelligent systems, especially natural language processing systems, can show different results in terms of performance and precision when moving among genres and domains. Researchers have therefore explored different language model adaptation strategies to overcome this effectiveness issue. Language model adaptation techniques fall into two main categories. The first category includes techniques based on data selection, where a task-oriented corpus is extracted and used to train models for specific translations. The second category focuses on developing a weighting criterion that assigns the test data to a specific model corpus. The purpose of this research is to introduce a language model adaptation approach that combines both categories (data selection and weighting criterion). The approach applies data selection for task-specific translation by dividing the corpus into smaller, topic-related corpora using a clustering process. We investigate the effect of different approaches to clustering the bilingual data on the language model adaptation process, measured by translation quality, using the Europarl corpus (WMT07), which includes bilingual data for English-Spanish, English-German and English-French. A mixture of language models should then use a specific weighting criterion to assign any given data to the right language model for the translation process. The proposed language model adaptation achieves better translation quality compared to the baseline model in Statistical Machine Translation (SMT).
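The abstract does not spell out the weighting criterion, so the following is only an illustrative sketch of the general idea of a weighted mixture of cluster-specific language models. It assumes add-one smoothed unigram models and weights taken as the normalized likelihood of the input under each model; the toy clusters, function names and smoothing choice are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a shared vocabulary (assumed model type)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def log_likelihood(model, sentence):
    """Log-probability of the in-vocabulary words of a sentence; unknown words are skipped."""
    return sum(math.log(model[w]) for w in sentence.split() if w in model)

def mixture_weights(models, sentence):
    """Hypothetical weighting criterion: normalized likelihood of the input under each model."""
    scores = [log_likelihood(m, sentence) for m in models]
    top = max(scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - top) for s in scores]
    z = sum(exp_scores)
    return [e / z for e in exp_scores]

def mixture_prob(models, weights, word):
    """Linear interpolation of the cluster models for a single word."""
    return sum(w * m.get(word, 0.0) for w, m in zip(weights, models))

# Toy "clusters" standing in for topic-related sub-corpora (illustrative data only).
clusters = {
    "economy": ["the budget deficit grew", "the bank raised interest rates"],
    "fishing": ["the fishing fleet left port", "fish stocks are falling"],
}
vocab = {w for sents in clusters.values() for s in sents for w in s.split()}
models = [train_unigram(sents, vocab) for sents in clusters.values()]

weights = mixture_weights(models, "interest rates and the budget")
```

Here the input sentence shares more vocabulary with the "economy" cluster, so that model receives the larger weight. A real system would use higher-order n-gram models and could estimate the weights differently, for example by minimizing perplexity on held-out data.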
Authors and Affiliations
Mohammed AbuHamad, Masnizah Mohd