Developing an Adaptive Language Model for Bahasa Indonesia

Abstract

A language model is one of the important components of a speech recognition system. It is commonly built with a statistical method called n-gram. However, a standard n-gram model is poorly suited to general domains, where many sentences are semantically ambiguous. This paper focuses on developing an adaptive n-gram language model for Bahasa Indonesia. First, a text corpus of ten million distinct sentences is crawled from hundreds of news, magazine, personal blog, and writing forum websites. The corpus is then used to construct an adaptive language model using Latent Dirichlet Allocation (LDA) trained with Collapsed Gibbs Sampling (CGS). Compared with the standard n-gram, the adaptive language model performs better at selecting words to produce the best sentence.
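
The abstract does not say how the LDA topics are combined with the n-gram, so the following is only a minimal sketch of one common approach, not the authors' implementation. It assumes a bigram model with add-k smoothing, a small collapsed Gibbs sampler for LDA, and linear interpolation between the two. The toy corpus, the function names, and the parameters `alpha`, `beta`, `k`, and `lam` are all illustrative assumptions.

```python
import random
from collections import Counter

# Toy corpus (made up for illustration): two "finance" and two "river" documents.
DOCS = [
    ["bank", "uang", "kredit", "bank"],
    ["sungai", "air", "ikan", "sungai"],
    ["bank", "kredit", "uang"],
    ["air", "sungai", "ikan"],
]
VOCAB = sorted({w for doc in DOCS for w in doc})


def train_bigram(docs):
    """Count bigrams and how often each word occurs as a history."""
    bigrams, histories = Counter(), Counter()
    for doc in docs:
        for prev, word in zip(doc, doc[1:]):
            bigrams[(prev, word)] += 1
            histories[prev] += 1
    return bigrams, histories


def bigram_prob(word, prev, bigrams, histories, vocab_size, k=1.0):
    """Add-k smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + k) / (histories[prev] + k * vocab_size)


def lda_gibbs(docs, vocab, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns (phi, theta)."""
    rng = random.Random(seed)
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                       # total tokens per topic
    # Random initial topic assignment for every token.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            n_dk[d][t] += 1
            n_kw[t][wid[w]] += 1
            n_k[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, v = z[d][i], wid[w]
                # Remove the token's current assignment from the counts.
                n_dk[d][t] -= 1
                n_kw[t][v] -= 1
                n_k[t] -= 1
                # Full conditional p(z = j | all other assignments).
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][v] + beta) / (n_k[j] + V * beta)
                    for j in range(num_topics)
                ]
                t = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][v] += 1
                n_k[t] += 1
    # Topic-word distributions and document-topic mixtures from the final counts.
    phi = [[(n_kw[j][v] + beta) / (n_k[j] + V * beta) for v in range(V)]
           for j in range(num_topics)]
    theta = [[(n_dk[d][j] + alpha) / (len(doc) + num_topics * alpha)
              for j in range(num_topics)] for d, doc in enumerate(docs)]
    return phi, theta


def adaptive_prob(word, prev, theta_d, phi, bigrams, histories, vocab, lam=0.5):
    """Interpolate the bigram with the topic-based unigram (assumed scheme)."""
    v = vocab.index(word)
    topic_p = sum(theta_d[j] * phi[j][v] for j in range(len(phi)))
    ngram_p = bigram_prob(word, prev, bigrams, histories, len(vocab))
    return lam * ngram_p + (1.0 - lam) * topic_p


BIGRAMS, HISTORIES = train_bigram(DOCS)
PHI, THETA = lda_gibbs(DOCS, VOCAB, num_topics=2)
# Adapted probability of "kredit" after "bank", using document 0's topic mixture.
print(adaptive_prob("kredit", "bank", THETA[0], PHI, BIGRAMS, HISTORIES, VOCAB))
```

Because both components are proper distributions over the vocabulary, the interpolated scores for a fixed history also sum to one. In this sketch the topic mixture is taken from a training document; a recognizer would instead have to estimate it from the words decoded so far.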

Authors and Affiliations

Satria Nur Hidayatullah, Suyanto Suyanto

  • EP ID EP448920
  • DOI 10.14569/IJACSA.2019.0100163

How To Cite

Satria Nur Hidayatullah, Suyanto Suyanto (2019). Developing an Adaptive Language Model for Bahasa Indonesia. International Journal of Advanced Computer Science & Applications, 10(1), 488-492. https://europub.co.uk/articles/-A-448920