Partial Greedy Algorithm to Extract a Minimum Phonetically-and-Prosodically Rich Sentence Set

Abstract

A phonetically-and-prosodically rich sentence set is important when collecting a read-speech corpus for developing phoneme-based speech recognition. Such a sentence set is usually searched for in a huge text corpus of millions of sentences using optimization methods. One commonly used optimization method for this task is the Least-to-Most Greedy (LTMG) algorithm. It is effective in minimizing the number of phoneme-units. Unfortunately, it does not distribute their frequencies evenly. In this paper, a new method called the Partial LTMG (PLTMG) algorithm is proposed to search for an optimum set containing triphones and prosodies that are distributed in a near-uniform fashion. Testing on an Indonesian text corpus of ten million sentences crawled from newspaper and novel websites shows that the proposed method is not only capable of minimizing both phoneme-units and prosodies but also effective in distributing their frequencies.
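
The abstract does not spell out the steps of LTMG or PLTMG, so the following is only a minimal sketch of a generic rarest-first greedy selection, not the authors' method. It assumes each sentence has already been converted to a set of units (for example triphones); the function names `least_to_most_greedy` and `unit_frequencies`, the toy corpus, and the tie-breaking rules are all hypothetical choices made for illustration.

```python
from collections import Counter


def least_to_most_greedy(sentences):
    """Pick sentence ids until every unit is covered (illustrative sketch).

    `sentences` maps a sentence id to the set of units it contains.
    This is an assumed, generic rarest-first greedy, not the paper's algorithm.
    """
    # Number of sentences in which each unit occurs.
    freq = Counter(u for units in sentences.values() for u in units)
    uncovered = set(freq)
    selected = []
    while uncovered:
        # Start from the rarest uncovered unit (ties broken alphabetically).
        rarest = min(uncovered, key=lambda u: (freq[u], u))
        # Candidate sentences are those that contain the rarest unit.
        candidates = sorted(s for s, units in sentences.items() if rarest in units)
        # Take the candidate covering the most still-uncovered units;
        # max() keeps the first maximum, so ties go to the smallest id.
        best = max(candidates, key=lambda s: len(sentences[s] & uncovered))
        selected.append(best)
        uncovered -= sentences[best]
    return selected


def unit_frequencies(selected, sentences):
    """Count how often each unit appears in the selected sentences."""
    return Counter(u for s in selected for u in sentences[s])


if __name__ == "__main__":
    # Toy corpus with made-up triphone labels.
    corpus = {
        "s1": {"a-b-c", "b-c-d"},
        "s2": {"b-c-d", "c-d-e"},
        "s3": {"a-b-c", "b-c-d", "c-d-e"},
        "s4": {"x-y-z"},
    }
    chosen = least_to_most_greedy(corpus)
    print(chosen)
    print(unit_frequencies(chosen, corpus))
```

Inspecting the counter returned by `unit_frequencies` is one simple way to see how evenly units are spread across a selected set, which is the property the abstract says plain LTMG does not control.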

Authors and Affiliations

Fahmi Alfiansyah, Suyanto Suyanto

  • EP ID EP429238
  • DOI 10.14569/IJACSA.2018.091274

How To Cite

Fahmi Alfiansyah, Suyanto Suyanto (2018). Partial Greedy Algorithm to Extract a Minimum Phonetically-and-Prosodically Rich Sentence Set. International Journal of Advanced Computer Science & Applications, 9(12), 530-534. https://europub.co.uk/articles/-A-429238