Partial Greedy Algorithm to Extract a Minimum Phonetically-and-Prosodically Rich Sentence Set

Abstract

A phonetically and prosodically rich sentence set is important when collecting a read-speech corpus for developing phoneme-based speech recognition. Such a set is usually extracted from a large text corpus of millions of sentences using optimization methods. One commonly used method for this task is the Least-to-Most Greedy (LTMG) algorithm. It is effective in minimizing the number of phoneme units but does not balance their frequencies. This paper proposes a new method, the Partial LTMG (PLTMG) algorithm, to search for an optimum set containing triphones and prosodies that are distributed in a near-uniform fashion. Tests on an Indonesian text corpus of ten million sentences, crawled from newspaper and novel websites, show that the proposed method not only minimizes both phoneme units and prosodies but also distributes their frequencies effectively.
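The abstract does not spell out the LTMG or PLTMG steps, so the following is only a minimal sketch of one common reading of least-to-most greedy selection, not the authors' implementation. It assumes each sentence has already been converted into a set of units (for example triphone strings); the function name `least_to_most_greedy` and the toy data are hypothetical. Units are visited from rarest to most frequent, and for each uncovered unit the sentence that adds the most still-uncovered units is chosen. The frequency balancing that PLTMG adds is not modeled here.

```python
from collections import Counter


def least_to_most_greedy(sentences):
    """Pick sentence indices that together cover every unit.

    sentences: list of sets of units (hypothetical input format,
    e.g. sets of triphone strings). Returns the selected indices
    in the order they were chosen.
    """
    # Count in how many sentences each unit occurs.
    freq = Counter(u for units in sentences for u in units)
    # Visit rarest units first; ties are broken by name for determinism.
    order = sorted(freq, key=lambda u: (freq[u], u))

    covered = set()
    selected = []
    for unit in order:
        if unit in covered:
            continue
        # Among sentences containing this unit, take the one that adds
        # the most uncovered units (lowest index wins on ties).
        candidates = [i for i, units in enumerate(sentences) if unit in units]
        best = max(candidates, key=lambda i: (len(sentences[i] - covered), -i))
        selected.append(best)
        covered |= sentences[best]
    return selected


# Toy corpus: each set stands in for the units found in one sentence.
toy_corpus = [
    {"a", "b"},
    {"b", "c"},
    {"a", "b", "c", "d"},
    {"e"},
]
chosen = least_to_most_greedy(toy_corpus)
```

In this toy corpus the rare units `d` and `e` each occur in a single sentence, so those two sentences are picked first and already cover every other unit.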

Authors and Affiliations

Fahmi Alfiansyah, Suyanto Suyanto

  • EP ID EP429238
  • DOI 10.14569/IJACSA.2018.091274

How To Cite

Fahmi Alfiansyah, Suyanto Suyanto (2018). Partial Greedy Algorithm to Extract a Minimum Phonetically-and-Prosodically Rich Sentence Set. International Journal of Advanced Computer Science & Applications, 9(12), 530-534. https://europub.co.uk/articles/-A-429238