Partial Greedy Algorithm to Extract a Minimum Phonetically-and-Prosodically Rich Sentence Set

Abstract

A phonetically and prosodically rich sentence set is important when collecting a read-speech corpus for developing phoneme-based speech recognition. Such a set is usually extracted from a large text corpus of millions of sentences using an optimization method. One commonly used method for this task is the Least-to-Most Greedy (LTMG) algorithm. It is effective in minimizing the number of phoneme units. Unfortunately, it does not balance their frequencies. In this paper, a new method called the Partial LTMG (PLTMG) algorithm is proposed to search for an optimum set containing triphones and prosodies that are distributed in a near-uniform fashion. Testing on an Indonesian text corpus of ten million sentences crawled from newspaper and novel websites shows that the proposed method is capable not only of minimizing both phoneme units and prosodies but also of distributing their frequencies effectively.
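
To make the baseline concrete, the following is a minimal sketch of one common formulation of least-to-most greedy selection, not the authors' implementation. It assumes that each sentence is already transcribed into a list of phonemes, that the units to cover are triphones only (prosodic units are left out), and that "least-to-most" means processing units from rarest to most common. The function names `triphones`, `ltmg_select`, and `frequency_spread` are hypothetical. The abstract does not specify how PLTMG differs, so that variant is not sketched here.

```python
from collections import Counter


def triphones(phonemes):
    """Return the triphones (phoneme 3-grams) of one phoneme sequence."""
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]


def ltmg_select(sentences):
    """Greedy cover that handles the rarest units first (an assumed reading
    of 'least-to-most'). `sentences` is a list of phoneme lists; the result
    is the list of selected sentence indices, in selection order."""
    units = [set(triphones(s)) for s in sentences]
    # Number of sentences in which each triphone occurs.
    sentence_freq = Counter(u for sentence_units in units for u in sentence_units)
    # Rarest units first; ties broken by the unit itself for determinism.
    order = sorted(sentence_freq, key=lambda u: (sentence_freq[u], u))

    covered, chosen = set(), []
    for unit in order:
        if unit in covered:
            continue
        # Among sentences containing this unit, take the one that adds the
        # most uncovered units (lowest index wins a tie).
        candidates = [i for i, su in enumerate(units) if unit in su]
        best = max(candidates, key=lambda i: (len(units[i] - covered), -i))
        chosen.append(best)
        covered |= units[best]
    return chosen


def frequency_spread(sentences, chosen):
    """Ratio of the most to the least frequent triphone in the selected set;
    1.0 means perfectly uniform. This measure is an illustrative assumption."""
    counts = Counter(t for i in chosen for t in triphones(sentences[i]))
    return max(counts.values()) / min(counts.values())
```

A selection of this kind guarantees coverage but does nothing to even out how often each unit appears, which is the limitation the abstract attributes to LTMG. The linear scan for candidates is kept for readability; on a corpus of millions of sentences an inverted index from unit to sentence indices would be needed.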

Authors and Affiliations

Fahmi Alfiansyah, Suyanto Suyanto

  • DOI 10.14569/IJACSA.2018.091274

How To Cite

Fahmi Alfiansyah, Suyanto Suyanto (2018). Partial Greedy Algorithm to Extract a Minimum Phonetically-and-Prosodically Rich Sentence Set. International Journal of Advanced Computer Science & Applications, 9(12), 530-534. https://europub.co.uk/articles/-A-429238