Towards End-to-End Speech Recognition System for Pashto Language Using Transformer Model
Journal Title: International Journal of Innovations in Science and Technology - Year 2024, Vol 6, Issue 1
Abstract
The conventional use of Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) for speech recognition posed setup challenges and inefficiency. This paper adopts the Transformer model for Pashto continuous speech recognition, offering an End-to-End (E2E) system that maps acoustic signals directly to the label sequence, which simplifies implementation. The Transformer is chosen for its state-of-the-art capabilities, including parallelization and self-attention, and for its proficiency in handling the constraints of limited training data, which is the situation for Pashto. The objective is to develop an accurate Pashto speech recognition system. Using 200 hours of conversational data, the study achieves a Word Error Rate (WER) of up to 51% and a Character Error Rate (CER) of up to 29%. Fine-tuning the model's parameters and increasing the dataset size led to significant improvements. The results demonstrate the Transformer's effectiveness in limited-data scenarios and affirm the model's ability to recognize Pashto speech. In conclusion, the study establishes the Transformer as a robust choice for Pashto speech recognition and emphasizes its adaptability to limited-data conditions. It fills a gap in automatic speech recognition (ASR) research for the Pashto language, contributing to the advancement of speech recognition technology for under-resourced languages. The findings underscore the importance of fine-tuning and dataset augmentation in enhancing model performance and reducing error rates, and they point to further improvement with additional training data.
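The abstract reports WER and CER without defining them. Under the standard definition (assumed here, since the source does not spell it out), both are the Levenshtein edit distance between the reference transcript and the recognizer's output, counted in words for WER or characters for CER, divided by the reference length: (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions and N is the number of reference units. Because insertions count, the rate can exceed 100%. The following is a minimal Python sketch under that assumption. The function names `edit_distance`, `wer`, and `cer` are hypothetical, not taken from the paper, and the paper's text normalization (for example, whether spaces count as characters in CER) is not stated, so this sketch leaves the strings untouched.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length.

    Assumption: no normalization; spaces are counted as characters.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words
    print(wer("the cat sat on the mat", "the cat sit on mat"))
    # Classic example: 3 character edits / 6 reference characters
    print(cer("kitten", "sitting"))
```

Working on word lists versus raw strings is the only difference between the two metrics. This is why CER is usually lower than WER, as in the reported 29% versus 51%: a single wrong character makes a whole word count as an error.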
Authors and Affiliations
Munaza Sher, Nasir Ahmad, Madiha Sher