Assessing the efficacy of benchmarks for automatic speech accent recognition

Journal Title: EAI Endorsed Transactions on Creative Technologies - Year 2015, Vol 2, Issue 4

Abstract

Speech accents can carry valuable information about the speaker and can be used in intelligent multimedia-based human-computer interfaces. The performance of algorithms for automatic accent classification is often evaluated using audio datasets that contain recordings of different people representing different accents. Here we describe a method that can detect bias in accent datasets, and apply it to two accent identification datasets to reveal dataset bias: the datasets can be classified with accuracy higher than random even if the tested algorithm has no ability to analyze speech accent. We separated one second of silence from the beginning of each audio sample, so that the one-second segment contained no voice and therefore no information about the accent. An audio classification method was then applied to these silent samples and achieved classification accuracy significantly higher than random. These results indicate that the performance of accent classification algorithms measured on some accent classification benchmarks can be biased, driven by differences in background noise rather than by the auditory features of the accents.
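The bias test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the features or the classifier, so the RMS level, zero-crossing rate, and nearest-centroid classifier below are assumptions chosen for brevity. The recordings are synthetic noise, and the sampling rate, accent labels, and noise levels are hypothetical stand-ins for real dataset properties.

```python
import math
import random

SAMPLE_RATE = 1000  # hypothetical toy sampling rate, kept low so the sketch runs quickly


def leading_second(samples, sample_rate=SAMPLE_RATE):
    """Return the first second of a recording, assumed to precede any speech."""
    return samples[:sample_rate]


def features(segment):
    """Describe background noise by RMS level in dB and zero-crossing rate (assumed features)."""
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    return (20 * math.log10(rms), crossings / (len(segment) - 1))


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per label on (features, label) pairs and return test accuracy."""
    sums = {}
    for feat, label in train:
        total, count = sums.get(label, ([0.0] * len(feat), 0))
        sums[label] = ([t + f for t, f in zip(total, feat)], count + 1)
    centroids = {label: [t / n for t in total] for label, (total, n) in sums.items()}
    correct = 0
    for feat, label in test:
        predicted = min(
            centroids,
            key=lambda c: sum((f - m) ** 2 for f, m in zip(feat, centroids[c])),
        )
        correct += predicted == label
    return correct / len(test)


def synthetic_recording(noise_level, rng, seconds=1.5):
    """Toy stand-in for a recording: Gaussian background noise only, no voice."""
    return [rng.gauss(0.0, noise_level) for _ in range(int(SAMPLE_RATE * seconds))]


rng = random.Random(0)
# Hypothetical scenario: each accent class was recorded under a different noise level.
noise_by_accent = {"accent_A": 0.01, "accent_B": 0.02}
data = [
    (features(leading_second(synthetic_recording(level, rng))), accent)
    for accent, level in noise_by_accent.items()
    for _ in range(40)
]
rng.shuffle(data)
train, test = data[:60], data[60:]

accuracy = nearest_centroid_accuracy(train, test)
chance = 1 / len(noise_by_accent)
print(f"accuracy on silent segments: {accuracy:.2f} (chance: {chance:.2f})")
```

If the accuracy on the silent segments is clearly above chance, the class labels are predictable from recording conditions alone, which is the kind of dataset bias the abstract describes. With a real benchmark, the synthetic recordings would be replaced by the actual audio samples, and the silent segment would need to be checked to confirm it contains no speech.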

Authors and Affiliations

Benjamin Bock, Lior Shamir

  • EP ID EP45835
  • DOI http://dx.doi.org/10.4108/icst.mobimedia.2015.259033

How To Cite

Benjamin Bock, Lior Shamir (2015). Assessing the efficacy of benchmarks for automatic speech accent recognition. EAI Endorsed Transactions on Creative Technologies, 2(4), -. https://europub.co.uk/articles/-A-45835