Assessing the efficacy of benchmarks for automatic speech accent recognition
Journal Title: EAI Endorsed Transactions on Creative Technologies - Year 2015, Vol 2, Issue 4
Abstract
Speech accents can carry valuable information about the speaker and can be used in intelligent multimedia-based human-computer interfaces. The performance of algorithms for automatic accent classification is often evaluated on audio datasets of recordings from different speakers representing different accents. Here we describe a method for detecting bias in accent datasets and apply it to two accent identification datasets. The method reveals dataset bias: the datasets can be classified with accuracy higher than random even when the tested algorithm has no ability to analyze speech accent. From the beginning of each audio sample we separated one second of silence, so that the one-second sample contained no voice and therefore no information about the accent. An audio classification method was then applied to these silent samples and achieved classification accuracy significantly higher than random. These results indicate that the performance of accent classification algorithms measured on some accent classification benchmarks can be biased, and can be driven by differences in background noise rather than by the auditory features of the accents.
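The bias check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the features or the classifier, so the log-RMS and zero-crossing features, the nearest-centroid classifier, the 8 kHz sample rate, the synthetic recordings, and all function names below are assumptions chosen only to make the idea concrete. The sketch builds two hypothetical accent classes that differ only in background noise level, keeps the first second of each recording, and compares classification accuracy on that silence with the chance level.

```python
import math
import random

SAMPLE_RATE = 8000  # assumed sample rate in Hz, for illustration only


def leading_second(samples, sample_rate=SAMPLE_RATE):
    """Return the first second of a recording (assumed to contain no speech)."""
    return samples[:sample_rate]


def features(segment):
    """Two simple noise descriptors: log RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    return (math.log(rms + 1e-12), crossings / (len(segment) - 1))


def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per label on train and return the accuracy on test."""
    grouped = {}
    for feat, label in train:
        grouped.setdefault(label, []).append(feat)
    centroids = {
        label: [sum(col) / len(col) for col in zip(*feats)]
        for label, feats in grouped.items()
    }
    correct = 0
    for feat, label in test:
        predicted = min(centroids, key=lambda lab: squared_distance(feat, centroids[lab]))
        correct += predicted == label
    return correct / len(test)


def synthetic_recording(noise_std, rng, sample_rate=SAMPLE_RATE):
    """Hypothetical recording: 1 s of background noise, then 0.5 s of a noisy tone."""
    silence = [rng.gauss(0.0, noise_std) for _ in range(sample_rate)]
    voiced = [
        0.5 * math.sin(2 * math.pi * 200 * t / sample_rate) + rng.gauss(0.0, noise_std)
        for t in range(sample_rate // 2)
    ]
    return silence + voiced


def run_bias_check(seed=0, per_class=20):
    """Classify accent labels from leading silence only; return (accuracy, chance)."""
    rng = random.Random(seed)
    # Assumed recording conditions: the classes differ only in noise level.
    noise_by_label = {"accent_A": 0.01, "accent_B": 0.02}
    train, test = [], []
    for label, noise_std in noise_by_label.items():
        for i in range(per_class):
            recording = synthetic_recording(noise_std, rng)
            item = (features(leading_second(recording)), label)
            # Stratified split: first half of each class trains, second half tests.
            (train if i < per_class // 2 else test).append(item)
    return nearest_centroid_accuracy(train, test), 1 / len(noise_by_label)


if __name__ == "__main__":
    accuracy, chance = run_bias_check()
    print(f"accuracy on silence: {accuracy:.2f}, chance: {chance:.2f}")
```

If the accuracy on the silent segments is clearly above the chance level, the labels are predictable from recording conditions alone, which is the kind of dataset bias the abstract describes. On a real benchmark the synthetic recordings would be replaced by the actual audio files, and the silence would need to be verified to contain no voice.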
Authors and Affiliations
Benjamin Bock, Lior Shamir