Assessing the efficacy of benchmarks for automatic speech accent recognition
Journal Title: EAI Endorsed Transactions on Creative Technologies - Year 2015, Vol 2, Issue 4
Abstract
Speech accents can carry valuable information about the speaker and can be used in intelligent multimedia-based human-computer interfaces. The performance of algorithms for automatic accent classification is often evaluated using audio datasets that contain recordings of different people representing different accents. Here we describe a method that can detect bias in accent datasets, and apply it to two accent identification datasets. The method reveals dataset bias, meaning that the datasets can be classified with accuracy higher than random even if the tested algorithm has no ability to analyze speech accent. We separated one second of silence from the beginning of each audio sample, such that the one-second sample contained no voice and therefore no information about the accent. An audio classification method was then applied to the datasets of silent audio samples, and provided classification accuracy significantly higher than random. These results indicate that the performance of accent classification algorithms measured using some accent classification benchmarks can be biased, and can be driven by differences in the background noise rather than by the auditory features of the accents.
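The bias test described in the abstract can be illustrated with a minimal sketch. The abstract does not name the audio classification method used, so everything below is an assumption made for illustration: the band-energy features, the nearest-centroid classifier, the leave-one-out evaluation, the synthetic noise recordings, and all function names are hypothetical and are not the authors' implementation. The sketch only shows the logic of the test: classify the leading silent second of each recording and compare the accuracy to chance.

```python
import numpy as np

def leading_segment(signal, sample_rate, seconds=1.0):
    """Return the first `seconds` of a recording (assumed to contain no voice)."""
    return signal[: int(sample_rate * seconds)]

def noise_features(segment, n_bands=8):
    """Log mean power in n_bands equal-width frequency bands (illustrative features)."""
    power = np.abs(np.fft.rfft(segment)) ** 2
    bands = np.array_split(power, n_bands)
    return np.log(np.array([band.mean() for band in bands]) + 1e-12)

def leave_one_out_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    features = np.asarray(features)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    correct = 0
    for i in range(len(labels)):
        keep = np.ones(len(labels), dtype=bool)
        keep[i] = False  # hold out sample i
        centroids = np.array(
            [features[keep & (labels == c)].mean(axis=0) for c in classes]
        )
        distances = np.linalg.norm(centroids - features[i], axis=1)
        correct += int(classes[np.argmin(distances)] == labels[i])
    return correct / len(labels)

# Synthetic stand-in for a biased dataset: two accent classes whose recordings
# differ only in background noise level. No speech is present at all.
rng = np.random.default_rng(0)
sample_rate = 8000
noise_level = {0: 0.01, 1: 0.03}  # hypothetical per-class recording conditions

features, labels = [], []
for accent, level in noise_level.items():
    for _ in range(20):
        recording = rng.normal(0.0, level, size=2 * sample_rate)
        silent = leading_segment(recording, sample_rate)
        features.append(noise_features(silent))
        labels.append(accent)

accuracy = leave_one_out_accuracy(features, labels)
chance = 1.0 / len(noise_level)
print(f"accuracy on silence: {accuracy:.2f}, chance: {chance:.2f}")
```

If the accuracy on silent segments is clearly above chance, the classes can be separated without any accent information, which is the kind of dataset bias the abstract describes. On a real dataset, the synthetic recordings would be replaced by the leading second of each audio file.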
Authors and Affiliations
Benjamin Bock, Lior Shamir
Learning by playing: An LBG for the Fortification Gates of the Venetian walls of the city of Heraklion
Games in education have always been a tool for increasing motivation and interest of learners. We present Location-Based Games (LBG) as a tool to involve and motivate students in the learning process. LBGs require the pl...
Using Video Analysis and Machine Learning for Predicting Shot Success in Table Tennis
Coaching professional ball players has become increasingly difficult and requires, among other abilities, good tactical knowledge. This paper describes a program that can assist in tactical coaching for table tennis by...
PaisleyTrees: A Size-Invariant Tree Visualization
Squeezing large tree structures into suitable visualizations has been a perennial problem. In response to this challenge, we present PaisleyTrees, a size-invariant tree visualization. PaisleyTrees integrate node-of-inter...
QoE-Aware Device-to-Device Multimedia Communications
Multimedia services over mobile device-to-device (D2D) networks have recently received considerable attention. In this scenario, each device is equipped with a cellular communication interface, as well as a D2D interface...
Improvement of natural image search engines results by emotional filtering
With the Internet 2.0 era, managing user emotions is a problem that more and more actors are interested in. Historically, the first notions of emotion sharing were expressed and defined with emoticons. They allowed users...