Alex Net-Based Speech Emotion Recognition Using 3D Mel-Spectrograms
Journal Title: International Journal of Innovations in Science and Technology - Year 2024, Vol 6, Issue 2
Abstract
Speech Emotion Recognition (SER) is considered a challenging task in the domain of Human-Computer Interaction (HCI) because of the complex nature of audio signals. To address this challenge, we devised a novel method to fine-tune Convolutional Neural Networks (CNNs) for accurate recognition of speech emotion. This research used the spectrogram representation of audio signals as input to train a modified AlexNet model capable of processing signals of varying lengths. The IEMOCAP dataset was used to identify multiple emotional states, such as happy, sad, angry, and neutral, from speech. Each audio signal was preprocessed to extract a 3D spectrogram whose key features are time, frequency, and color-coded amplitude. The output of the modified AlexNet model is a 256-dimensional vector. The model achieved adequate accuracy, highlighting the effectiveness of CNNs and 3D Mel-Spectrograms for precise and efficient speech emotion recognition and paving the way for further advances in this domain.
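The preprocessing step described in the abstract can be illustrated with a minimal sketch of a log-Mel spectrogram computed with NumPy. This is not the authors' pipeline: the sampling rate (16 kHz), FFT size (512), hop length (160), number of Mel bands (64), and all function names are assumptions chosen for illustration, and the color mapping that produces the third dimension is not covered. The sketch only shows why inputs of different durations yield spectrograms with a fixed frequency axis and a variable time axis.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style Mel scale formula.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the Mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    # All default parameter values are illustrative assumptions.
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T   # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                        # (n_mels, n_frames)

# Two synthetic 1 kHz tones of different durations (1 s and 2 s).
sr = 16000
short = log_mel_spectrogram(np.sin(2 * np.pi * 1000 * np.arange(sr) / sr), sr=sr)
long = log_mel_spectrogram(np.sin(2 * np.pi * 1000 * np.arange(2 * sr) / sr), sr=sr)
print(short.shape, long.shape)
```

Under these assumed settings the frequency axis stays at 64 Mel bands while the number of frames grows with the signal length, which is the property a model that accepts variable-length input has to accommodate.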
Authors and Affiliations
Sara Ali, Bushra Naz, Sanam Narejo, Zohaib Ahmed
Unlocking Potential: Personality-Aware TVET Course Recommendations Revolutionize Skill Development
Personality is a complex amalgamation of ideas, behaviors, and social constructs that shape our self-perception and influence our interactions with others. It tends to remain relatively stable over time. The developmen...
Breaking Down Monoliths: A Graph Based Approach to Microservices Migration
Introduction: The software industry has increasingly transitioned from Monolithic Architecture (MA) to Microservices Architecture (MSA) due to the significant advantages offered by MSA. A crucial first step in this mig...
Analyzing Privacy in Frank Lloyd Wright's Prairie Style Homes Through Syntactic Methods Using “A Graph” and Depth Map X Software
Frank Lloyd Wright's Prairie Style homes, designed across the United States, showcase his unique architectural approach. This study examines how Wright's designs interact with environmental conditions, focusing on priv...
Management of Speech Impairment Disorders in Aphasia Patients using Digital Intervention with Multilingual Regional Dialects
Speech is a zestful and intricate activity that enables people to express ideas, emotions, and thoughts. We are able to render our views because of this neural activity. It is a significant process for learning and perso...
Complex Human Activities Recognition Using Smartphone Sensors: A Deep Learning Approach
Human Activity Recognition (HAR) plays a critical role in understanding human behavior, with mobile phone sensors offering a promising approach for practical applications. This research uniquely addresses the challenge...