Alex Net-Based Speech Emotion Recognition Using 3D Mel-Spectrograms

Abstract

Speech Emotion Recognition (SER) is a challenging task in Human-Computer Interaction (HCI) because of the complex nature of audio signals. To address this challenge, we devised a method for fine-tuning Convolutional Neural Networks (CNNs) to recognize emotion in speech. Spectrogram representations of the audio signals serve as input to a modified Alex Net model that can process signals of varying lengths. The IEMOCAP dataset was used to identify several emotional states in speech, including happy, sad, angry, and neutral. Each audio signal was preprocessed to extract a 3D spectrogram whose key features are time, frequency, and amplitude encoded as color. The modified Alex Net model outputs a 256-dimensional vector. The model achieved adequate accuracy, indicating that CNNs combined with 3D Mel-spectrograms are effective for accurate and efficient speech emotion recognition.
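The preprocessing step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no parameters, so the sampling rate, FFT size, hop length, number of mel bands, and the toy color ramp below are all assumptions chosen for illustration. The sketch computes a log-mel spectrogram and maps amplitude to three color channels, giving a frequency × time × color array whose time axis grows with the length of the input signal.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) mel-scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers equally spaced on the mel scale.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64):
    # All default parameters are assumptions, not values from the paper.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop: t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T     # (n_frames, n_mels)
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T      # (n_mels, n_frames), in dB

def to_color_image(log_mel):
    # Hypothetical color ramp: scale amplitude to [0, 1], then map to RGB.
    span = max(float(log_mel.max() - log_mel.min()), 1e-12)
    x = (log_mel - log_mel.min()) / span
    red, green, blue = x, 1.0 - np.abs(2.0 * x - 1.0), 1.0 - x
    return np.stack([red, green, blue], axis=-1)          # (n_mels, n_frames, 3)

if __name__ == "__main__":
    sr = 16000
    for seconds in (1, 2):
        t = np.arange(seconds * sr) / sr
        tone = np.sin(2.0 * np.pi * 440.0 * t)            # synthetic stand-in for speech
        image = to_color_image(log_mel_spectrogram(tone, sr=sr))
        print(seconds, "s ->", image.shape)
```

With these assumed settings, a one-second and a two-second signal give images of different widths (97 and 197 frames), which is why a network fed such inputs needs some way of handling variable-length signals, as the modified model in the abstract is said to do.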

Authors and Affiliations

Sara Ali, Bushra Naz, Sanam Narejo, Zohaib Ahmed

How To Cite

Sara Ali, Bushra Naz, Sanam Narejo, Zohaib Ahmed (2024). Alex Net-Based Speech Emotion Recognition Using 3D Mel-Spectrograms. International Journal of Innovations in Science and Technology, 6(2), -. https://europub.co.uk/articles/-A-760319