This repository contains the source code for the thesis titled "A 3D CNN MODEL WITH MULTI-FEATURE FUSION FOR ENHANCING HUMAN EMOTION RECOGNITION FROM SPEECH". The thesis was presented and published at the 6th International Conference on Communication and Computational Technologies (ICCCT 2024) at Rajasthan Institute of Engineering and Technology, Jaipur, India.
Human emotion recognition from speech has significant applications in various fields such as human-computer interaction, mental health monitoring, and customer service. This project presents a novel 3D Convolutional Neural Network (CNN) model incorporating multi-feature fusion to enhance the accuracy of emotion recognition from microphone-captured speech data.
- Develops a 3D CNN classifier to identify emotions from speech data.
- Utilizes MFCC, chroma shift, and mel-spectrogram features, stacked into a 3D tensor for fine-grained analysis of emotional cues.
- Demonstrates accuracy superior to state-of-the-art models.
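The fusion step behind the second bullet can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the target grid size, the per-feature shapes, and the random arrays standing in for real extractor output (e.g. `librosa.feature.mfcc`, `chroma_stft`, `melspectrogram` applied to the same utterance) are all assumptions made so the example stays self-contained.

```python
import numpy as np

# Hypothetical common (frequency, time) grid that all three
# feature maps are resized to before stacking.
TARGET = (64, 128)

def to_target(feat, target=TARGET):
    """Nearest-neighbour resize of a 2D feature map to the common grid."""
    rows = np.linspace(0, feat.shape[0] - 1, target[0]).round().astype(int)
    cols = np.linspace(0, feat.shape[1] - 1, target[1]).round().astype(int)
    return feat[np.ix_(rows, cols)]

def fuse(mfcc, chroma, mel):
    """Stack the three resized feature maps along a new depth (Z) axis."""
    return np.stack([to_target(mfcc), to_target(chroma), to_target(mel)], axis=-1)

# Stand-ins for real feature-extractor output; shapes are assumed.
mfcc = np.random.rand(40, 216)    # 40 MFCC coefficients x 216 frames
chroma = np.random.rand(12, 216)  # 12 chroma bins x 216 frames
mel = np.random.rand(128, 216)    # 128 mel bands x 216 frames

tensor = fuse(mfcc, chroma, mel)
print(tensor.shape)  # (64, 128, 3)
```

The resulting height x width x 3 volume is what a 3D convolutional layer can then consume.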
To run this project, you will need to have Python and the necessary libraries installed. You can install the required libraries using:
```shell
pip install -r requirements.txt
```

The dataset used for training and evaluation is available on Kaggle. You can access it from this link.
Our 3D CNN model introduces a novel approach by incorporating three distinct feature extraction techniques—MFCC, chroma shift, and mel-spectrogram—stacked along the Z-axis, forming the third dimension. This approach resulted in improved accuracy for human emotion recognition from speech.
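To illustrate how a 3D kernel spans all three stacked feature planes at once, here is a naive NumPy sketch of a single "valid" 3D convolution. The thesis model itself would use a deep-learning framework; the kernel size and random weights below are assumptions for demonstration only.

```python
import numpy as np

def conv3d_valid(x, k):
    """Naive 'valid' 3D convolution: slide kernel k over volume x."""
    d, h, w = k.shape
    out = np.zeros((x.shape[0] - d + 1,
                    x.shape[1] - h + 1,
                    x.shape[2] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for l in range(out.shape[2]):
                # Each output value sums over depth, height, and width,
                # so MFCC, chroma, and mel cues are combined jointly.
                out[i, j, l] = np.sum(x[i:i + d, j:j + h, l:l + w] * k)
    return out

x = np.random.rand(3, 64, 128)  # depth=3: MFCC, chroma shift, mel planes
k = np.random.rand(3, 3, 3)     # one 3x3x3 kernel with assumed random weights
y = conv3d_valid(x, k)
print(y.shape)  # (1, 62, 126)
```

Because the kernel's depth equals the number of feature planes, every output activation mixes information from all three representations, which is the core idea of the multi-feature fusion.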
For more details and to access the notebook, visit the Kaggle notebook.
This work was published at:
- 6th International Conference on Communication and Computational Technologies (ICCCT 2024)
- Rajasthan Institute of Engineering and Technology, Jaipur, India
We would like to thank the organizers of ICCCT 2024 and the Rajasthan Institute of Engineering and Technology for their support.