This project is a web application that recognizes human emotions from speech. It uses a convolutional neural network (CNN) trained on the RAVDESS dataset to predict emotions from .wav audio files or live microphone input. The entire application is built with Python, using TensorFlow/Keras for modeling, Librosa for audio processing, and Streamlit for the interactive user interface.
- Live Emotion Prediction: Analyzes audio directly from a microphone.
- File-Based Analysis: Predicts emotions from user-uploaded .wav audio files.
- Interactive UI: A simple, clean, and user-friendly web interface built with Streamlit.
- Deep Learning Model: A Convolutional Neural Network (CNN) built with TensorFlow and Keras for emotion classification.
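As a rough illustration of the file-based path, the sketch below reads the header of a .wav file with Python's standard `wave` module and reports its sample rate, channel count, and duration. The function name `describe_wav` and the idea of inspecting the header before prediction are assumptions made for this example; the app itself may load audio differently (for instance through Librosa).

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM .wav file (hypothetical helper)."""
    # wave.open raises wave.Error if the file is not a readable PCM WAV.
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate_hz": rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }
```

A check like this can reject files that are not valid WAV audio before they reach the model.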
Follow these steps to set up and run the project on your local machine.
1. Clone the Repository
git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name
2. Create and Activate a Virtual Environment
This project requires a virtual environment to manage dependencies correctly.
macOS/Linux:
python3 -m venv venv
source venv/bin/activate
Windows:
python -m venv venv
.\venv\Scripts\activate
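To confirm that activation worked, a short check like the one below can be run with the `python` command from the same terminal. It is a minimal sketch that assumes the environment was created with the standard `venv` module as shown above; the function name is illustrative.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
```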
3. Install Dependencies
This project requires FFmpeg for audio processing.
Install FFmpeg:
On macOS (using Homebrew): brew install ffmpeg
On Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
On Windows: Download the executable from the official FFmpeg website and add it to your system's PATH.
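Whichever platform you use, you can verify that FFmpeg is reachable from Python before moving on. The snippet below is a small sketch using the standard library's `shutil.which`; the helper name `find_ffmpeg` is an assumption for this example.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


ffmpeg_path = find_ffmpeg()
print(f"FFmpeg found at {ffmpeg_path}" if ffmpeg_path else "FFmpeg not found on PATH")
```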
Install Python Libraries:
Install all required Python packages using the requirements.txt file.
pip install -r requirements.txt
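After installation, a quick way to spot a failed install is to check that the main libraries can be located by the interpreter. The sketch below assumes the import names `tensorflow`, `librosa`, and `streamlit`, taken from the libraries named in this README; requirements.txt may list additional packages that are not checked here.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names; adjust to match the contents of requirements.txt.
print("Missing:", missing_packages(["tensorflow", "librosa", "streamlit"]))
```

An empty list means all three libraries are visible to the active environment.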
4. Run the Streamlit Application
Once the setup is complete, you can run the application.
streamlit run app.py
The application will open in a new tab in your web browser.
📊 Model Performance and Limitations
The model achieves a test accuracy of approximately 40% on the 8-class RAVDESS dataset. This is well above the 12.5% expected from random guessing across 8 classes, but the model has known limitations:
Class Imbalance: A key issue is the model's performance on the 'disgust' emotion. This is a direct result of class imbalance in the RAVDESS dataset, where the 'disgust' category has fewer than half the samples of most other emotions. As a result, the model is less confident and less accurate when predicting this emotion.
Language Dependency: The model is trained exclusively on recordings of North American English speakers and is not expected to perform accurately on other languages.
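The class imbalance described above can be inspected directly from the file names, since RAVDESS encodes the emotion as the third hyphen-separated field of each name (01 neutral through 08 surprised). The sketch below counts labels and derives inverse-frequency class weights, which is one common mitigation; it is not something this project is stated to use. The helper names and the four file names are illustrative only and do not reflect real dataset counts.

```python
import os
from collections import Counter

# Emotion codes from the RAVDESS file name convention (third field).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}


def emotion_from_filename(filename):
    """Map a name like '03-01-07-01-02-01-12.wav' to its emotion label."""
    code = os.path.basename(filename).split("-")[2]
    return RAVDESS_EMOTIONS[code]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Illustrative file names only, not real dataset counts.
files = [
    "03-01-03-01-01-01-01.wav",
    "03-01-03-02-01-01-02.wav",
    "03-01-03-01-02-01-03.wav",
    "03-01-07-01-01-01-01.wav",
]
labels = [emotion_from_filename(name) for name in files]
weights = balanced_class_weights(labels)
print(weights)
```

Rarer classes receive larger weights. If the labels are converted to integer class indices, a dictionary of this shape could be passed as the `class_weight` argument of Keras `model.fit`, assuming the training script uses that API.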
🧹 Cleanup and Privacy Note
Privacy: Absolute file paths used during the initial data exploration phase have been removed from the scripts to protect user privacy and keep the code system-agnostic.
Virtual Environment (venv): The venv folder, which contains the project-specific Python environment and libraries, has been intentionally removed from this repository to reduce the project size and ensure a clean setup. You must create your own venv as described in the setup instructions.