This project recognizes emotions from short speech snippets in the RAVDESS database using a combined convolutional and LSTM network, together with Voice Activity Detection (VAD) and an extended feature set.
pip install librosa
pip install tensorflow
pip install -U scikit-learn
pip install soundfile
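
The Voice Activity Detection step mentioned in the description can be illustrated with a simple energy-based detector. This is only a minimal sketch: the README does not say which VAD method the project uses, so the technique shown here (a per-frame RMS threshold relative to the loudest frame), the function names `energy_vad` and `trim_silence`, and the frame length, hop length, and threshold defaults are all illustrative assumptions. It relies only on NumPy, which is installed as a dependency of librosa.

```python
import numpy as np


def energy_vad(signal, sample_rate, frame_ms=25.0, hop_ms=10.0, threshold_db=-35.0):
    """Return a boolean mask with one entry per frame (True = likely speech).

    Hypothetical helper: a frame counts as active when its RMS energy is
    within `threshold_db` of the loudest frame in the signal.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    if len(signal) < frame_len:
        return np.zeros(0, dtype=bool)

    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Index matrix of shape (n_frames, frame_len): row i selects frame i.
    idx = hop_len * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    rms = np.sqrt(np.mean(signal[idx] ** 2, axis=1))

    eps = 1e-10
    if rms.max() <= eps:
        # The whole signal is digital silence, so no frame is active.
        return np.zeros(n_frames, dtype=bool)

    # Energy in dB relative to the loudest frame (loudest frame = 0 dB).
    db = 20.0 * np.log10(np.maximum(rms, eps) / rms.max())
    return db > threshold_db


def trim_silence(signal, sample_rate, frame_ms=25.0, hop_ms=10.0, threshold_db=-35.0):
    """Keep the samples between the first and last active frame (assumed usage)."""
    signal = np.asarray(signal, dtype=float)
    mask = energy_vad(signal, sample_rate, frame_ms, hop_ms, threshold_db)
    active = np.flatnonzero(mask)
    if active.size == 0:
        return signal[:0]

    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    start = active[0] * hop_len
    end = active[-1] * hop_len + frame_len
    return signal[start:end]
```

In a pipeline like the one described, a clip loaded with librosa or soundfile could be passed through `trim_silence` before feature extraction, so that leading and trailing silence does not dilute the features fed to the network.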
- S. R. Livingstone and F. A. Russo, “The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English,” PLOS ONE, vol. 13, no. 5, May 2018, DOI: 10.1371/journal.pone.0196391.
- B. McFee et al., “librosa: Audio and Music Signal Analysis in Python,” 2015. [Online]. Available: https://www.youtube.com/watch?v=MhOdbtPhbLU