Course project for a Speech Processing course.
The chatbot has two features:
- Question and Answer
- Speaker Recognition
Option 1: QnA
Work:
- Receive audio input from the user and convert the audio array into a .wav file.
- Run speech recognition on the .wav file through get_large_audio_transcription() to convert it into text.
- Search the processed corpus for an answer and return it as the output.
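As a rough illustration of the first step, an audio array can be written to a .wav file with Python's standard library alone. This is only a sketch, not the project's actual code: the function name save_wav, the assumption that samples are floats in [-1.0, 1.0], and the default 16000 Hz mono 16-bit format are all choices made for this example.

```python
import array
import sys
import wave


def save_wav(samples, path, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM .wav file.

    Hypothetical helper for illustration; the sample format is an assumption.
    """
    # Clip each sample to the valid range, then scale to signed 16-bit integers.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    )
    # WAV data is little-endian, so swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)  # samples per second
        wav_file.writeframes(pcm.tobytes())
    return path
```

The resulting file path can then be passed to the transcription step.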
Corpus used: intro.txt
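This README does not say how the corpus is searched, so the following is only one possible sketch of the answer lookup: split the corpus into sentences and return the one whose bag-of-words vector has the highest cosine similarity to the question. The names tokenize, cosine and best_answer are hypothetical, and the project may well use a different method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_answer(question, corpus_text):
    """Return the corpus sentence most similar to the question, or None."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", corpus_text) if s.strip()
    ]
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(s))), s) for s in sentences]
    score, sentence = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    # No shared words with any sentence means no answer was found.
    return sentence if score > 0 else None
```

In use, the text of intro.txt would be read once and passed as corpus_text, with the transcribed speech passed as question.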
Option 2: Speaker Recognition
Work: The user inputs a number of samples; the chatbot picks that many samples at random and recognises their speakers.
Objective: Classify speakers from the frequency domain representation of speech recordings, obtained via Fast Fourier Transform (FFT).
Dataset: https://www.kaggle.com/kongaevans/speaker-recognition-dataset
- 5 speakers, each with 1500 audio files; each file is 1 second long and sampled at 16000 Hz.
- Background noise: 2 folders, 6 files, each longer than 1 second; these need to be resampled to 16000 Hz.
Procedure used:
- Prepare a dataset of speech samples from different speakers, with the speaker as a label.
- Add background noise to these samples to augment our data.
- Take the FFT of these samples.
- Train a 1D convnet to predict the correct speaker given a noisy FFT speech sample.
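The noise augmentation and FFT steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the project's code: the helper names add_noise and audio_to_fft, the scale parameter, and the peak-matching rule for mixing noise are choices made for this example. It also assumes each clip is a 1D float array of 16000 samples (1 second at 16000 Hz).

```python
import numpy as np

SAMPLE_RATE = 16000  # Hz, as stated for the dataset


def add_noise(audio, noise, scale=0.5):
    """Mix noise into audio so the noise peak equals scale * audio peak.

    Both inputs are 1D float arrays of the same length (an assumption).
    """
    noise_peak = np.max(np.abs(noise))
    if noise_peak == 0:
        return audio.copy()  # silent noise clip: nothing to add
    # Rescale the noise relative to the loudest point of the speech clip.
    prop = np.max(np.abs(audio)) / noise_peak
    return audio + noise * prop * scale


def audio_to_fft(audio):
    """Return FFT magnitudes for the first half of the spectrum.

    The input is real, so the second half mirrors the first and is dropped.
    """
    spectrum = np.fft.fft(audio)
    return np.abs(spectrum[: len(audio) // 2])


# Example: a 1-second 440 Hz tone with random noise mixed in.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).normal(size=SAMPLE_RATE)
noisy = add_noise(tone, noise, scale=0.5)
features = audio_to_fft(noisy)  # shape (8000,), one magnitude per 1 Hz bin
```

Each features array of this kind, paired with its speaker label, is what the 1D convnet would be trained on.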