This is the project we implemented for the Flipkart Grid 2.0 competition.
The project takes a voice recording as input. First, background noise is removed from the recording. We then apply speaker diarization to determine how many people are speaking and to identify the primary speaker, that is, the person placing the order. Next, we separate the primary speaker's voice and store it in a separate file. Finally, we use Flipkart's Automated Speech Recognition (ASR) API to convert the primary speaker's voice into text, along with the corresponding confidence value.
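To make the diarization step more concrete, here is a minimal sketch of how the speaker count and the primary speaker could be derived once diarization has produced labelled segments. It is not the code from this project. The segment format `(speaker, start, end)`, the function names, and the rule "the primary speaker is the one with the most total speaking time" are all assumptions made for illustration; the project may pick the primary speaker differently.

```python
from collections import defaultdict

# Hypothetical diarization output: (speaker_label, start_sec, end_sec).
# The real pipeline's output format may differ.
segments = [
    ("A", 0.0, 4.0),
    ("B", 4.0, 5.5),
    ("A", 5.5, 9.0),
    ("C", 9.0, 10.0),
]


def speaking_time(segments):
    """Return total speaking time in seconds for each speaker label."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    return dict(totals)


def primary_speaker(segments):
    """Pick the speaker with the most total speaking time (an assumed rule)."""
    totals = speaking_time(segments)
    if not totals:
        raise ValueError("no diarization segments given")
    return max(totals, key=totals.get)


def primary_segments(segments):
    """Return the (start, end) intervals that belong to the primary speaker."""
    primary = primary_speaker(segments)
    return [(start, end) for speaker, start, end in segments if speaker == primary]


if __name__ == "__main__":
    print("Number of speakers:", len(speaking_time(segments)))
    print("Primary speaker:", primary_speaker(segments))
    print("Primary speaker intervals:", primary_segments(segments))
```

The intervals returned by `primary_segments` are the parts of the audio that would be cut out and written to the separate file before being sent to the ASR API.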
To get the whole project, download the entire folder from this link and run it: https://drive.google.com/drive/folders/1nyJBqqmVHAULwVF6p_li83m4XMIi07a8?usp=sharing