Vosk-server and vtt_client.py sample mismatch #235

Open
echoTab opened this issue Aug 15, 2023 · 4 comments

@echoTab

echoTab commented Aug 15, 2023

I have vosk-server running on a VPS under Ubuntu 22.04, cloned from https://github.com/alphacep/vosk-server, and vtt_client.py running on Windows 10 via WSL2/Ubuntu, cloned from https://github.com/MaxVRAM/vosk_vtt_client.git. I had lots of problems getting pyaudio to work, but finally got it to run after installing it via conda (although vtt_client still throws many ALSA lib errors).

However, when I start vosk-server and then vtt_client, I get "sampling frequency mismatch, expected 16000, got 8000". I have tried hard-coding vosk-server to 8 kHz, and also tried hard-coding vtt_client to 16 kHz. Neither changed the error message. I also tried running the server with --allow_{upsample,downsample}, but this did not help either.

I have run out of ideas. Are you able to help?

@nshmyrev
Contributor

What model are you running on the server?

To change everything to 16 kHz, you need to change both the server:

https://github.com/alphacep/vosk-server/blob/master/websocket/asr_server.py#L95

and the client:

https://github.com/MaxVRAM/Vosk-VTT-Client/blob/main/vtt_client.py#L61

In general, we recommend sounddevice rather than pyaudio for microphone recording. We also recommend using our examples instead of external projects.
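For readers hitting the same mismatch, here is a minimal sketch of a client that announces its sample rate before streaming audio. It assumes the server follows the convention used in the vosk-server sample clients, where a JSON `config` message carrying `sample_rate` is sent first and a JSON `eof` message last. The helper names, `MODEL_SAMPLE_RATE`, and the server URI in the usage note are hypothetical, and `websockets` is a third-party package.

```python
import json
import wave

# Assumption: the rate the server-side model expects (16 kHz in this thread).
MODEL_SAMPLE_RATE = 16000


def config_message(sample_rate: int) -> str:
    """Build the JSON text message that announces the client's sample rate."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a WAV file header."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()


async def stream_wav(uri: str, path: str) -> None:
    """Send a WAV file to the server, announcing its sample rate first."""
    import websockets  # third-party: pip install websockets

    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        # Fail early on the client instead of waiting for the server error.
        if rate != MODEL_SAMPLE_RATE:
            raise ValueError(f"file is {rate} Hz, expected {MODEL_SAMPLE_RATE} Hz")
        async with websockets.connect(uri) as ws:
            await ws.send(config_message(rate))
            while True:
                data = wf.readframes(rate // 5)  # about 0.2 s of audio per message
                if not data:
                    break
                await ws.send(data)
                print(await ws.recv())  # partial or final result as JSON text
            await ws.send(json.dumps({"eof": 1}))
            print(await ws.recv())  # final result
```

It could be run with something like `asyncio.run(stream_wav("ws://localhost:2700", "test16k.wav"))`, where both the address and the file name are placeholders for your own setup.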

@echoTab
Author

echoTab commented Aug 17, 2023

Thanks for your reply. I have been running with vosk-model-small-en-us-0.15, which I understand requires a 16 kHz sample rate. This may be a dumb question, but looking at the code of asr_server.py I realise that I may have been confusing 'model' with 'spk_model'. Could you please explain how these differ?

@nshmyrev
Contributor

spk_model is for speaker recognition (identifying who is speaking), not for speech-to-text.

@echoTab
Author

echoTab commented Aug 17, 2023

Thank you. It is working now.
