Vosk Server is an open source Voice-To-Text server based on Vosk-API, and provides real-time voice transcription over WebSocket (and other protocols).
This Python script is based on their test_microphone.py example and acts as a client interface to a Vosk server. Currently, this version only adds OSC output of the transcription, but the plan is to expand it much further.
Please see the Vosk GitHub repo for details on the server and instructions on how to host your own: https://github.com/alphacep/vosk-server
-
An initial example script to connect to the Vosk server via websockets and output transcription results as dictionaries via OSC.
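As a rough illustration of the step between receiving a message from the server and sending it on via OSC, here is a minimal sketch. It assumes the server replies with JSON objects carrying either a "partial" key (interim text) or a "text" key (final text). The function name vosk_message_to_osc and the /vosk/... addresses are hypothetical and are not taken from vtt_client.py.

```python
import json


def vosk_message_to_osc(raw, base_address="/vosk"):
    """Turn one JSON message from the server into an (address, value) pair.

    The address scheme is an assumption made for this sketch only.
    Returns None when the message carries no transcription text.
    """
    message = json.loads(raw)
    if "partial" in message:
        # Interim hypothesis, updated while the speaker is still talking.
        return f"{base_address}/partial", message["partial"]
    if "text" in message:
        # Final result for a completed utterance.
        return f"{base_address}/final", message["text"]
    return None
```

The returned pair could then be handed to an OSC sender, for example python-osc's SimpleUDPClient.send_message(address, value).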
-
A Max patch to demonstrate extracting a reminder time (hour + am/pm) from a transcription received via OSC from the Python script. The approach uses native Max objects to do this, and results in a very convoluted patch.
Note: I in no way endorse the use of Max (especially with only their native patch object library) to perform text analysis. It's simply not designed for this. If Max is required, a far better approach would be either to perform the extraction prior to posting the data to Max, or to use Max's JavaScript [js] object.
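To give a sense of how small the "extract before posting to Max" approach can be, here is a hedged Python sketch that pulls an hour and am/pm out of a transcription. It assumes the text may contain either digits ("5pm") or spelled-out numbers with spaced letters ("five p m"). The function name extract_reminder_time and the matching rules are illustrative assumptions, not part of the existing script.

```python
import re

# Spelled-out hours, since a transcription may contain words rather than digits.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

# An hour (digits or word) followed by am/pm, allowing "p m", "p.m" and "pm".
_TIME = re.compile(
    r"\b(?P<hour>\d{1,2}|" + "|".join(NUMBER_WORDS) + r")"
    r"\s*(?P<period>[ap])\.?\s?m\b"
)


def extract_reminder_time(text):
    """Return (hour, "am" | "pm") for the first time found, or None."""
    match = _TIME.search(text.lower())
    if match is None:
        return None
    raw_hour = match.group("hour")
    hour = int(raw_hour) if raw_hour.isdigit() else NUMBER_WORDS[raw_hour]
    if not 1 <= hour <= 12:
        return None  # e.g. "15 pm" is not a valid 12-hour time
    return hour, match.group("period") + "m"
```

The resulting tuple could be sent to Max as two plain OSC arguments, leaving the patch with nothing to parse.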
- OSC transcription output.
- Python argument parser.
- Local Vosk server integration.
- Transcription of imported audio files.
- Webapp front-end:
  - Flask / Bootstrap / SQLAlchemy stack.
  - User authentication.
  - Per-account transcription retention.
- Text analysis:
  - Feature and keyword extraction
- Visualisation:
  - Wordcloud, tables, charts
  - D3.js?
If you're using Docker, it's as easy as:
docker run -d -p 2700:2700 alphacep/kaldi-en:latest
See the Vosk Server GitHub page for more info.
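Once the container is up, a quick way to confirm that something is listening on port 2700 is a plain TCP connection attempt. The sketch below uses only the standard library. The function name server_is_listening is hypothetical, and a successful connection only shows that the port is open, not that the Vosk server itself is healthy.

```python
import socket


def server_is_listening(host="localhost", port=2700, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    print("Port 2700 reachable:", server_is_listening())
```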
- Python3
- pyaudio
- websockets
- python-osc
Linux setup instructions
This assumes you have Python3 and pip installed.
I had a fatal install error using the official pip install pyaudio on Ubuntu 20.04. The following command worked perfectly instead:
sudo apt install portaudio19-dev python3-pyaudio
pip install websockets python-osc
git clone https://github.com/MaxVRAM/vosk_vtt_client.git
Windows setup instructions
This will work with other versions of Python, but I've only tested it with Python 3.10.0, so that's what I'll be using as an example.
- Head over to the Python Releases for Windows page and download the Python 3.10.0 (64-bit) installer, or use this direct download link.
- Once it has downloaded, open the installer and make sure you check the Add Python 3.10 to PATH option at the bottom of the window, which makes the Python command accessible from any folder on your system. Then hit Install Now and wait for it to finish.
- Open the Windows command prompt by pressing [win] + r, entering cmd in the box and hitting enter.
- Check that Python is installed by entering python -V (with a capital V). It should print out Python 3.10.0, or whatever version you installed.
PyAudio is not a native package on Windows, so it needs to be manually downloaded and installed from a whl (wheel) file.
- Download the PyAudio file that matches your Python version and OS - link.
- For example, Python 3.10.0 on Windows 10 (64-bit) would require:
PyAudio-0.2.11-cp310-cp310-win_amd64.whl
- Where cp310 is Python 3.10 and win_amd64 is Windows 64-bit.
- Move the file to your user's Documents folder.
- Back in the Windows command prompt, navigate to the Documents folder using cd Documents if you're already in your user folder, otherwise cd C:\Users\<your_user_name>\Documents.
- Now install the module:
pip install PyAudio-0.2.11-cp310-cp310-win_amd64.whl
- And finally install websockets and python-osc:
pip install websockets python-osc
git clone https://github.com/MaxVRAM/vosk_vtt_client.git
- Or download the script directly: vtt_client.py.
If your Vosk Server is running locally and listening on the default port 2700, you can simply run the script:
python3 vtt_client.py
-server <server_url>:<port>
- Defaults to localhost:2700
A remote Vosk Server connection might look like this:
python3 vtt_client.py -server example.com:8089
-ip <osc_ip> -port <osc_port>
- Defaults to localhost and 9600
Sending the OSC elsewhere might look like this:
python3 vtt_client.py -ip 192.168.40.22 -port 5110
A full example might look like this:
python3 vtt_client.py -server example:8098 -ip 192.168.40.22 -port 5110
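For readers curious how flags like these can be declared, here is a minimal argparse sketch using the flag names and defaults listed above. It is an assumption about the shape of the parser, not a copy of the one in vtt_client.py. The helper websocket_uri and its ws:// scheme are likewise assumptions made for illustration.

```python
import argparse


def build_parser():
    """Declare the three options described above, with their stated defaults."""
    parser = argparse.ArgumentParser(
        description="Illustrative options for a Vosk client with OSC output"
    )
    parser.add_argument("-server", default="localhost:2700",
                        help="Vosk server as <server_url>:<port>")
    parser.add_argument("-ip", default="localhost",
                        help="IP address to send OSC messages to")
    parser.add_argument("-port", type=int, default=9600,
                        help="port to send OSC messages to")
    return parser


def websocket_uri(server):
    """Build a WebSocket URI from host:port (the ws:// scheme is assumed)."""
    return f"ws://{server}"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(websocket_uri(args.server), args.ip, args.port)
```

Converting -port with type=int means a non-numeric value is rejected at startup rather than when the first OSC message is sent.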