Client for Vosk voice-to-text server, sending real-time transcriptions to remote OSC receiver.

MaxVRAM/Vosk-VTT-Client

Vosk Voice To Text (VTT) client

Vosk Server is an open source Voice-To-Text server based on Vosk-API, and provides real-time voice transcription over WebSocket (and other protocols).

This Python script is based on their test_microphone.py example and acts as a client interface to a Vosk server. Currently, this version only adds OSC output of the transcription, but the plan is to expand it much further.

Please see the Vosk GitHub repo for details on the server and instructions on how to host your own: https://github.com/alphacep/vosk-server
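As a rough illustration of the client's job, the sketch below turns one JSON message from the server into OSC-style address/value pairs. It assumes the server sends `{"partial": ...}` while you are speaking and `{"text": ...}` once an utterance is finished, which is how Vosk Server typically behaves. The function name and the `/vosk/...` addresses are made up for this example and are not necessarily what vtt_client.py uses.

```python
import json


def vosk_message_to_osc(raw_message, prefix="/vosk"):
    """Convert one Vosk Server JSON message into (osc_address, value) pairs.

    The address layout (prefix + "/partial" or "/text") is an assumption
    made for this sketch, not the script's documented output format.
    """
    data = json.loads(raw_message)
    messages = []
    # Interim hypothesis, sent repeatedly while speech is still in progress.
    if data.get("partial"):
        messages.append((f"{prefix}/partial", data["partial"]))
    # Final transcription for a completed utterance.
    if data.get("text"):
        messages.append((f"{prefix}/text", data["text"]))
    return messages
```

Each pair could then be handed to python-osc, for example with `SimpleUDPClient(ip, port).send_message(address, value)`.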

Project files

  • vtt_client.py

    An initial example script to connect to the Vosk server via websockets and output transcription results as dictionaries via OSC.

  • vtt_reminder_example.maxpat

    A Max patch to demonstrate extracting a reminder time (hour + am/pm) from a transcription received via OSC from the Python script. The approach uses native Max objects to do this, and results in a very convoluted patch.

    Note: I in no way endorse the use of Max (especially with only its native patch object library) to perform text analysis. It's simply not designed for this. If Max is required, a far better approach would be to either perform the extraction prior to posting the data to Max, or to use Max's JavaScript [js] object.
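To show what doing the extraction before posting to Max could look like, here is a minimal Python sketch that pulls an hour and am/pm out of a transcription. It is not part of the project. The function name and word table are hypothetical, and it assumes the transcription spells hours as words or digits and renders am/pm as "am", "a m", "pm" or "p m".

```python
import re

# Spelled-out hours, assuming the model emits numbers as words.
HOUR_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

# An hour (digits or a word) followed by am/pm, with an optional space
# between the two letters ("p m").
_TIME_PATTERN = re.compile(
    r"\b(?P<hour>\d{1,2}|" + "|".join(HOUR_WORDS) + r")\s*(?P<ampm>a\s?m|p\s?m)\b"
)


def extract_reminder_time(text):
    """Return (hour, "am" or "pm") for the first time found, else None."""
    match = _TIME_PATTERN.search(text.lower())
    if match is None:
        return None
    raw_hour = match.group("hour")
    hour = int(raw_hour) if raw_hour.isdigit() else HOUR_WORDS[raw_hour]
    if not 1 <= hour <= 12:
        return None
    return hour, match.group("ampm").replace(" ", "")
```

The resulting pair could be sent to Max as two plain OSC values, leaving the patch with nothing to parse.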

Feature ideas

  • OSC transcription output.
  • Python argument parser.
  • Local Vosk server integration.
  • Transcription of imported audio files.
  • Webapp front-end:
    • Flask / Bootstrap / SQLAlchemy stack.
    • User authentication.
    • Per-account transcription retention.
  • Text analysis:
    • Feature and keyword extraction
  • Visualisation:
    • Wordcloud, tables, charts
    • D3.js?

Dependencies

Vosk Server

If you're using Docker, it's as easy as:

docker run -d -p 2700:2700 alphacep/kaldi-en:latest

See the Vosk Server GitHub page for more info.
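Before starting the client, it can help to confirm that something is actually listening on the server port. The sketch below is a generic TCP check using only the standard library. The function name is hypothetical, and a successful connection only shows the port is open, not that it is a Vosk server.

```python
import socket


def server_is_listening(host="localhost", port=2700, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```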

Client modules

  • Python3
  • pyaudio
  • websockets
  • python-osc
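A quick way to see which of these modules are still missing is to ask Python whether each one can be imported. This is only a convenience sketch with hypothetical names. Note that python-osc is imported as `pythonosc`.

```python
from importlib.util import find_spec

# Import name -> pip package name.
CLIENT_MODULES = {
    "pyaudio": "pyaudio",
    "websockets": "websockets",
    "pythonosc": "python-osc",
}


def missing_packages(modules=CLIENT_MODULES):
    """Return the pip package names whose modules cannot be found."""
    return [pkg for mod, pkg in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```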

Client setup guide

Linux

This assumes you have Python3 and pip installed.

1. Install the Python modules

I hit a fatal install error with the standard pip install pyaudio on Ubuntu 20.04. The following commands worked perfectly instead:

sudo apt install portaudio19-dev python3-pyaudio
pip install websockets python-osc

2. Clone the project

git clone https://github.com/MaxVRAM/vosk_vtt_client.git

Windows

1. Install Python 3

This will work with other versions of Python, but I've only tested it with Python 3.10.0, so that's what I'll be using as an example.

  1. Head over to the Python Releases for Windows page and download Python 3.10.0 (64-bit) installer - or use this direct download link
  2. Once it has finished downloading, open the installer and make sure you check the Add Python 3.10 to PATH option at the bottom of the window, which makes the Python command accessible from any folder on your system. Then hit Install Now and wait for it to finish.
  3. Open Windows command prompt by pressing [win] + r, enter cmd in the box and hit enter.
  4. Check that Python is installed by entering python -V (with a capital V). It should print out Python 3.10.0 or whatever version you installed.

2. Install the Python modules

PyAudio is not a native package on Windows, so it needs to be downloaded manually and installed from a wheel (.whl) file.

  1. Download the PyAudio file that matches your Python version and OS - link.
     • For example, Python 3.10.0 on Windows 10 (64-bit) would require:
       • PyAudio-0.2.11-cp310-cp310-win_amd64.whl
       • (where cp310 is Python 3.10, and win_amd64 is Windows 64-bit).
  2. Move the file to your user's Documents folder.
  3. Back in Windows command prompt, navigate to the Documents folder, using cd Documents if you're already in your user folder, otherwise cd C:\Users\<your_user_name>\Documents.
  4. Now install the module:
pip install PyAudio-0.2.11-cp310-cp310-win_amd64.whl
  5. And finally install websockets and python-osc:
pip install websockets python-osc
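If you're unsure which wheel file matches your setup, the naming pattern described above can be reproduced from your Python version. The helper below is a hypothetical sketch. It assumes the same PyAudio version (0.2.11) and 64-bit Windows tag used in the example, and that the ABI tag equals the Python tag, which may not hold for older Python releases.

```python
import sys


def pyaudio_wheel_name(version="0.2.11", py=None, platform_tag="win_amd64"):
    """Build a wheel file name such as PyAudio-0.2.11-cp310-cp310-win_amd64.whl."""
    if py is None:
        # Default to the interpreter running this script.
        py = (sys.version_info.major, sys.version_info.minor)
    tag = f"cp{py[0]}{py[1]}"
    return f"PyAudio-{version}-{tag}-{tag}-{platform_tag}.whl"


if __name__ == "__main__":
    print(pyaudio_wheel_name())
```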

3. Clone the project

git clone https://github.com/MaxVRAM/vosk_vtt_client.git

Usage

If your Vosk Server is running locally and listening on the default port 2700, you can simply run the script:

python3 vtt_client.py

Arguments

Vosk Server connection

  • -server <server_url>:<port>
  • Defaults to localhost:2700

A remote Vosk Server connection might look like this:

python3 vtt_client.py -server example.com:8089

OSC destination

  • -ip <osc_ip> -port <osc_port>
  • Defaults to localhost and 9600

Sending the OSC elsewhere might look like this:

python3 vtt_client.py -ip 192.168.40.22 -port 5110

Putting it together

A full example might look like this:

python3 vtt_client.py -server example.com:8089 -ip 192.168.40.22 -port 5110
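For reference, the three options and their defaults could be expressed with argparse roughly as follows. This is a sketch of the interface described above, not the parsing code in vtt_client.py. The function names are hypothetical, and the `ws://` prefix assumes the server is reached over a plain (unencrypted) WebSocket.

```python
import argparse


def build_parser():
    """Parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Send Vosk Server transcriptions to an OSC receiver."
    )
    parser.add_argument("-server", default="localhost:2700",
                        metavar="HOST:PORT", help="Vosk Server address")
    parser.add_argument("-ip", default="localhost",
                        help="OSC destination IP or hostname")
    parser.add_argument("-port", type=int, default=9600,
                        help="OSC destination port")
    return parser


def server_uri(server):
    """Turn HOST:PORT into a WebSocket URI (assumes no TLS)."""
    return f"ws://{server}"


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(server_uri(args.server), args.ip, args.port)
```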
