Getting Started & Tips
Before using the application, please take a look at the article below:
Here are some guides containing visual instructions and information on how to use the app.
A question mark icon will show up when hovering over widgets in the app to indicate that there is a tooltip containing a hint for that widget.
- If you want to transcribe/translate speech live, you can open a record session by clicking the record button.
- When you click the record button, a record session window will pop up.
- You can also show the results of the live recording session in a subtitle window by clicking on the menubar -> show.
- This subtitle window can also be made transparent (on Windows only) by right-clicking and selecting transparent.
- The result will look somewhat like this.
- Align result: can be used to align/sync audio with a plain text or JSON result at the word level.
- Refine result: can be used to further improve timestamps.
- Translate result: can be used to translate a result that you get from transcribing a file.
- You can disable the app's tray icon by adding `--no-tray` to the parameters when launching the app. Example: `.\SpeechTranslate.exe --no-tray`
- You can use the application fully offline if you only use `whisper`, or if you have set up `libre translate` locally on your machine and set it in the translate setting (see the sketch after this list).
- You can remove the limit on recorded sentences by enabling `set no limit to sentences` in `setting -> device - record`.
- Use the `tiny`, `base`, or `small` model for record sessions to avoid crashing on low-spec hardware.
- Enable the use of the `faster whisper` model if you want to use a larger model in a record session.
- Enable the use of the `faster whisper` model for faster results.
- Do not disable word timestamps with `word_timestamps=False` if you want reliable segment timestamps.
- Use `--vad True` for more accurate non-speech detection.
- Use `--demucs True` to isolate vocals with Demucs; it is also effective at isolating vocals even if there is no music.
- Use `--demucs True` and `--vad True` for music.
- Set `--dq True` when not using faster whisper to enable dynamic quantization for inference on CPU.
- Enable `visualize_suppression` to visualize the differences between the non-VAD and VAD options.
- If non-speech/silence seems to be detected but the starting timestamps do not reflect that, try setting `--min_word_dur 0`.
- Refining the result is a great alternative to silence suppression (e.g. if VAD isn't effective).
* Some of these tips are taken directly from the stable-ts page.
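If you want to go the fully offline route for translation, below is a minimal sketch of running LibreTranslate locally. This assumes you use either the `libretranslate` Python package or the official Docker image; the host and port here are just examples, and whatever address you use is what you point the app's translate setting at.

```
# Option 1: install and run LibreTranslate via pip (Python 3.8+)
pip install libretranslate
libretranslate --host 127.0.0.1 --port 5000

# Option 2: run the official Docker image instead
docker run -ti --rm -p 5000:5000 libretranslate/libretranslate
```

Once it is reachable at that address, set it in the app's translate settings so the app no longer needs an internet connection for translation.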
Make sure that your machine meets the following requirements:
- Compatible OS Installation:
OS | Installation from Prebuilt binary | Installation as a Module | Installation from Git |
---|---|---|---|
Windows | ✔️ | ✔️ | ✔️ |
MacOS | ❌ | ✔️ | ✔️ |
Linux | ❌ | ✔️ | ✔️ |
* Python 3.8 or later (3.11 is recommended) for installation as module.
- Speaker input only works on Windows 8 and above (alternatively, you can make a loopback to capture your system audio as a virtual input (like mic input) by using these guides/tools: [Voicemeeter on Windows]/[YT Tutorial] - [pavucontrol on Ubuntu with PulseAudio] - [blackhole on MacOS])
- Internet connection is needed only for translation with API & downloading models (if you want to go fully offline, you can set up LibreTranslate on your local machine and set it in the app settings)
- Recommended to have the `Segoe UI` font installed on your system for the best UI experience (for OS other than Windows, you can see this: Ubuntu - MacOS)
- Recommended to have a capable GPU with CUDA compatibility (the prebuilt version uses CUDA 11.8) for faster results. Each whisper model has different requirements; for more information you can check it directly at the whisper repository.
Size | Parameters | Required VRAM | Relative speed |
---|---|---|---|
tiny | 39 M | ~1 GB | ~32x |
base | 74 M | ~1 GB | ~16x |
small | 244 M | ~2 GB | ~6x |
medium | 769 M | ~5 GB | ~2x |
large | 1550 M | ~10 GB | 1x |
* This information is also available in the app (hover over the model selection in the app and there will be a tooltip with the model info). Also note that when using faster-whisper, the models will be significantly faster and use less VRAM; for more information about this please visit the faster-whisper repository.
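If you are unsure whether a CUDA-capable GPU will actually be picked up, a quick sanity check (assuming PyTorch is already installed, for example after the module installation below) is:

```
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```

It should print `True` along with the CUDA version; if it prints `False`, transcription will run on the CPU instead.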
In general, you can install speech translate with these methods:
Note
The prebuilt binary is shipped with CUDA 11.8, so it will only work with GPUs that have CUDA 11.8 compatibility. If your GPU is not compatible, you can try installation as a module or from git below.
- Download the latest release (There are 2 versions, CPU and GPU)
- Install/extract the downloaded file
- Run the program
- Set the settings to your liking
- Enjoy!
Note
Use python 3.11 for best compatibility and performance
Warning
You might need to have Build tools for Visual Studio (or the equivalent of it on your OS) installed
To install as a module, we can use pip with the following commands.
- Install with GPU (CUDA compatible) support:
pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --extra-index-url https://download.pytorch.org/whl/cu118
cu118 here means CUDA 11.8; you can change it to another version if you need to. You can check older versions of pytorch here or here.
- CPU only:
pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git
You can then run the program by typing `speech-translate` in your terminal/console. Alternatively, when installing as a module, you can also clone the repo and install it locally by running `pip install -e .` in the project directory (don't forget to add `--extra-index-url` if you want to install with GPU support).
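As a rough sketch of that local install route (the repository URL and index URL are the same ones used in the commands above; drop the `--extra-index-url` part if you only want CPU support):

```
git clone https://github.com/Dadangdut33/Speech-Translate.git
cd Speech-Translate
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu118
speech-translate
```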
Notes For Installation as Module:
- If you are updating from an older version, you need to add `--upgrade --force-reinstall` at the end of the command; if the update does not need new dependencies, you can add `--no-deps` at the end of the command to speed up the installation process (see the example after these notes).
- If you want to install from a specific branch or commit, you can do it by adding `@branch_name` or `@commit_hash` at the end of the url. Example: `pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git@dev --extra-index-url https://download.pytorch.org/whl/cu118`
- The `--extra-index-url` here is for the version of CUDA. If your device is not compatible or you need to use another version of CUDA, you can check older versions of pytorch here or here.
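For example, an update command combining the flags from the first note could look like the sketch below (keep or drop `--no-deps` depending on whether the update pulls in new dependencies; `-U` already implies `--upgrade`):

```
pip install -U --force-reinstall --no-deps git+https://github.com/Dadangdut33/Speech-Translate.git --extra-index-url https://download.pytorch.org/whl/cu118
```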
If you prefer cloning the app directly from git, you can follow the guide in development instead. Doing it this way might also provide a more stable environment.
Go to: Download Page - Wiki Home - Code