Getting Started & Tips
Before using the application, please take a look at the article below:
Here are some guides containing visual instructions and information on how to use the app.
A question mark icon will show up when hovering over widgets in the app to indicate that there is a tooltip containing a hint for that widget.
- If you want to transcribe/translate speech live, you can open a record session by clicking the record button.
- When you click the record button, a record session window will pop up.
- You can also show the results of the live recording session in a subtitle window by clicking on the menubar -> show.
- This subtitle window can also be made transparent (on Windows only) by right-clicking and selecting transparent.
- The result will look somewhat like this.
- Align result: can be used to align/sync audio with a plain text or JSON result at the word level.
- Refine result: can be used to further improve timestamps.
- Translate result: can be used to translate a result that you get from transcribing a file.
- You can disable the app's tray icon by adding `--no-tray` to the parameters when launching the app. Example: `.\SpeechTranslate.exe --no-tray`
- You can use the application fully offline if you only use `whisper`, or if you have set up `libre translate` locally on your machine and set it in the translate setting (see the sketch after this list).
- You can remove the limit on recorded sentences by enabling `set no limit to sentences` in `setting -> device - record`.
- Use the `tiny`, `base`, or `small` model for record sessions to avoid crashing on low-spec hardware.
- Enable the use of the `faster whisper` model if you want to use a larger model in a record session.
- Enable the use of the `faster whisper` model for faster results.
- Do not disable word timestamps with `word_timestamps=False` if you want reliable segment timestamps.
- Use `--vad True` for more accurate non-speech detection.
- Use `--demucs True` to isolate vocals with Demucs; it is also effective at isolating vocals even if there is no music.
- Use `--demucs True` and `--vad True` for music.
- Set `--dq True` when not using faster whisper to enable dynamic quantization for inference on CPU.
- Enable `visualize_suppression` to visualize the differences between the non-VAD and VAD options.
- If non-speech/silence seems to be detected but the starting timestamps do not reflect that, try setting `--min_word_dur 0`.
- Refining the result is a great alternative to silence suppression (e.g. if VAD isn't effective).
* Some of these tips are taken directly from the stable-ts page.
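If you want to go the fully offline route for translation, below is a minimal sketch of running LibreTranslate locally. This assumes you use either the `libretranslate` Python package or the official Docker image; the host and port here are just examples, and whatever address you use is what you point the app's translate setting at.

```
# Option 1: install and run LibreTranslate via pip (Python 3.8+)
pip install libretranslate
libretranslate --host 127.0.0.1 --port 5000

# Option 2: run the official Docker image instead
docker run -ti --rm -p 5000:5000 libretranslate/libretranslate
```

Once it is reachable at that address, set it in the app's translate settings so the app no longer needs an internet connection for translation.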
Make sure that your machine meets the following requirements:
- Compatible OS Installation:
OS | Installation from Prebuilt binary | Installation as a Module | Installation from Git |
---|---|---|---|
Windows | ✔️ | ✔️ | ✔️ |
MacOS | ❌ | ✔️ | ✔️ |
Linux | ❌ | ✔️ | ✔️ |
* Python 3.8 or later (3.11 is recommended) for installation as module.
- Speaker input only works on Windows 8 and above (alternatively, you can make a loopback to capture your system audio as a virtual input (like mic input) by using these guides/tools: [Voicemeeter on Windows]/[YT Tutorial] - [pavucontrol on Ubuntu with PulseAudio] - [blackhole on MacOS])
- Internet connection is needed only for translation with API & downloading models (if you want to go fully offline, you can set up LibreTranslate on your local machine and set it in the app settings)
- Recommended to have the `Segoe UI` font installed on your system for the best UI experience (for OS other than Windows, you can see this: Ubuntu - MacOS)
- Recommended to have a capable GPU with CUDA compatibility (the prebuilt version uses CUDA 11.8) for faster results. Each whisper model has different requirements; for more information you can check it directly at the whisper repository.
Size | Parameters | Required VRAM | Relative speed |
---|---|---|---|
tiny | 39 M | ~1 GB | ~32x |
base | 74 M | ~1 GB | ~16x |
small | 244 M | ~2 GB | ~6x |
medium | 769 M | ~5 GB | ~2x |
large | 1550 M | ~10 GB | 1x |
* This information is also available in the app (hover over the model selection in the app and there will be a tooltip with the model info). Also note that when using faster-whisper, the models will be significantly faster and use less VRAM; for more information about this please visit the faster-whisper repository.
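If you are unsure whether a CUDA-capable GPU will actually be picked up, a quick sanity check (assuming PyTorch is already installed, for example after the module installation below) is:

```
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```

It should print `True` along with the CUDA version; if it prints `False`, transcription will run on the CPU instead.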
In general, you can install speech translate with these methods:
Note
The prebuilt binary is shipped with CUDA 11.8, so it will only work with GPUs that have CUDA 11.8 compatibility. If your GPU is not compatible, you can try installation as a module or from git below.
- Download the latest release (There are 2 versions, CPU and GPU)
- Install/extract the downloaded file
- Run the program
- Set the settings to your liking
- Enjoy!
Note
Use python 3.11 for best compatibility and performance
Warning
You might need to have Build tools for Visual Studio (or the equivalent of it on your OS) installed
To install as a module, we can use pip with the following commands.
- Install with GPU (CUDA compatible) support:
pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --extra-index-url https://download.pytorch.org/whl/cu118
cu118 here means CUDA 11.8; you can change it to another version if you need to. You can check older versions of pytorch here or here.
- CPU only:
pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git
You can then run the program by typing `speech-translate` in your terminal/console. Alternatively, when installing as a module, you can also clone the repo and install it locally by running `pip install -e .` in the project directory (don't forget to add `--extra-index-url` if you want to install with GPU support).
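As a rough sketch of that local install route (the repository URL and index URL are the same ones used in the commands above; drop the `--extra-index-url` part if you only want CPU support):

```
git clone https://github.com/Dadangdut33/Speech-Translate.git
cd Speech-Translate
pip install -e . --extra-index-url https://download.pytorch.org/whl/cu118
speech-translate
```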
Notes For Installation as Module:
- If you are updating from an older version, you need to add `--upgrade --force-reinstall` at the end of the command; if the update does not need new dependencies, you can add `--no-deps` at the end of the command to speed up the installation process (see the example after these notes).
- If you want to install from a specific branch or commit, you can do it by adding `@branch_name` or `@commit_hash` at the end of the url. Example: `pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git@dev --extra-index-url https://download.pytorch.org/whl/cu118`
- The `--extra-index-url` here is for the version of CUDA. If your device is not compatible or you need to use another version of CUDA, you can check older versions of pytorch here or here.
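For example, an update command combining the flags from the first note could look like the sketch below (keep or drop `--no-deps` depending on whether the update pulls in new dependencies; `-U` already implies `--upgrade`):

```
pip install -U --force-reinstall --no-deps git+https://github.com/Dadangdut33/Speech-Translate.git --extra-index-url https://download.pytorch.org/whl/cu118
```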
If you prefer cloning the app directly from git, you can follow the guide in development instead. Doing it this way might also provide a more stable environment.
Go to: Download Page - Wiki Home - Code