Getting Started & Tips

Fauzan F A edited this page Dec 25, 2023 · 17 revisions

Before using the application, please take a look at the article below:

❓ How To Use The App

Here is a guide with visual instructions and information on how to use the app.

👀 Hovering your mouse pointer over a widget may reveal a hint

A question mark ❓ shows up when you hover over a widget in the app, indicating that the widget has a tooltip containing a hint.

image

👀 Overview of the Main Window

image

👀 Overview of the Setting Window

image

❓ If you want to transcribe/translate speech live, you can open a record session by clicking the record button

🖱️ When you click the record button, this window will pop up

image

❓ You can also show the results of the live recording session in a subtitle window by 👆 clicking menubar -> show

image

❓ This window can also be made transparent (on Windows only) by right-clicking it and selecting transparent

image

👀 The result will look somewhat like this

image

File operation

🖱️ Clicking import will open this menu to import files

image

🖱️ Clicking tools, on the other hand, will open a selection of tools to choose from

image

🖱️ Clicking align results will open this window

This can be used to align/sync audio with plain text or a JSON result at the word level.

image

🖱️ Clicking refine results will open this window

This can be used to further improve the timestamps.

image

🖱️ Clicking translate results will open this window

This can be used to translate the result that you get from transcribing a file.

image

📌 Usage Tips

  • You can disable the app's tray icon by adding --no-tray to the parameters when launching the app. Example: .\SpeechTranslate.exe --no-tray
  • You can use the application fully offline if you only use Whisper, or if you have set up LibreTranslate locally on your machine and selected it in the translate setting
  • You can remove the limit on recorded sentences by enabling set no limit to sentences in setting -> device -record
  • Use the tiny, base, or small model for record sessions to avoid crashing on low-spec hardware
  • Enable the faster whisper model if you want to use a larger model in a record session
  • Enable the faster whisper model for faster results
  • Do not disable word timestamps with word_timestamps=False if you want reliable segment timestamps
  • Use --vad True for more accurate non-speech detection
  • Use --demucs True to isolate vocals with Demucs; it is also effective at isolating vocals even if there is no music
  • Use --demucs True and --vad True for music
  • Set --dq True when not using faster whisper to enable dynamic quantization for inference on CPU
  • Enable visualize_suppression to visualize the differences between non-VAD and VAD options
  • If non-speech/silence seems to be detected but the starting timestamps do not reflect that, try setting --min_word_dur 0
  • Refining the result is a great alternative to silence suppression (e.g. if VAD isn't effective)

* Some of these tips are taken directly from the stable-ts page
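Several of the tips above are switches that can be combined. The sketch below collects them into a plain dictionary of options. `transcribe_options` and its two arguments are hypothetical names used only for illustration, and the keys simply mirror the option names quoted in the tips (word_timestamps, vad, demucs, min_word_dur). Whether a given version of the app or of stable-ts accepts them under exactly these names is an assumption to verify.

```python
def transcribe_options(has_music=False, fix_start_timestamps=False):
    """Collect options that mirror the usage tips (hypothetical helper)."""
    options = {
        # Keep word timestamps on for reliable segment timestamps.
        "word_timestamps": True,
        # VAD gives more accurate non-speech detection.
        "vad": True,
    }
    if has_music:
        # For music, the tips suggest Demucs together with VAD.
        options["demucs"] = True
    if fix_start_timestamps:
        # Try this when silence is detected but the starting
        # timestamps do not reflect it.
        options["min_word_dur"] = 0
    return options


if __name__ == "__main__":
    print(transcribe_options(has_music=True))
```

For a recording with background music, `transcribe_options(has_music=True)` turns on both vad and demucs, which matches the tip for music.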

📜 Requirements

Make sure that your machine meets the following requirements:

  • Compatible OS Installation:

| OS | Installation from Prebuilt Binary | Installation as a Module | Installation from Git |
| --- | --- | --- | --- |
| Windows | ✔️ | ✔️ | ✔️ |
| MacOS | ❌ | ✔️ | ✔️ |
| Linux | ❌ | ✔️ | ✔️ |

  • Python 3.8 or later (3.11 is recommended) for installation as a module.

| Size | Parameters | Required VRAM | Relative speed |
| --- | --- | --- | --- |
| tiny | 39 M | ~1 GB | ~32x |
| base | 74 M | ~1 GB | ~16x |
| small | 244 M | ~2 GB | ~6x |
| medium | 769 M | ~5 GB | ~2x |
| large | 1550 M | ~10 GB | 1x |

* This information is also available in the app: hover over the model selection and a tooltip will show the model info. Also note that when using faster-whisper, the model will be significantly faster and use less VRAM. For more information, please visit the faster-whisper repository.
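As a rough illustration of how the model table can guide the choice of model, the sketch below picks the largest model whose listed VRAM requirement fits a given budget. The function name `largest_model_for` is hypothetical, and the numbers are the approximate values from the table, so treat the result as a starting point rather than a guarantee.

```python
# Approximate VRAM requirements in GB, copied from the model table.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_for(vram_gb):
    """Return the largest model that fits in vram_gb, or None if none fits."""
    fitting = [name for name, needed in MODEL_VRAM_GB if needed <= vram_gb]
    # The list is ordered from smallest to largest, so the last fit is the largest.
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for budget in (1, 4, 12):
        print(budget, "GB ->", largest_model_for(budget))
```

With faster-whisper enabled, actual VRAM usage should be lower than these figures, so a larger model may fit than this sketch suggests.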

🔧 Installation

In general, you can install Speech Translate with these methods:

From Prebuilt Binary (.exe)

Note

The prebuilt binary is shipped with CUDA 11.8, so it will only work with a GPU that is compatible with CUDA 11.8. If your GPU is not compatible, you can try installing as a module or from Git below.

  1. Download the latest release (There are 2 versions, CPU and GPU)
  2. Install/extract the downloaded file
  3. Run the program
  4. Set the settings to your liking
  5. Enjoy!

As A Module

Note

Use Python 3.11 for best compatibility and performance

Warning

You might need to have Build Tools for Visual Studio (or the equivalent for your OS) installed
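Since installation as a module expects Python 3.8 or later, with 3.11 recommended, a quick check of the interpreter before running pip can save a failed install. This is a minimal sketch; `check_python` and its return labels are hypothetical names chosen for illustration.

```python
import sys

MINIMUM = (3, 8)       # oldest version listed in the requirements
RECOMMENDED = (3, 11)  # version recommended for best compatibility


def check_python(version=None):
    """Classify a (major, minor) version against the stated requirements."""
    if version is None:
        version = sys.version_info[:2]
    version = tuple(version[:2])
    if version < MINIMUM:
        return "unsupported"
    if version == RECOMMENDED:
        return "recommended"
    return "supported"


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {check_python()}")
```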

To install as a module, use pip with one of the following commands.

  • Install with GPU (CUDA-compatible) support:

    pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git --extra-index-url https://download.pytorch.org/whl/cu118

    cu118 here means CUDA 11.8; you can change it to another version if you need to. You can check older versions of PyTorch here or here.

  • CPU only:

    pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git

You can then run the program by typing speech-translate in your terminal/console. Alternatively, when installing as a module, you can also clone the repo and install it locally by running pip install -e . in the project directory. (Don't forget to add --extra-index-url if you want to install with GPU support)
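After installing as a module with GPU support, one way to check which CUDA build of PyTorch was picked up is to ask PyTorch itself. This is a minimal sketch that assumes PyTorch was installed as a dependency. `cuda_report` is a hypothetical helper name, while `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
def cuda_report():
    """Summarize the installed PyTorch build and whether a GPU is usable."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False, "cuda_build": None, "gpu_available": False}
    return {
        "torch_installed": True,
        # CUDA version the wheel was built against (e.g. "11.8"), or None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # True only if a compatible GPU and driver are actually present.
        "gpu_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    print(cuda_report())
```

If `cuda_build` is None, the CPU-only wheel was most likely installed, and reinstalling with the --extra-index-url option is worth trying.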

Notes For Installation as Module:

  • If you are updating from an older version, you need to add --upgrade --force-reinstall at the end of the command. If the update does not need new dependencies, you can also add --no-deps to speed up the installation process.
  • If you want to install from a specific branch or commit, you can do it by adding @branch_name or @commit_hash at the end of the url. Example: pip install -U git+https://github.com/Dadangdut33/Speech-Translate.git@dev --extra-index-url https://download.pytorch.org/whl/cu118
  • The --extra-index-url here selects the CUDA version. If your device is not compatible or you need to use another version of CUDA, you can check older versions of PyTorch here or here.
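The notes above describe a handful of pieces that get combined into one pip command: an optional branch or commit, an optional CUDA index, and the update flags. The sketch below assembles them as an argument list. `build_pip_command` and its parameters are hypothetical names for illustration; the repository URL, the PyTorch index URL, and the pip flags are the ones shown in this guide.

```python
REPO_URL = "git+https://github.com/Dadangdut33/Speech-Translate.git"
TORCH_INDEX = "https://download.pytorch.org/whl/"


def build_pip_command(ref=None, cuda_tag="cu118", force_reinstall=False, no_deps=False):
    """Build the pip argument list for installing Speech Translate as a module."""
    # Append @branch_name or @commit_hash when a specific ref is requested.
    target = REPO_URL if ref is None else f"{REPO_URL}@{ref}"
    command = ["pip", "install", "-U", target]
    if cuda_tag:
        # GPU install: point pip at the PyTorch wheels for this CUDA version.
        command += ["--extra-index-url", TORCH_INDEX + cuda_tag]
    if force_reinstall:
        # Needed when updating from an older version.
        command += ["--upgrade", "--force-reinstall"]
    if no_deps:
        # Skips dependency resolution when the update needs no new dependencies.
        command.append("--no-deps")
    return command


if __name__ == "__main__":
    print(" ".join(build_pip_command(ref="dev")))
    print(" ".join(build_pip_command(cuda_tag=None)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` or joined with spaces and pasted into a terminal. Passing `cuda_tag=None` gives the CPU-only command.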

Git cloning

If you prefer cloning the app directly from git, you can follow the guide in development instead. Doing it this way might also provide a more stable environment.