Transcriber Pal is a Python command-line application that transcribes audio and video files to text using artificial intelligence. It uses OpenAI's Whisper for speech recognition and FFmpeg for audio and video processing.
```mermaid
sequenceDiagram
    participant User
    participant Transcriber
    participant Model
    User->>Transcriber: Provide audio file
    Transcriber->>Transcriber: Check for GPU
    Transcriber->>Model: Load transcription model
    Transcriber->>Transcriber: Create temp directories
    Transcriber->>Transcriber: Calculate audio duration
    User->>Transcriber: Request transcription
    Transcriber->>Transcriber: Split audio into chunks
    loop for each chunk
        Transcriber->>Model: Transcribe chunk to text
        Model-->>Transcriber: Return transcription
    end
    Transcriber->>Transcriber: Write transcriptions to file
    Transcriber->>Transcriber: Delete temp files
    Transcriber-->>User: Return transcription file
```
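For reference, a condensed Python sketch of this flow might look like the code below. The chunking approach, chunk length, and file layout here are illustrative assumptions, not the project's actual code:

```python
import glob
import os
import subprocess
import tempfile

import torch
import whisper

def transcribe_file(audio_path: str, output_path: str, chunk_seconds: int = 600) -> None:
    device = "cuda" if torch.cuda.is_available() else "cpu"  # Check for GPU
    model = whisper.load_model("base", device=device)        # Load transcription model
    with tempfile.TemporaryDirectory() as tmp:               # Create temp directory
        # Split audio into fixed-length chunks with FFmpeg's segment muxer
        subprocess.run(
            ["ffmpeg", "-i", audio_path, "-f", "segment",
             "-segment_time", str(chunk_seconds), "-c", "copy",
             os.path.join(tmp, "chunk%03d.mp3")],
            check=True,
        )
        with open(output_path, "w", encoding="utf-8") as out:
            for chunk in sorted(glob.glob(os.path.join(tmp, "chunk*.mp3"))):
                result = model.transcribe(chunk)             # Transcribe chunk to text
                out.write(result["text"])
    # Temp chunks are deleted automatically when the directory context exits
```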
- Python 3.9 or higher
- FFmpeg
- PyTorch (torch)
For the most part, you just need to follow the setup instructions in the Whisper documentation (https://github.com/openai/whisper).
You can install the Python dependencies using the following command:

```
pip install -r requirements.txt
```
To run Whisper on a GPU, you need to follow these steps:

- Ensure that you have a compatible GPU: Whisper performs best when it has access to a GPU. Check if your system has a compatible GPU, such as an NVIDIA GPU with CUDA support.
- Install the torch package with CUDA support: Whisper relies on the torch package, which offers a CUDA-accelerated version. To install the CUDA-enabled version of torch, run the following commands:

  ```
  pip3 uninstall torch
  pip cache purge
  pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117
  ```
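After installing, you can quickly verify that torch can see the GPU before running a transcription:

```python
import torch

# True means the CUDA-enabled build is installed and a GPU is visible to torch
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the name of your NVIDIA card
```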
To transcribe an audio or video file, run the `main.py` script and pass the path to the input file as a command-line argument:

```
python main.py 'path/to/file.mp3'
```
Transcription typically takes roughly half as long as the audio itself; it can take a bit longer if the video first needs to be converted to audio.
The output text will be saved to a file in the output directory with the same name as the input file.
If the input file is a video file, the script will first convert it to an audio file using FFmpeg before transcribing it.
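As a rough sketch, that conversion step amounts to a single FFmpeg call like the one below; the script's actual arguments may differ:

```python
import subprocess

def extract_audio(video_path: str, audio_path: str) -> None:
    # -vn drops the video stream; FFmpeg infers the audio codec
    # from the output file's extension (e.g. .mp3)
    subprocess.run(["ffmpeg", "-i", video_path, "-vn", audio_path], check=True)

extract_audio("path/to/file.mp4", "path/to/file.mp3")
```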
If no input file is provided, the script will display an error message.
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x           |
| base   | 74 M       | base.en            | base               | ~1 GB         | ~16x           |
| small  | 244 M      | small.en           | small              | ~2 GB         | ~6x            |
| medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
| large  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
If you want to change the model, you can do so by editing the `model` variable in the `audio_to_text` file:

```python
model = whisper.load_model("base")  # Model to use for transcription
```
Keep in mind that larger models require more memory and take longer to transcribe the audio, but they are also more accurate.
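For instance, to use the English-only small model from the table above and place it on the GPU when one is available (`whisper.load_model` accepts an optional `device` argument), the line could become:

```python
import torch
import whisper

device = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("small.en", device=device)  # Model to use for transcription
```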
This project is licensed under the MIT License. See the LICENSE file for more information.