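This is a Cantonese subtitle generator: it takes an audio file (.mp3, .wav, .webm, .flac, etc.) as input and outputs an .srt subtitle file.
Cantonese transcription uses FunAudioLLM/SenseVoiceSmall, and timestamp segmentation uses fsmn-vad. If the BERT corrector is enabled, it uses hon9kon9ize/bert-large-cantonese.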
After cloning this repo locally, run the following commands to install the dependencies and then download the required models:
apt install ffmpeg
pip install -r requirements.txt
# The BERT model (--with-bert) does not need to be downloaded if you only use OpenCC
$ python download_models.py [--with-bert]
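Next, prepare the audio file you want to transcribe. If you want to download the audio of a YouTube video, install yt-dlp with pip install yt-dlp, then run the following command to download it:
# This command downloads the audio only, without video; to download the video as well, remove -f ba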
yt-dlp -f ba https://youtu.be/rIBD6A4lnLQ
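If you would rather trigger the download from a Python script, the command above can be wrapped with subprocess. This is only a minimal sketch: the helper names build_ytdlp_command and download are hypothetical and not part of this repo, and it assumes yt-dlp is already installed and on your PATH.

```python
import subprocess


def build_ytdlp_command(url, audio_only=True):
    """Build the yt-dlp argument list shown in this README (hypothetical helper)."""
    command = ["yt-dlp"]
    if audio_only:
        # -f ba asks yt-dlp for the best audio-only format.
        command += ["-f", "ba"]
    command.append(url)
    return command


def download(url, audio_only=True):
    """Run yt-dlp; raises subprocess.CalledProcessError if the download fails."""
    subprocess.run(build_ytdlp_command(url, audio_only), check=True)
```

Calling download("https://youtu.be/rIBD6A4lnLQ") would run the same command as above; passing audio_only=False drops -f ba so the video is downloaded as well.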
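Run the following command to transcribe your audio file into subtitles. The default corrector is opencc; to use the bert corrector, add --corrector=bert, but you need to export the bert model in the first step, and processing takes more time.
To transcribe a single file, run: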
python cli.py audio.mp3 --output_dir output
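If you pass a directory path instead of a specific file, all audio files under that path are transcribed automatically:
# Automatically transcribe all audio files in audio/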
python cli.py ./audio/ --output_dir output
This service API uses the SenseVoice, VAD, and BERT models to generate Cantonese subtitle transcripts for audio files.
This version only supports YouTube video URLs.
- Download the audio file from a YouTube video URL
- Use the VAD model to split the audio file into small audio clips
- Use the SenseVoice model to generate the Cantonese subtitle transcript and timestamp for each audio clip
- Since the output of the SenseVoice model is Simplified Chinese, we use OpenCC to convert it to Traditional Chinese and then use BERT to correct the converted text
- Generate an SRT file from the Cantonese subtitle transcript
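The last step above can be sketched in a few lines of Python. This is an illustration of the SRT layout only, not the code this repo uses: the function names and the (start_ms, end_ms, text) segment shape are assumptions made for the example.

```python
def format_timestamp(ms):
    """Convert milliseconds to the SRT timestamp form HH:MM:SS,mmm."""
    hours, rest = divmod(int(ms), 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(segments):
    """Render (start_ms, end_ms, text) tuples as SRT text (assumed segment shape)."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        # Each cue is: a 1-based counter, a time range line, then the text.
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}\n"
            f"{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

For example, build_srt([(0, 1500, "你好")]) produces a single cue numbered 1 that runs from 00:00:00,000 to 00:00:01,500.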
All models are exported to ONNX format.
- SenseVoice: iic/SenseVoiceSmall (on ModelScope)
- VAD: iic/speech_fsmn_vad_zh-cn-16k-common-pytorch (on ModelScope)
- Bert: hon9kon9ize/bert-large-cantonese
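To make the role of the VAD model more concrete, here is a rough sketch of how speech segments could be turned into audio clips. It assumes the VAD yields [start_ms, end_ms] pairs and that the audio is mono at 16 kHz (inferred from the "16k" in the model name); slice_by_segments is a hypothetical helper, and the actual pipeline in this repo may work differently.

```python
SAMPLE_RATE = 16_000  # assumption: taken from the "16k" in the VAD model name


def slice_by_segments(samples, segments_ms, sample_rate=SAMPLE_RATE):
    """Cut a mono sample sequence into clips, one per [start_ms, end_ms] pair."""
    clips = []
    for start_ms, end_ms in segments_ms:
        # Convert millisecond offsets to sample indices.
        start = start_ms * sample_rate // 1000
        end = end_ms * sample_rate // 1000
        clips.append(samples[start:end])
    return clips
```

Each clip would then be passed to the transcription model, and the segment boundaries give the start and end time of the corresponding subtitle.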
sudo apt install ffmpeg
pip install -r requirements.txt
Export the models to ONNX format. This downloads the model weights and exports them to ONNX format in the models folder. Add --with-bert to also export the BERT model.
$ python download_models.py [--with-bert]
You can run the following command to download the audio of a YouTube video. Make sure yt-dlp is installed first (pip install yt-dlp).
# Download the audio from a YouTube video URL; to download the video as well, remove -f ba
yt-dlp -f ba https://youtu.be/rIBD6
Run the CLI. The default corrector is opencc; to use bert as the corrector, add --corrector=bert, but you need to export the bert model in the first step, and processing takes more time.
Single-file transcription can be run directly:
$ python cli.py your_audio.mp3 --output_dir output [--corrector=opencc|bert]
Or in batch:
# Auto transcribe all audio files under the audio/ directory
python cli.py ./audio/ --output_dir output
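The batch behaviour can be pictured as "a file path means one file, a directory path means every audio file inside it". The sketch below illustrates that idea only; collect_audio_files is a hypothetical helper, the extension list is taken from the formats named in this README, and whether the real CLI also searches subdirectories is not stated here, so this version does not recurse.

```python
from pathlib import Path

# Extensions named in this README; the real CLI may accept more or fewer.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".webm", ".flac"}


def collect_audio_files(path):
    """Return the audio files to transcribe for a file or directory path."""
    target = Path(path)
    if target.is_file():
        return [target]
    # Sort for a stable processing order across runs (non-recursive).
    return sorted(
        p for p in target.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

With a directory containing a.mp3, b.wav and notes.txt, collect_audio_files would return only the two audio files.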
Or run the web API service:
$ python app.py