F5-TTS

Demo; Paper

Official code for "A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Installation

pip install -r requirements.txt

Dataset

# prepare a custom dataset as needed
# download the corresponding dataset first, then fill in its path in the scripts
python scripts/prepare_emilia.py
python scripts/prepare_wenetspeech4tts.py

Training

# set up the accelerate config, e.g. multi-GPU DDP, fp16
# the config will be saved to: ~/.cache/huggingface/accelerate/default_config.yaml
accelerate config
accelerate launch test_train.py

Inference

Pretrained model ckpts: https://huggingface.co/SWivid/F5-TTS

# test single inference
# modify the config as needed, e.g.
#   fix_duration (total length of prompt + generated speech; up to 30s is currently supported)
#   nfe_step     (number of function evaluations; larger values take more time but solve the inference ODE more precisely)
#   ode_method   (switch to 'midpoint' for better compatibility with a small nfe_step,
#                 though 'midpoint' is a 2nd-order ODE solver and slower than the 1st-order 'Euler')
python test_infer_single.py

# test speech edit
python test_infer_single_edit.py
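
The trade-off between the two ode_method choices noted in the config comments can be illustrated on a toy problem. The sketch below is not the repository's sampler: `integrate` is a hypothetical helper, and the ODE dx/dt = -x is chosen only because its exact solution is known. It shows that a midpoint step costs two function evaluations while an Euler step costs one, and that at an equal evaluation budget the 2nd-order solver lands closer to the exact value.

```python
import math

def integrate(f, x0, steps, method="euler"):
    """Integrate dx/dt = f(t, x) from t=0 to t=1 with a fixed step size.

    Returns the final value and the number of function evaluations (NFE).
    Hypothetical helper for illustration only.
    """
    h = 1.0 / steps
    x, t, nfe = x0, 0.0, 0
    for _ in range(steps):
        if method == "euler":
            # 1st-order: one evaluation per step
            x = x + h * f(t, x)
            nfe += 1
        elif method == "midpoint":
            # 2nd-order: evaluate at the start, then at the half step
            k1 = f(t, x)
            x = x + h * f(t + h / 2, x + h / 2 * k1)
            nfe += 2
        else:
            raise ValueError(f"unknown method: {method}")
        t += h
    return x, nfe

# Toy ODE (an assumption for the demo): dx/dt = -x, so x(1) = exp(-1).
decay = lambda t, x: -x
exact = math.exp(-1)

# Same budget of 16 function evaluations for both solvers.
euler_x, euler_nfe = integrate(decay, 1.0, steps=16, method="euler")
mid_x, mid_nfe = integrate(decay, 1.0, steps=8, method="midpoint")

print(f"euler    nfe={euler_nfe} error={abs(euler_x - exact):.5f}")
print(f"midpoint nfe={mid_nfe} error={abs(mid_x - exact):.5f}")
```

How nfe_step maps onto solver steps inside test_infer_single.py is not shown here, so treat the step counts above as illustrative only.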

Evaluation

Download the Seed-TTS test set: https://github.com/BytedanceSpeech/seed-tts-eval
Download LibriSpeech test-clean: http://www.openslr.org/12/
Unzip and place them under data/, then fill in the path to test-clean in test_infer_batch.py.
Our LibriSpeech-PC 4-10s subset is already under data/ in this repo.

zh ASR model ckpt: https://huggingface.co/funasr/paraformer-zh
en ASR model ckpt: https://huggingface.co/Systran/faster-whisper-large-v3
WavLM model ckpt: https://drive.google.com/file/d/1-aE1NfzpRCLxA4GUxX9ITI3F9LlbtEGP/view
Fill in the paths to these ckpts in test_infer_batch.py.

# batch inference for evaluations
accelerate config  # if not set before
bash test_infer_batch.sh

# faster-whisper with CUDA 11
pip install --force-reinstall ctranslate2==3.24.0
# (recommended) pin faster-whisper as well; otherwise ASR may fail (abnormally repetitive output)
pip install faster-whisper==0.10.1

# evaluation for Seed-TTS test set
python scripts/eval_seedtts_testset.py

# evaluation for LibriSpeech-PC test-clean cross sentence
python scripts/eval_librispeech_test_clean.py
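
The ASR checkpoints above are used to transcribe the generated speech. This README does not say which metric the evaluation scripts compute from those transcripts, so the following is only a sketch under the assumption that a word error rate (WER) style score is involved. `word_error_rate` is a hypothetical helper, not a function from this repo, and it splits on whitespace, which suits English; Chinese text would normally be scored per character instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; not taken from the eval scripts.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# A repeated-word transcript, similar in spirit to the abnormal repetition
# mentioned above, inflates the score through insertions.
clean = word_error_rate("hello world", "hello world")
repeated = word_error_rate("hello world", "hello world world world")
print(clean, repeated)
```

Because inserted words count as errors, a transcript with runaway repetition can push the score to 1.0 or higher even when the synthesized audio is fine, which is one reason the pinned ASR versions matter.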

Appreciation

E2-TTS: brilliant work, simple and effective
Emilia, WenetSpeech4TTS: valuable datasets
lucidrains: initial CFM structure, with bfs18 also joining the discussion
SD3 & Huggingface diffusers: DiT and MMDiT code structure
FunASR, faster-whisper & UniSpeech: evaluation tools
torchdiffeq as ODE solver, Vocos as vocoder
ctc-forced-aligner: speech edit test
