From 9761b3094078f00cd5303321c9575a3ae2676684 Mon Sep 17 00:00:00 2001
From: Dapwner <46859435+Dapwner@users.noreply.github.com>
Date: Mon, 3 Jun 2024 14:07:04 +0800
Subject: [PATCH] Update README.md

---
 README.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8154905..1c3be06 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,12 @@
 # Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training
-This repo combines a Tacotron2 model with a ML-VAE and adversarial learning to target accent conversion in TTS settings (pick a speaker A with and assign them accent B).
\ No newline at end of file
+This repo combines a Tacotron2 model with an ML-VAE and adversarial learning to perform accent conversion in TTS settings (pick a speaker A and assign them accent B).
+Paper link: TBA
+Samples link: https://amaai-lab.github.io/Accented-TTS-MLVAE-ADV/
+
+## Training
+First, preprocess your data into mel-spectrogram .npy arrays with the preprocess.py script. In this paper we used L2CMU, a combination of L2Arctic (24 speakers) and CMUArctic (4 speakers). Then run CUDA_VISIBLE_DEVICES=X python train.py --dataset L2CMU
+
+## Inference
+Once trained, run extract_stats.py to extract and store the accent and speaker embeddings of your evaluation set. Then you can synthesize with one of the synth scripts. :-)
+
+For example: CUDA_VISIBLE_DEVICES=X python synthesize.py --dataset L2Arctic --restore_step [N] --mode [batch/single] --text [TXT] --speaker_id [SPID] --accent [ACC]
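+
+For reference, a minimal end-to-end sketch of the commands above (the preprocess.py and extract_stats.py invocations and all bracketed values are placeholders; adjust them to your data and setup):
+
+```bash
+# 1. Preprocess the data into mel-spectrogram .npy arrays (see preprocess.py for its expected arguments)
+python preprocess.py
+
+# 2. Train on L2CMU (L2Arctic + CMUArctic); the GPU index is only an example
+CUDA_VISIBLE_DEVICES=0 python train.py --dataset L2CMU
+
+# 3. Extract and store the accent and speaker embeddings of the evaluation set
+python extract_stats.py
+
+# 4. Synthesize; bracketed values are placeholders (checkpoint step, input text, speaker ID, accent)
+CUDA_VISIBLE_DEVICES=0 python synthesize.py --dataset L2Arctic --restore_step [N] --mode [batch/single] --text [TXT] --speaker_id [SPID] --accent [ACC]
+```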