Multilingual Speech Synthesis System Using VITS
- A Windows/Linux system with a minimum of 16GB RAM.
- A GPU with at least 12GB of VRAM.
- Python == 3.8
- Anaconda installed.
- PyTorch installed.
- CUDA 11.x installed.
- Zlib DLL installed.
PyTorch install command:
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
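After installing, you can optionally confirm that the CUDA build of PyTorch is active with a short check (a sanity-check sketch, not part of this repository):

```python
import torch

# Verify the CUDA 11.7 build of PyTorch is installed and a GPU is visible.
print(torch.__version__)              # expected: 1.13.1+cu117
print(torch.cuda.is_available())      # expected: True on a working setup
print(torch.cuda.get_device_name(0))  # prints your GPU model
```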
CUDA 11.7 install:
https://developer.nvidia.com/cuda-11-7-0-download-archive
Zlib DLL install:
https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-zlib-windows
Install pyopenjtalk manually:
pip install -U pyopenjtalk --no-build-isolation
If this command fails, install the following libraries before proceeding:
- cmake
- Cython
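Both can be installed with pip (assuming pip points at the polylangvits environment):
pip install cmake Cython
Once pyopenjtalk is installed, a one-line grapheme-to-phoneme call is a quick way to confirm it works:
python -c "import pyopenjtalk; print(pyopenjtalk.g2p('こんにちは'))"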
- Create an Anaconda environment:
conda create -n polylangvits python=3.8
- Activate the environment:
conda activate polylangvits
- Clone this repository to your local machine:
git clone https://github.com/ORI-Muchim/PolyLangVITS.git
- Navigate to the cloned directory:
cd PolyLangVITS
- Install the necessary dependencies:
pip install -r requirements.txt
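As a quick sanity check that the key dependencies are importable (the package names here are taken from the steps above, not from the repository's documentation):
python -c "import torch, pyopenjtalk; print('dependencies OK')"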
Place the audio files as shown below; both .mp3 and .wav files are accepted.
You must append '[language code]' to the end of each speaker folder name.
PolyLangVITS
├── datasets
│   ├── speaker0[KO]
│   │   ├── 1.mp3
│   │   └── 1.wav
│   ├── speaker1[JA]
│   │   ├── 1.mp3
│   │   └── 1.wav
│   ├── speaker2[EN]
│   │   ├── 1.mp3
│   │   └── 1.wav
│   ├── speaker3[ZH]
│   │   ├── 1.mp3
│   │   └── 1.wav
│   ├── integral.py
│   └── integral_low.py
├── vits
├── get_pretrained_model.py
├── inference.py
├── main_low.py
├── main_resume.py
├── main.py
├── Readme.md
└── requirements.txt
This is just an example, and it's okay to add more speakers.
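As an illustration of the naming convention, here is a minimal sketch of how the '[language code]' suffix could be parsed from the speaker folders; the parse_speaker_dir helper is hypothetical and not part of this repository:

```python
import re
from pathlib import Path

# Hypothetical helper: split a folder name like "speaker0[KO]"
# into the speaker name and its language code.
def parse_speaker_dir(name: str):
    match = re.fullmatch(r"(.+)\[([A-Z]{2})\]", name)
    if match is None:
        raise ValueError(f"Folder {name!r} is missing a [language code] suffix")
    return match.group(1), match.group(2)

for folder in sorted(Path("datasets").iterdir()):
    if folder.is_dir():
        speaker, lang = parse_speaker_dir(folder.name)
        print(f"{speaker}: language={lang}")
```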
To start this tool, use the following command, replacing {language}, {model_name}, and {sample_rate} with your respective values:
python main.py {language} {model_name} {sample_rate}
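For example, to train a model named mymodel on Korean data at a 22050 Hz sample rate (the values are illustrative, and the language is assumed to be passed as a lowercase code such as ko):
python main.py ko mymodel 22050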
For those with low specifications (VRAM < 12GB), please use this command instead:
python main_low.py {language} {model_name} {sample_rate}
If the data configuration is complete and you want to resume training, run this command:
python main_resume.py {model_name}
After the model has been trained, you can generate predictions by using the following command, replacing {model_name} and {model_step} with your respective values:
python inference.py {model_name} {model_step}
For text-to-speech inference, use the following:
python inference-stt.py {model_name} {model_step}
You can also pass the text directly on the command line, without editing the code:
python inference-stt.py {model_name} {model_step} {text}
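For example (model name, step count, and text are illustrative; quote the text so the shell passes it as a single argument):
python inference-stt.py mymodel 50000 "안녕하세요"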
For more information, please refer to the following repositories: