Expressive-FastSpeech2 - PyTorch Implementation

Contributions

Non-autoregressive Expressive TTS: This project aims to provide a cornerstone for future research on, and applications of, non-autoregressive expressive TTS, including Emotional TTS and Conversational TTS. For datasets, the AIHub Multimodal Video AI datasets and the IEMOCAP database are used for Korean and English, respectively.

Note: If you are interested in an expressive stylistic TTS model like GST-Tacotron or VAE-Tacotron, but with non-autoregressive decoding, you may also be interested in STYLER [demo, code].
Annotated Data Processing: This project sheds light on how to handle a new dataset, even one in a different language, to successfully train a non-autoregressive emotional TTS model.
English and Korean TTS: In addition to English, this project gives a broad view of handling Korean in non-autoregressive TTS, where additional data processing must account for language-specific features (e.g., training the Montreal Forced Aligner on your own language and dataset). Please look closely into text/.
Adapting to Your Own Language: If you are interested in adapting the project to other languages, please refer to the "Training with your own dataset (own language)" section of the categorical branch.
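
As an illustration of the kind of language-specific processing meant above, one common first step for Korean is to split each precomposed Hangul syllable into its jamo (onset, vowel, optional coda), which gives the aligner and the model a small, regular symbol set. The sketch below is an assumption for illustration only: it uses Unicode NFD normalization from the Python standard library, the helper name `to_jamo` is hypothetical, and the actual pipeline in text/ may use a different symbol inventory or a G2P step instead.

```python
import unicodedata
from typing import List


def to_jamo(text: str) -> List[str]:
    """Split Korean text into conjoining jamo (hypothetical helper).

    NFD normalization decomposes each precomposed Hangul syllable
    (U+AC00..U+D7A3) into conjoining jamo in the U+1100 block.
    Non-Hangul characters pass through unchanged; whitespace is
    dropped so the result is a flat symbol sequence.
    """
    symbols: List[str] = []
    for ch in text:
        if ch.isspace():
            continue
        symbols.extend(unicodedata.normalize("NFD", ch))
    return symbols


if __name__ == "__main__":
    # "한국어" -> 8 jamo: ㅎ ㅏ ㄴ / ㄱ ㅜ ㄱ / ㅇ ㅓ (as conjoining code points)
    print([hex(ord(c)) for c in to_jamo("한국어")])
```

A real pipeline would typically also normalize numbers and punctuation and apply pronunciation rules before this step; the decomposition alone only fixes the symbol granularity.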

Repository Structure

In this project, FastSpeech2 is adopted as the base non-autoregressive multi-speaker TTS framework, so it would be helpful to read the paper and code first (see also the FastSpeech2 branch).
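
What makes FastSpeech2 non-autoregressive is that it predicts a duration (in mel frames) for every phoneme and then expands the phoneme-level hidden states to frame level in one shot, rather than generating frames one at a time. During training, those durations come from forced alignment, which is why the Montreal Forced Aligner matters here. The sketch below shows only that expansion step (the "length regulator") in plain Python; the function name and list-of-lists representation are assumptions for illustration, not this repository's PyTorch code.

```python
from typing import List, Sequence

Vector = List[float]


def length_regulate(hidden: Sequence[Vector], durations: Sequence[int]) -> List[Vector]:
    """Repeat each phoneme-level hidden vector `duration` times (hypothetical helper).

    hidden:    one vector per phoneme (encoder output)
    durations: one non-negative integer frame count per phoneme
    Returns a frame-level sequence of length sum(durations).
    """
    if len(hidden) != len(durations):
        raise ValueError("exactly one duration per phoneme is required")
    frames: List[Vector] = []
    for vec, d in zip(hidden, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # Copy so later in-place edits to one frame do not alias the others.
        frames.extend(list(vec) for _ in range(d))
    return frames


if __name__ == "__main__":
    phonemes = [[1.0], [2.0], [3.0]]
    # The second phoneme gets zero frames and disappears from the output.
    print(length_regulate(phonemes, [2, 0, 3]))
```

Because every output frame is determined as soon as the durations are known, all frames can be decoded in parallel, which is the property the emotional and conversational branches build on.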

Emotional TTS: The following branches contain implementations of the basic paradigm introduced by Emotional End-to-End Neural Speech Synthesizer.
- categorical branch: only conditioning categorical emotional descriptors (such as happy, sad, etc.)
- continuous branch: conditioning continuous emotional descriptors (such as arousal, valence, etc.) in addition to categorical emotional descriptors
Conversational TTS: The following branch contains an implementation of Conversational End-to-End TTS for Voice Agent.
- conversational branch: conditioning chat history
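
To make "conditioning" concrete, the sketch below shows one simple way such signals can enter a FastSpeech2-style model: build a single conditioning vector and add it to every encoder output frame, similar in spirit to how multi-speaker FastSpeech2 implementations add a speaker embedding. Everything here is an assumption for illustration: the function and parameter names are hypothetical, the categorical emotion table and the arousal/valence projection would be learned in practice, and the mean-pooled history vector is a stand-in for whatever context encoder the conversational branch actually uses. The branches themselves may combine these signals differently.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def _add(a: Sequence[float], b: Sequence[float]) -> Vector:
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return [x + y for x, y in zip(a, b)]


def condition_encoder_output(
    encoder_out: Sequence[Vector],
    emotion_table: Dict[str, Vector],
    emotion: str,
    arousal_valence: Optional[Sequence[float]] = None,
    av_projection: Optional[Sequence[Vector]] = None,
    history: Optional[Sequence[Vector]] = None,
) -> List[Vector]:
    """Add emotion (and optional chat-history) conditioning to each frame.

    - categorical: look up a D-dim embedding for the emotion label
    - continuous:  project [arousal, valence] to D dims with a D x 2 matrix
    - history:     mean-pool previous-utterance vectors (simplified stand-in)
    """
    cond = list(emotion_table[emotion])
    if arousal_valence is not None:
        if av_projection is None:
            raise ValueError("av_projection is required with arousal_valence")
        projected = [sum(w * x for w, x in zip(row, arousal_valence)) for row in av_projection]
        cond = _add(cond, projected)
    if history:
        mean = [sum(col) / len(history) for col in zip(*history)]
        cond = _add(cond, mean)
    # Broadcast the same conditioning vector over every encoder frame.
    return [_add(frame, cond) for frame in encoder_out]


if __name__ == "__main__":
    enc = [[0.0, 0.0], [1.0, 1.0]]
    table = {"happy": [1.0, 0.0], "sad": [0.0, 1.0]}
    print(condition_encoder_output(enc, table, "happy"))
    print(condition_encoder_output(enc, table, "happy", [0.5, -0.5], [[1.0, 0.0], [0.0, 1.0]]))
```

Adding rather than concatenating keeps the hidden size unchanged, so the variance adaptor and decoder need no modification; the trade-off is that all conditioning signals share one vector space.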

Citation

If you would like to use or refer to this implementation, please cite the repo.

@misc{lee2021expressive_fastspeech2,
  author = {Lee, Keon},
  title = {Expressive-FastSpeech2},
  year = {2021},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/keonlee9420/Expressive-FastSpeech2}}
}

References

ming024's FastSpeech2 (version later than 2021.02.26)
HGU-DLLAB's Korean-FastSpeech2-Pytorch
hccho2's Tacotron2-Wavenet-Korean-TTS
carpedm20's multi-speaker-tacotron-tensorflow
