YatingMusic/ddsp-singing-vocoders

DDSP Singing Vocoders

Authors: Da-Yi Wu*, Wen-Yi Hsiao*, Fu-Rong Yang*, Oscar Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang

*equal contribution

Paper | Demo

Official PyTorch implementation of the ISMIR 2022 paper "DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation".

In this repository:

  • We propose a novel singing vocoder based on a subtractive synthesizer: SawSing
  • We present a collection of different DDSP singing vocoders
  • We demonstrate that DDSP singing vocoders have relatively small model sizes yet generate satisfying results with limited resources (1 GPU, 3 hours of training data). We also report results for an even more stringent case: training the vocoders on only 3 minutes of recordings for only 3 hours of training time.

A. Installation

pip install -r requirements.txt 

B. Dataset

Please refer to dataset.md for more details.

C. Training

Train vocoders from scratch.

  1. Modify the configuration file ./configs/<model_name>.yaml
  2. Run the following command:
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml \
               --stage  training \
               --model SawSinSub
  3. Change the --model argument to try different vocoders. Currently, we have 5 models: SawSinSub (SawSing), Sins (DDSP-Add), DWS (DWTS), Full, SawSub. For more details, please refer to our documentation - DDSP Vocoders.
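
To compare several vocoders, the training command can be generated per model. The following is a minimal sketch, not part of the repository. It assumes that each model's configuration file is named after the lower-cased model name; only ./configs/sawsinsub.yaml is confirmed by the example above, so check the configs directory before using the other paths. The helper name training_command is hypothetical.

```python
import shlex

# Model names as listed in this README.
MODELS = ["SawSinSub", "Sins", "DWS", "Full", "SawSub"]


def training_command(model, config_dir="./configs"):
    """Build the argument list for one training run.

    ASSUMPTION: the config file is <config_dir>/<lower-cased model name>.yaml.
    Only sawsinsub.yaml is confirmed by the README.
    """
    config = f"{config_dir}/{model.lower()}.yaml"
    return ["python", "main.py",
            "--config", config,
            "--stage", "training",
            "--model", model]


if __name__ == "__main__":
    # Print the commands instead of running them, so paths can be checked first.
    for model in MODELS:
        print(shlex.join(training_command(model)))
```

Printing rather than executing keeps the sketch safe to run anywhere; pass each list to subprocess.run once the config paths have been verified.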

Our training resources: a single NVIDIA RTX 3090 Ti GPU

D. Validation

Run validation: compute loss and real-time factor (RTF).

  1. Modify the configuration file ./configs/<model_name>.yaml
  2. Run the following command:
# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml  \
              --stage validation \
              --model SawSinSub \
              --model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
              --output_dir ./test_gen
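
For readers unfamiliar with the metric, the real-time factor is commonly defined as the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below illustrates that definition only; it is not the repository's validation code, and the names real_time_factor, timed_rtf, and synthesize are hypothetical.

```python
import time


def real_time_factor(elapsed_seconds, n_samples, sample_rate):
    """RTF = processing time / duration of the generated audio."""
    audio_seconds = n_samples / sample_rate
    return elapsed_seconds / audio_seconds


def timed_rtf(synthesize, sample_rate):
    """Time a hypothetical synthesize() callable that returns a sequence of samples.

    For GPU models, synthesize() should wait for the device to finish
    (for example with torch.cuda.synchronize()) before returning;
    otherwise the measured time is too short.
    """
    start = time.perf_counter()
    audio = synthesize()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(audio), sample_rate)
```

For example, half a second spent generating one second of audio gives real_time_factor(0.5, 24000, 24000) == 0.5; the sample rate here is only an illustrative value.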

E. Inference

Synthesize audio files from existing mel-spectrograms. The code and specification for extracting mel-spectrograms can be found in preprocess.py.

# SawSing as an example
python main.py --config ./configs/sawsinsub.yaml  \
              --stage inference \
              --model SawSinSub \
              --model_ckpt ./exp/f1-full/sawsinsub-256/ckpts/vocoder_27740_70.0_params.pt \
              --input_dir  ./path/to/mel \
              --output_dir ./test_gen

F. Post-Processing

In SawSing, we found buzzing artifacts in the harmonic part of the signal, so we developed post-processing code to remove them. The method is simple yet effective: applying a voiced/unvoiced mask. For more details, please refer to here.
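
The idea of a voiced/unvoiced mask can be sketched as follows. This is an illustration under stated assumptions, not the repository's post-processing code: it assumes a frame-level f0 track in which 0 marks unvoiced frames, a fixed hop size in samples, and a harmonic signal that is available separately from the noise part. The function name apply_uv_mask is hypothetical.

```python
import numpy as np


def apply_uv_mask(harmonic, f0, hop_size):
    """Silence the harmonic signal wherever the frame-level f0 is unvoiced.

    ASSUMPTIONS: f0 has one value per frame, f0 == 0 means unvoiced,
    and each frame covers hop_size samples.
    """
    voiced = (f0 > 0).astype(harmonic.dtype)
    # Hold each frame's decision for hop_size samples.
    mask = np.repeat(voiced, hop_size)[: len(harmonic)]
    # Treat any samples past the last frame as unvoiced.
    if len(mask) < len(harmonic):
        mask = np.pad(mask, (0, len(harmonic) - len(mask)))
    return harmonic * mask
```

A hard 0/1 mask like this can click at voicing boundaries; smoothing the mask over a few milliseconds is a common refinement, though whether the repository does so is not stated here.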

G. More Information

H. Citation

@article{sawsing,
  title={DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation},
  author={Da-Yi Wu and Wen-Yi Hsiao and Fu-Rong Yang and Oscar Friedman and Warren Jackson and Scott Bruzenak and Yi-Wen Liu and Yi-Hsuan Yang},
  journal = {Proc. International Society for Music Information Retrieval},
  year    = {2022},
}