How to include stats.h5 of PWG Vocoder during ONNX conversion for TTS #94

Open
anirpipi opened this issue Jul 1, 2023 · 2 comments

anirpipi commented Jul 1, 2023

Hi,
I am trying to convert a pretrained LJSpeech TTS model, based on kan-bayashi/ljspeech_fastspeech2 and parallel_wavegan/ljspeech_parallel_wavegan.v1, to ONNX using the code below:

########################### ONNX Conversion ############################

from espnet2.bin.tts_inference import Text2Speech
from espnet_onnx.export import TTSModelExport

m = TTSModelExport()

tag_exp = "exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/train.loss.ave_5best.pth"
train_config="exp/tts_train_fastspeech2_raw_phn_tacotron_g2p_en_no_space/config.yaml"

vocoder_tag = 'parallel_wavegan.v1/checkpoint-400000steps.pkl'
vocoder_config= 'parallel_wavegan.v1/config.yml'

text2speech = Text2Speech.from_pretrained(
    train_config=train_config,
    model_file=tag_exp,
    vocoder_file=vocoder_tag,
    vocoder_config=vocoder_config,
    speed_control_alpha=1.0,
    always_fix_seed=False,
)

tag_name = 'ljspeech_pretrained'
m.export(text2speech, tag_name, quantize=True)

########################### Inference ############################

from espnet_onnx import Text2Speech
import soundfile
import numpy as np
import time

text2speech = Text2Speech(tag_name)

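text = 'hello world!'
# Run synthesis first; the call returns a dict whose 'wav' entry holds the waveform.
output_dict = text2speech(text)
wav = output_dict['wav']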

soundfile.write("ljspeech_pretrained_test.wav", wav, 22050, "PCM_16")

######################################################################

The synthesized audio quality is very low.
I noticed that the exported ONNX folder does not contain the stats.h5 file from the PWG vocoder folder:
~/.cache/espnet_onnx/ljspeech_pretrained/: config.yaml feats_stats.npz full quantize

Can anyone please help me include stats.h5 during inference with espnet_onnx?
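
To confirm which statistics files the export actually produced, a small check like the one below may help. It is only a sketch: `find_stats_files` is a hypothetical helper, not part of espnet_onnx, and the assumption that statistics files have "stats" in their name and end in .h5, .npz, or .npy is based only on the file names mentioned above.

```python
from pathlib import Path

# Assumption: statistics files contain "stats" in their name and use one of these suffixes.
STATS_SUFFIXES = (".h5", ".npz", ".npy")


def find_stats_files(export_dir):
    """Hypothetical helper: return sorted relative paths of stats-like files under export_dir."""
    root = Path(export_dir).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and "stats" in p.name and p.suffix in STATS_SUFFIXES
    )


if __name__ == "__main__":
    # Export directory taken from the listing above.
    export_dir = Path.home() / ".cache" / "espnet_onnx" / "ljspeech_pretrained"
    found = find_stats_files(export_dir)
    print("stats files:", found)
    print("stats.h5 present:", any(Path(f).name == "stats.h5" for f in found))
```

If only feats_stats.npz shows up, that matches the listing above: the acoustic model statistics were exported, but the vocoder's stats.h5 was not copied.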

@Masao-Someki
Collaborator

Hi @anirpipi, sorry for the late reply, and thank you for reporting the issue.
This may be a bug, so I would like to look into it.
It seems you are using a model you trained yourself. Can you confirm that the issue also happens with the published models? If it is reproducible, I will download the model and investigate.

@anirpipi
Author

Hi, thanks for the response.
The same thing happens with the pre-trained models.
VITS works fine, but the problem occurs with FastSpeech2+PWG.
Could you please look into it?
Thanks in advance.
