Hello @hcy71o,

I liked your work on Transfer TTS and SC VITS. I trained a model for 350,000 steps using only the LibriTTS train-clean-100 dataset, but when I synthesize speech using a random reference audio file, the output is not clear.

So, my questions are:

1. How many steps did you train your model for?
2. What should the length (duration) of the audio files be when passing them to inference.py?
3. Should the reference audio come from a speaker in the training data, or can it be from an unseen speaker?
4. Do you have a demo page where we can compare Transfer TTS-generated audio with VITS output?

Thanks
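One possible cause of unclear synthesis is a mismatch between the reference clip and the training configuration (e.g. a wrong sample rate or an extremely short or long clip). Below is a minimal stdlib sketch for inspecting a reference WAV before passing it to inference.py; the 22050 Hz target is an assumption, so check the sampling rate in the repo's config file.

```python
import math
import struct
import wave

# Assumed target sample rate -- VITS-style models are commonly trained at
# 22050 or 24000 Hz; verify against the repo's config before relying on this.
TARGET_SR = 22050

def wav_info(path):
    """Return (sample_rate, duration_seconds) of a WAV file."""
    with wave.open(path, "rb") as w:
        sr = w.getframerate()
        duration = w.getnframes() / sr
    return sr, duration

def write_sine(path, seconds=3.0, sr=TARGET_SR, freq=440.0):
    """Write a mono 16-bit sine tone, used here only as a stand-in reference clip."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sr)
        n = int(seconds * sr)
        frames = b"".join(
            struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * freq * i / sr)))
            for i in range(n)
        )
        w.writeframes(frames)

if __name__ == "__main__":
    write_sine("ref.wav")
    sr, dur = wav_info("ref.wav")
    if sr != TARGET_SR:
        print(f"warning: {sr} Hz reference, expected {TARGET_SR} Hz -- resample first")
    print(f"sample rate: {sr} Hz, duration: {dur:.2f} s")
```

If the rates differ, resampling the clip (e.g. with librosa or sox) before inference is usually safer than letting the model consume mismatched audio.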