[Bug] Exception while using "--speaker_wav" #1440

Closed
lokeshhctm opened this issue Mar 24, 2022 · 3 comments · Fixed by #3275
Labels
bug Something isn't working

Comments

@lokeshhctm

🐛 Description

(base) root@ip-192-168-0-200:/

/root/miniconda3/bin/tts --text "Awesome, Pretty Good" --model_name "tts_models/en/vctk/vits" --out_path "chunk11_encoded.wav" --speaker_wav "chunk10.wav"

tts_models/en/vctk/vits is already downloaded.
Using model: vits
Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:None
| > pitch_fmin:0.0
| > pitch_fmax:640.0
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:45
| > do_sound_norm:False
| > do_amp_to_db_linear:False
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
initialization of speaker-embedding layers.
Using Griffin-Lim as no vocoder model defined
Text: Awesome, Pretty Good
Text splitted to sentences.
['Awesome, Pretty Good']
Traceback (most recent call last):
  File "/root/miniconda3/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/bin/synthesize.py", line 287, in main
    wav = synthesizer.tts(args.text, args.speaker_idx, args.language_idx, args.speaker_wav)
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/utils/synthesizer.py", line 245, in tts
    speaker_embedding = self.tts_model.speaker_manager.compute_d_vector_from_clip(speaker_wav)
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/tts/utils/speakers.py", line 287, in compute_d_vector_from_clip
    d_vector = _compute(wf)
  File "/root/miniconda3/lib/python3.9/site-packages/TTS/tts/utils/speakers.py", line 270, in _compute
    waveform = self.speaker_encoder_ap.load_wav(wav_file, sr=self.speaker_encoder_ap.sample_rate)
AttributeError: 'NoneType' object has no attribute 'load_wav'
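The last frame shows what goes wrong. The speaker manager of this model has no speaker-encoder audio processor, so `self.speaker_encoder_ap` is `None`, and the method lookup `None.load_wav` fails before any audio is read. The minimal sketch below reproduces the same exception with a hypothetical stand-in class. It is not the real `TTS` code; only the attribute and method names are taken from the traceback.

```python
class FakeSpeakerManager:
    """Hypothetical stand-in for the speaker manager of a model trained
    without an external speaker encoder (not the real TTS class)."""

    def __init__(self, speaker_encoder_ap=None):
        # No speaker encoder is loaded, so its audio processor stays None.
        self.speaker_encoder_ap = speaker_encoder_ap

    def compute_d_vector_from_clip(self, wav_file):
        # Same call shape as the failing frame: Python evaluates the
        # callable (None.load_wav) before its arguments, so it raises here.
        return self.speaker_encoder_ap.load_wav(wav_file, sr=self.speaker_encoder_ap.sample_rate)


def reproduce(wav_file="chunk10.wav"):
    """Return the AttributeError message raised for a reference clip."""
    try:
        FakeSpeakerManager().compute_d_vector_from_clip(wav_file)
    except AttributeError as err:
        return str(err)
    return None


print(reproduce())  # 'NoneType' object has no attribute 'load_wav'
```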

Expected behavior

Environment

  • 🐸TTS Version (e.g., 1.3.0):
  • PyTorch Version (e.g., 1.8)
  • Python version:
  • OS (e.g., Linux):
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • How you installed PyTorch (conda, pip, source):
  • Any other relevant information:

Additional context

@lokeshhctm lokeshhctm added the bug Something isn't working label Mar 24, 2022
@WeberJulian
Contributor

WeberJulian commented Mar 24, 2022

Hey, that's not a bug. The model tts_models/en/vctk/vits doesn't use an external speaker embedding; you can only use the speakers it was trained on. You can list those speakers with tts --model_name "tts_models/en/vctk/vits" --list_speaker_idx.

To clone someone's voice with --speaker_wav, you can use YourTTS: tts_models/multilingual/multi-dataset/your_tts
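As a sketch, the two suggestions in this comment translate into the command lines below. They reuse only flags that appear in this thread. Passing `--language_idx en` for YourTTS is an assumption based on YourTTS being a multilingual model, and the helper function names are hypothetical.

```python
import shlex


def list_speakers_cmd(model_name):
    """Hypothetical helper: argv that lists a multi-speaker model's built-in speakers."""
    return ["tts", "--model_name", model_name, "--list_speaker_idx"]


def clone_voice_cmd(text, reference_wav, out_path, language="en"):
    """Hypothetical helper: argv that clones a voice from a reference clip with YourTTS.

    Assumption: YourTTS is multilingual, so a language id is passed as well.
    """
    return [
        "tts",
        "--text", text,
        "--model_name", "tts_models/multilingual/multi-dataset/your_tts",
        "--speaker_wav", reference_wav,
        "--language_idx", language,
        "--out_path", out_path,
    ]


# Print shell-quoted versions of both commands.
print(shlex.join(list_speakers_cmd("tts_models/en/vctk/vits")))
print(shlex.join(clone_voice_cmd("Awesome, Pretty Good", "chunk10.wav", "chunk11_encoded.wav")))
```

Either list can be passed to `subprocess.run(..., check=True)` on a machine where the `tts` CLI is installed. Building argv lists instead of shell strings avoids quoting problems with the `--text` argument.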

@WeberJulian WeberJulian self-assigned this Mar 24, 2022
@WeberJulian
Contributor

If you have more questions about this, feel free to reopen the issue, or ask them on our Gitter.

@jreus
Contributor

jreus commented May 8, 2022

Heya @WeberJulian, maybe a more informative error message would be useful here? This isn't really an error, but as it stands it looks like a bug.
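Here is a minimal sketch of this suggestion: a guard that fails early with an actionable message instead of an `AttributeError`. `FakeSpeakerManager` and `SpeakerCloningNotSupported` are hypothetical stand-ins and are not part of the `TTS` package. Only the attribute and method names mirror the traceback above.

```python
class SpeakerCloningNotSupported(ValueError):
    """Hypothetical error for models that cannot use a reference clip."""


class FakeSpeakerManager:
    """Hypothetical stand-in for a speaker manager (not the real TTS class)."""

    def __init__(self, speaker_encoder_ap=None):
        # None means the model was trained without an external speaker encoder.
        self.speaker_encoder_ap = speaker_encoder_ap

    def compute_d_vector_from_clip(self, wav_file):
        # Check up front so users see why their clip was rejected,
        # instead of an AttributeError deep inside the call.
        if self.speaker_encoder_ap is None:
            raise SpeakerCloningNotSupported(
                f"speaker_wav={wav_file!r} was given, but this model has no "
                "speaker encoder and cannot clone voices. Use one of its "
                "built-in speakers, or a voice-cloning model such as YourTTS."
            )
        return self.speaker_encoder_ap.load_wav(
            wav_file, sr=self.speaker_encoder_ap.sample_rate
        )


try:
    FakeSpeakerManager().compute_d_vector_from_clip("chunk10.wav")
except SpeakerCloningNotSupported as err:
    print(err)
```

Subclassing `ValueError` keeps the error catchable by callers that already handle bad arguments generically.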

eginhard added a commit to idiap/coqui-ai-TTS that referenced this issue Nov 20, 2023
Fixes coqui-ai#1440. Passing a `speaker_wav` argument to regular Vits models failed
because they don't support voice cloning. Now that argument is simply ignored.
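Below is a minimal sketch of the behavior this commit describes: the reference clip is skipped when the model cannot use it. All names are hypothetical and are not taken from the actual patch. The warning is an addition of this sketch; the commit message only says that the argument is ignored.

```python
import warnings


class FakeSpeakerManager:
    """Hypothetical stand-in for a model's speaker manager."""

    def __init__(self, speaker_encoder_ap=None):
        # None means the model was trained without an external speaker encoder.
        self.speaker_encoder_ap = speaker_encoder_ap

    def compute_d_vector_from_clip(self, wav_file):
        # Placeholder: a real model would run its speaker encoder on wav_file.
        return [0.0, 0.0]


def speaker_embedding_for(manager, speaker_wav):
    """Return a d-vector for speaker_wav, or None if it cannot be used."""
    if speaker_wav is None:
        return None
    if manager is None or manager.speaker_encoder_ap is None:
        # No speaker encoder: ignore the clip instead of crashing.
        warnings.warn(
            f"Ignoring speaker_wav={speaker_wav!r}: this model does not support voice cloning."
        )
        return None
    return manager.compute_d_vector_from_clip(speaker_wav)


# A plain multi-speaker VITS-like model: the clip is ignored, no exception.
print(speaker_embedding_for(FakeSpeakerManager(), "chunk10.wav"))  # None
```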
erogol pushed a commit that referenced this issue Nov 24, 2023
* Revert "fix for issue 3067"

This reverts commit 041b4b6.

Fixes #3143. The original issue (#3067) was people trying to use
tts.tts_with_vc_to_file() with XTTS and was "fixed" in #3109. But XTTS has
integrated VC and you can just do tts.tts_to_file(..., speaker_wav="..."), there
is no point in passing it through FreeVC afterwards. So, reverting this commit
because it breaks tts.tts_with_vc_to_file() for any model that doesn't have
integrated VC, i.e. all models this method is meant for.

* fix: support multi-speaker models in tts_with_vc/tts_with_vc_to_file

* fix: only compute spk embeddings for models that support it

Fixes #1440. Passing a `speaker_wav` argument to regular Vits models failed
because they don't support voice cloning. Now that argument is simply ignored.
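The first item of this commit separates two kinds of models. Models with integrated voice cloning, such as XTTS, take `speaker_wav` directly in `tts.tts_to_file()`. For models without it, `tts.tts_with_vc_to_file()` synthesizes first and then converts the result with FreeVC. The minimal sketch below shows that choice. It uses a hypothetical recording stub and a hypothetical `has_integrated_vc` flag instead of the real `TTS` API object. Only the two method names come from the commit message, and their keyword arguments are assumptions.

```python
class RecordingTTS:
    """Hypothetical stand-in for a TTS API object; it only records calls."""

    def __init__(self, has_integrated_vc):
        # Hypothetical flag: True for models such as XTTS with built-in cloning.
        self.has_integrated_vc = has_integrated_vc
        self.calls = []

    def tts_to_file(self, **kwargs):
        self.calls.append(("tts_to_file", kwargs))

    def tts_with_vc_to_file(self, **kwargs):
        self.calls.append(("tts_with_vc_to_file", kwargs))


def synthesize_in_voice(tts, text, speaker_wav, file_path):
    """Clone directly when the model supports it; otherwise synthesize
    and convert afterwards (through FreeVC, per the commit message)."""
    if tts.has_integrated_vc:
        tts.tts_to_file(text=text, speaker_wav=speaker_wav, file_path=file_path)
    else:
        tts.tts_with_vc_to_file(text=text, speaker_wav=speaker_wav, file_path=file_path)


xtts_like = RecordingTTS(has_integrated_vc=True)
synthesize_in_voice(xtts_like, "Hello", "ref.wav", "out.wav")
print(xtts_like.calls[0][0])  # tts_to_file
```

Routing XTTS through `tts_with_vc_to_file()` as well would convert audio that is already in the target voice a second time. This is the redundancy that the reverted "fix" introduced.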
3 participants