[BUG]: the lengths of the features after FACodecEncoderV2 is not match #188

Open
Mahaotian1 opened this issue Apr 19, 2024 · 1 comment

@Mahaotian1

Bug in FACodecEncoderV2

I extracted prosody_feature and encoder_output from FACodecEncoderV2. An error is raised when I pass them to fa_decoder_v2 to extract the VQ codes, because the lengths of prosody_feature (torch.Size([1, 20, 281])) and encoder_output (torch.Size([1, 256, 282])) do not match.

my code

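import librosa
import torch

# fa_encoder_v2 and fa_decoder_v2 are the FACodec V2 encoder/decoder, defined elsewhere (not shown in this report).
wav_b = librosa.load(wav_b, sr=16000)[0]  # wav_b starts out as a file path
wav_b = torch.from_numpy(wav_b).float()
wav_b = wav_b.unsqueeze(0).unsqueeze(0)  # shape: (1, 1, num_samples)
enc_out_b = fa_encoder_v2(wav_b)  # here: torch.Size([1, 256, 282])
prosody_b = fa_encoder_v2.get_prosody_feature(wav_b)  # here: torch.Size([1, 20, 281])
vq_post_emb_b, vq_id_b, _, quantized, spk_embs_b = fa_decoder_v2(
    enc_out_b, prosody_b, eval_vq=False, vq=True
)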

bug

  File "/home/data/mahaotian/Amphion/models/codec/ns3_codec/inference_codc.py", line 129, in
    vq_post_emb_a, vq_id_a, _, quantized, spk_embs_a = fa_decoder_v2(
  File "/home/data/mahaotian/anaconda3/envs/vallex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/data/mahaotian/anaconda3/envs/vallex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/data/mahaotian/Amphion/models/codec/ns3_codec/facodec.py", line 1086, in forward
    outs, qs, commit_loss, quantized_buf = self.quantize(
  File "/home/data/mahaotian/Amphion/models/codec/ns3_codec/facodec.py", line 1048, in quantize
    outs += out
RuntimeError: The size of tensor a (281) must match the size of tensor b (282) at non-singleton dimension 2

@HeCheng0625
Collaborator

Hi, you need to pad your wav so that its length is a multiple of 200 (the hop size).
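
One way to apply this suggestion is sketched below. This is not taken from the Amphion code base: the helper names `padded_length` and `pad_to_hop_multiple` are hypothetical, and zero-padding on the right of the time axis is an assumption, since the reply only says the length should be a multiple of 200.

```python
def padded_length(num_samples: int, hop_size: int = 200) -> int:
    """Return the smallest multiple of hop_size that is >= num_samples."""
    if hop_size <= 0:
        raise ValueError("hop_size must be positive")
    # Ceiling division written with integer arithmetic only.
    return -(-num_samples // hop_size) * hop_size


def pad_to_hop_multiple(wav, hop_size: int = 200):
    """Right-pad the last (time) dimension of a torch tensor with zeros.

    Hypothetical helper; zero right-padding is an assumption, not the
    maintainers' documented method.
    """
    # Imported here so that padded_length() can be used without torch installed.
    import torch.nn.functional as F

    num_samples = wav.shape[-1]
    target = padded_length(num_samples, hop_size)
    # F.pad takes (left, right) padding amounts for the last dimension.
    return F.pad(wav, (0, target - num_samples))
```

In the snippet from the report, this would presumably be called as `wav_b = pad_to_hop_multiple(wav_b)` right after the two `unsqueeze(0)` calls and before `fa_encoder_v2(wav_b)`, so that the encoder and `get_prosody_feature` both see the padded waveform.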
