Tensor size mismatch when running CAD2 > Task2 baseline enhancement [BUG] #401
Comments
Hi @awagenknecht, thank you for reporting this.
Hi @groadabike, thanks for looking into it. I'm new here, so I apologize if I'm missing something obvious. I still get the error when running the baseline as provided. I am loading the noncausal models. I can avoid the error if I change the parameters in the `ConvTasNetStereo` definition, as described in my original post. The parameters in the class definition do not match the configuration in `ConvTasNet/local/conf.yml`.
Hi @awagenknecht, I am not sure why you are getting that error. As I understand it, when you use HuggingFace's `from_pretrained`, the model parameters are read from the `config.json` stored in the model repository rather than from the local class defaults. If you see the Whisper documentation in Huggingface https://huggingface.co/docs/transformers/en/model_doc/whisper, you can load the model together with its configuration directly from the Hub in the same way.

Can you confirm that your `load_separation_model` function looks like this?

```python
def load_separation_model(
    causality: str, device: torch.device, force_redownload: bool = True
) -> dict[str, ConvTasNetStereo]:
    """
    Load the separation model.

    Args:
        causality (str): Causality of the model (causal or noncausal).
        device (torch.device): Device to load the model.
        force_redownload (bool): Whether to force redownload the model.

    Returns:
        model: Separation model.
    """
    models = {}
    causal = {"causal": "Causal", "noncausal": "NonCausal"}
    for instrument in [
        "Bassoon",
        "Cello",
        "Clarinet",
        "Flute",
        "Oboe",
        "Sax",
        "Viola",
        "Violin",
    ]:
        logger.info(
            "Loading model "
            f"cadenzachallenge/ConvTasNet_{instrument}_{causal[causality]}"
        )
        models[instrument] = ConvTasNetStereo.from_pretrained(
            f"cadenzachallenge/ConvTasNet_{instrument}_{causal[causality]}",
            force_download=force_redownload,
        ).to(device)
    return models
```
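As a quick check, the hyperparameters published with one of these checkpoints can be printed straight from the Hub. This is only a sketch: it assumes `huggingface_hub` is available and that each model repository ships a `config.json` (the repo ID below follows the f-string pattern in the function above).

```python
# Sketch: download and print the config.json stored with one of the checkpoints.
# The repo ID follows the pattern in load_separation_model above (Bassoon, noncausal);
# the presence of config.json is an assumption based on this thread.
import json

from huggingface_hub import hf_hub_download

config_path = hf_hub_download(
    repo_id="cadenzachallenge/ConvTasNet_Bassoon_NonCausal",
    filename="config.json",
)
with open(config_path, encoding="utf-8") as fp:
    # Values such as audio_channels, X and C should match what the local
    # ConvTasNetStereo class is initialised with.
    print(json.dumps(json.load(fp), indent=2))
```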
Now, looking at the error message in more detail, it looks a bit odd.

The call is on line 279 and it should include 3 arguments:

```python
separation_models = load_separation_model(
    config.separator.causality, device, config.separator.force_redownload
)
```

The next error line points to line 204, and that line number is correct:

```python
models[instrument] = ConvTasNetStereo.from_pretrained(
    f"cadenzachallenge/ConvTasNet_{instrument}_{causal[causality]}",
    force_download=force_redownload,
).to(device)
```

Does your baseline code include these arguments?
Yes, my baseline code includes those arguments. You're right, there are some mistakes in the error message. I was using ChatGPT to help debug and must have gotten mixed up about which version of the error I was copying. *facepalm* Sorry for the confusion. I'll edit the original post with the correct error message. The main content of the error is the same, though. It looks like all the arguments are being passed. I can see that the `config.json` is being downloaded from HuggingFace and cached with the correct parameters, but for some reason they are not being applied in the model initialization. This makes me think the issue is somewhere in the HuggingFace package on my end. So I don't think it's a Clarity bug anymore, and this can be closed if no one else is getting this error.
Hi @awagenknecht, it would be very helpful to find the source of the error.
Updating the packages in my environment resolved the error. I started from a clean conda environment, but I believe I may have overlooked a permissions issue when installing clarity that caused it to use packages already installed in my base environment. That said, does clarity have a requirements file that pins the package versions the CAD2 recipes expect? Thanks so much for helping track this down!
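For anyone debugging a similar mix-up, a small sketch like the one below can confirm which installation of each package the interpreter is actually importing (the package names are just the ones discussed in this thread):

```python
# Print the version and install location of the packages involved, to check
# whether they come from the active environment or from a base environment.
import importlib

for name in ("clarity", "torch", "huggingface_hub"):
    module = importlib.import_module(name)
    version = getattr(module, "__version__", "unknown")
    print(f"{name} {version} -> {module.__file__}")
```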
Hi @awagenknecht, thank you for letting us know. Regarding the requirements file: we have an extra requirements.txt in task 1 but not in task 2.
Describe the bug
After downloading and processing the data for the CAD2 > Task2 challenge, I attempted to run `enhance.py` to explore the baseline enhancement system. This resulted in tensor size mismatch errors at the `load_separation_model()` step (line 279). The parameters defined in the `ConvTasNetStereo` class do not match the pre-trained model checkpoint that is being loaded. I addressed the problem by changing three parameters in the `ConvTasNetStereo` definition in `ConvTasNet/local/tasnet.py`:

- `audio_channels=2` - this fixes the size mismatches in the encoder and decoder models.
- `X=10` and `C=2` - these fix the size mismatch in the separator model. I am not sure if both changes are needed; I changed both because I noticed they did not match the configuration in `ConvTasNet/local/conf.yml` (see the sketch after this list for one way to check the checkpoint directly).
To Reproduce
1. In `config.yaml`, set the `zenodo_download_path` and `root` path.
2. Run `process_dataset/process_zenodo_download.py`.
3. Run `enhance.py` with default parameters.

Expected behavior
Based on the README in the baseline folder, I expect the baseline enhancement system to run with the default parameters and generate the enhanced .flac files.
Error Messages
File "/home/austin/clarity/recipes/cad2/task2/baseline/enhance.py", line 204, in load_separation_model
models[instrument] = ConvTasNetStereo.from_pretrained(
File "/home/austin/clarity/recipes/cad2/task2/baseline/enhance.py", line 279, in enhance
separation_models = load_separation_model(
File "/home/austin/clarity/recipes/cad2/task2/baseline/enhance.py", line 407, in
enhance()
RuntimeError: Error(s) in loading state_dict for ConvTasNetStereo:
size mismatch for encoder.conv1d_U.weight: copying a param with shape torch.Size([256, 2, 20]) from checkpoint, the shape in current model is torch.Size([256, 1, 20]).
size mismatch for separator.network.3.weight: copying a param with shape torch.Size([512, 256, 1]) from checkpoint, the shape in current model is torch.Size([1024, 256, 1]).
size mismatch for decoder.basis_signals.weight: copying a param with shape torch.Size([40, 256]) from checkpoint, the shape in current model is torch.Size([20, 256]).
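To relate these shapes back to the parameters mentioned above: if the class follows the usual Conv-TasNet layout, the encoder is a `Conv1d` whose `in_channels` equals the number of audio channels, the separator's final 1x1 convolution outputs `C * N` mask channels, and the decoder is a `Linear` mapping the `N` basis coefficients back to `L * audio_channels` samples. The snippet below is only an illustration with stand-in layers, using the values visible in this error message (N=256, L=20, B=256); it is not the real `ConvTasNetStereo` code.

```python
# Illustrative stand-in layers (not the real ConvTasNetStereo modules) showing how
# the parameter values map onto the weight shapes reported in the error above.
import torch.nn as nn

# Encoder: Conv1d weight shape is (out_channels, in_channels, kernel_size).
mono_encoder = nn.Conv1d(in_channels=1, out_channels=256, kernel_size=20, bias=False)
stereo_encoder = nn.Conv1d(in_channels=2, out_channels=256, kernel_size=20, bias=False)
print(mono_encoder.weight.shape)    # torch.Size([256, 1, 20]) -> current model (audio_channels=1)
print(stereo_encoder.weight.shape)  # torch.Size([256, 2, 20]) -> checkpoint (audio_channels=2)

# Separator mask conv: maps the bottleneck (B=256) to C * N channels, assuming the
# usual Conv-TasNet layout, so C=2 with N=256 gives the checkpoint's [512, 256, 1].
mask_conv_c2 = nn.Conv1d(in_channels=256, out_channels=2 * 256, kernel_size=1, bias=False)
mask_conv_c4 = nn.Conv1d(in_channels=256, out_channels=4 * 256, kernel_size=1, bias=False)
print(mask_conv_c2.weight.shape)  # torch.Size([512, 256, 1])  -> checkpoint (C=2)
print(mask_conv_c4.weight.shape)  # torch.Size([1024, 256, 1]) -> current model

# Decoder: Linear weight shape is (out_features, in_features).
stereo_decoder = nn.Linear(in_features=256, out_features=20 * 2, bias=False)
print(stereo_decoder.weight.shape)  # torch.Size([40, 256]) -> checkpoint (audio_channels=2)
```

If the layout matches the original Conv-TasNet implementation, `X` controls the number of temporal blocks rather than this layer's shape, so the `C=2` change is likely the one that matters for the separator mismatch; an `X` discrepancy would normally surface as missing or unexpected keys instead.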
Environment
Please include the following...
[ ] OS: Ubuntu 22.04.2 LTS (GNU/Linux 5.15.153.1-microsoft-standard-WSL2 x86_64)
[ ] Python version: 3.8.19
[ ] clarity version: v0.6.0
[ ] Installed package versions: