Vis problem when T2V Training with distributed training #139

Open
lky-ang opened this issue Jul 30, 2024 · 0 comments

lky-ang commented Jul 30, 2024

During distributed training of the T2V model, the visualization step cannot generate samples and fails with the following tensor size mismatch:

Traceback (most recent call last):
  File "/VGen/tools/train/train_t2v_enterance.py", line 287, in worker
    visual_func.run(visual_kwards=visual_kwards, **input_kwards)
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/VGen/tools/hooks/visual_train_it2v_video.py", line 62, in run
    video_data = self.diffusion.ddim_sample_loop(
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/VGen/tools/modules/diffusions/diffusion_ddim.py", line 253, in ddim_sample_loop
    xt, _ = self.ddim_sample(xt, t, model, model_kwargs, clamp, percentile, condition_fn, guide_scale, ddim_timesteps, eta)
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/VGen/tools/modules/diffusions/diffusion_ddim.py", line 217, in ddim_sample
    _, _, _, x0 = self.p_mean_variance(xt, t, model, model_kwargs, clamp, percentile, guide_scale)
  File "/VGen/tools/modules/diffusions/diffusion_ddim.py", line 158, in p_mean_variance
    u_out = model(xt, self._scale_timesteps(t), **model_kwargs[1])
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/miniconda3/envs/vgen/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/VGen/tools/modules/unet/unet_t2v.py", line 251, in forward
    context = torch.cat([context, y_context], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 32 for tensor number 1 in the list.
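
To make the error easier to read: concatenating along dim=1 requires every other axis to agree, and "tensor number 1" is the second entry in the list, y_context. Below is a minimal pure-Python sketch of that rule, not VGen or PyTorch code. The helper name find_cat_mismatches is hypothetical, and so are the shapes: only the sizes 8 and 32 come from the message above, while the 77 and 1024 and the guess that the mismatch is on the batch axis are assumptions.

```python
from typing import List, Sequence, Tuple


def find_cat_mismatches(
    shapes: Sequence[Tuple[int, ...]], dim: int
) -> List[Tuple[int, int, int, int]]:
    """Return (tensor_index, axis, expected, got) for every axis other than
    `dim` where a shape disagrees with the first shape in the list."""
    reference = shapes[0]
    mismatches = []
    for index, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(reference):
            raise ValueError(
                f"tensor {index} has {len(shape)} dims, expected {len(reference)}"
            )
        for axis, (expected, got) in enumerate(zip(reference, shape)):
            # The concatenation axis itself is allowed to differ.
            if axis != dim and expected != got:
                mismatches.append((index, axis, expected, got))
    return mismatches


# Hypothetical (batch, tokens, channels) shapes; only 8 and 32 are from the log.
context_shape = (8, 77, 1024)
y_context_shape = (32, 77, 1024)
print(find_cat_mismatches([context_shape, y_context_shape], dim=1))
# → [(1, 0, 8, 32)]
```

If the real mismatch is on the batch axis, printing context.shape and y_context.shape just before line 251 of unet_t2v.py should show which of the two tensors was built with a different batch size during visualization.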
