
Training code reproduction #187

Open
JWSunny opened this issue Oct 30, 2024 · 4 comments

Comments

@JWSunny

JWSunny commented Oct 30, 2024

Thanks for sharing!
1. I'm reproducing the stage-2 training code with reference to hallo2 and Moore-AnimateAnyone. Is the first argument (the latent) passed to denoising_unet the same as in those two open-source projects? (A sketch of the full training step is at the end of this comment.)
(1) First argument to self.denoising_unet: noisy_latents = train_noise_scheduler.add_noise(latents, noise, timesteps)
(2) Weight freezing:
vae.requires_grad_(False)
denoising_unet.requires_grad_(False)
reference_unet.requires_grad_(False)
face_locator.requires_grad_(False)

# Set motion module learnable
for name, module in denoising_unet.named_modules():
    if "motion_modules" in name:
        for params in module.parameters():
            params.requires_grad = True

2. Computing the MSE loss between the noise added to the original image (the same setup as hallo2) and the value predicted by denoising_unet, the loss becomes NaN from the second batch onward (a float32 loss workaround appears in the sketch at the end of this comment):
mse_loss: tensor(0.2483, device='cuda:0', dtype=torch.float16, grad_fn=)
{'global_step:1, train_loss: 0.248291015625'}
mse_loss: tensor(nan, device='cuda:0', dtype=torch.float16, grad_fn=)
{'global_step:2, train_loss: nan'}

3. Which open-source project would you recommend as a reference for the implementation? Thanks again!
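
For concreteness, here is a minimal sketch of the stage-2 training step from points 1 and 2, assuming a diffusers DDPMScheduler and a Moore-AnimateAnyone-style denoising UNet. The conditioning keyword (pose_cond_fea) and the float32 loss cast are assumptions / common practice, not necessarily this repo's API or the actual fix:

import torch
import torch.nn.functional as F

def stage2_training_step(denoising_unet, train_noise_scheduler, latents,
                         encoder_hidden_states, cond_fea):
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, train_noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    ).long()

    # First argument to denoising_unet: the VAE latents with scheduler noise
    # added at the sampled timesteps, same as hallo2 / Moore-AnimateAnyone.
    noisy_latents = train_noise_scheduler.add_noise(latents, noise, timesteps)

    model_pred = denoising_unet(
        noisy_latents,
        timesteps,
        encoder_hidden_states=encoder_hidden_states,
        pose_cond_fea=cond_fea,  # keyword name is illustrative only
    ).sample

    # The regression target depends on the scheduler's prediction type.
    if train_noise_scheduler.config.prediction_type == "epsilon":
        target = noise
    else:  # "v_prediction"
        target = train_noise_scheduler.get_velocity(latents, noise, timesteps)

    # Casting to float32 before the MSE is a common way to avoid the fp16
    # NaN shown in the log above (not necessarily how it was fixed here).
    return F.mse_loss(model_pred.float(), target.float(), reduction="mean")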

@JWSunny

JWSunny commented Oct 31, 2024

The issues above are resolved now. A follow-up question about the 8× A100 setup mentioned in the paper: how much memory does each card have, and is the batch size 4 across the 8 GPUs? Right now vae.decode keeps running out of memory even on a single A100-80G!
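
In case it helps anyone hitting the same OOM: a common workaround (assuming the VAE is a diffusers AutoencoderKL) is to enable sliced/tiled decoding and decode the video latents a few frames at a time. This is a general memory mitigation, not the paper's setting; the chunk size and scaling-factor handling here are assumptions:

import torch

@torch.no_grad()
def decode_in_chunks(vae, latents, chunk_size=8):
    # latents: (num_frames, C, H, W) video latents produced by the UNet.
    vae.enable_slicing()   # decode the mini-batch one sample at a time
    vae.enable_tiling()    # tile large feature maps to cap peak memory
    frames = []
    for i in range(0, latents.shape[0], chunk_size):
        chunk = latents[i:i + chunk_size] / vae.config.scaling_factor
        frames.append(vae.decode(chunk).sample.cpu())
    return torch.cat(frames, dim=0)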

@11898d

11898d commented Nov 5, 2024

Hello, may I ask whether the training code for this project needs to be rewritten from scratch?

@JWSunny

JWSunny commented Nov 5, 2024

Hello, may I ask whether the training code for this project needs to be rewritten from scratch?

Yes. Refer to the open-source projects mentioned above and rewrite the whole training framework; only the inference code is provided here!

@bbb-stack

Yes. Refer to the open-source projects mentioned above and rewrite the whole training framework; only the inference code is provided here!

So the training framework follows AnimateAnyone and the data processing follows Hallo2, is that what you mean? How good were your reproduced results?
