train_dreambooth_lora_flux.py distributed bugs #9161
Comments
What happens if you unwrap it: if accelerator.unwrap_model(transformer).config.guidance_embeds:
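For illustration, a minimal sketch of that pattern, with a dummy module standing in for the Flux transformer; the config attribute and guidance value are placeholders, not the actual script code:

```python
import torch
from types import SimpleNamespace
from accelerate import Accelerator


class DummyTransformer(torch.nn.Module):
    """Stand-in for the Flux transformer; only mimics the `config` attribute."""

    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8, 8)
        self.config = SimpleNamespace(guidance_embeds=True)

    def forward(self, x):
        return self.proj(x)


accelerator = Accelerator()
transformer = accelerator.prepare(DummyTransformer())

# Under a multi-GPU launch, `transformer` is a DistributedDataParallel wrapper,
# so read config attributes through the unwrapped model instead of the wrapper:
if accelerator.unwrap_model(transformer).config.guidance_embeds:
    guidance = torch.full((1,), 3.5, device=accelerator.device)  # placeholder guidance scale
else:
    guidance = None
```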
Any progress on this? The full fine-tune gets OOM on 4x A100/80G and the LoRA results in this error.
This exists in the full fine-tune; there is no ...
This works in my case, but I got another PyTorch OOM error in log_validation, even on an A800 80G. How can I fix this?
For OOM, see https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md. Did you add ...
@maziyarpanahi Could you elaborate? Isn't ...
Sorry about that, I was looking in the wrong file. It does indeed exist in the lora file as well.
Thanks, I missed this OOM guidance before. It helps a lot.
Describe the bug
AttributeError when running model parallel distributed training with accelerate
Reproduction
accelerate config:
Logs
if transformer.config.guidance_embeds:
AttributeError: 'DistributedDataParallel' object has no attribute 'config'
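For reference, a standalone sketch (single CPU process, gloo backend) showing why the DDP wrapper has no `config` attribute while the wrapped model still does; the `guidance_embeds` field here is just a stand-in, not the diffusers config class:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)
        self.config = {"guidance_embeds": True}  # stand-in for the model config


# Single-process "distributed" setup on CPU so this runs anywhere.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(TinyModel())

print(hasattr(model, "config"))  # False: DDP does not forward arbitrary attributes
print(model.module.config)       # {'guidance_embeds': True}: the inner model still has it

dist.destroy_process_group()
```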
System Info
diffusers from source
accelerate==0.33.0
transformers==4.44.1
training on A100s
Who can help?
No response