train_dreambooth_lora_flux.py distributed bugs #9161

Open
neuron-party opened this issue Aug 12, 2024 · 7 comments
Labels
bug (Something isn't working) · stale (Issues that haven't received updates)

Comments

@neuron-party
Contributor

Describe the bug

AttributeError when running multi-GPU distributed training with accelerate.

Reproduction

accelerate launch --config_file train_dreambooth_lora_flux.py \
  --resolution=1024 \
  --mixed_precision=bf16 \
  --pretrained_model_name_or_path=black-forest-labs/FLUX.1-dev \
  --num_validation_images=8 \
  --validation_epochs=100 \
  --rank=16 \
  --train_batch_size=1 \
  --learning_rate=1e-4 \
  --guidance_scale=3.5 \
  --checkpointing_steps=200 \
  --instance_prompt=xyz \
  --instance_data_dir=xyz \
  --output_dir=xyz \
  --logging_dir=xyz \
  --validation_prompt=xyz

accelerate config:

compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
fsdp_config: {}
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 2
use_cpu: false
gpu_ids: '0, 1'
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false

Logs

if transformer.config.guidance_embeds:

AttributeError: 'DistributedDataParallel' object has no attribute 'config'
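For context, this is a general property of DDP rather than anything Flux-specific: torch.nn.parallel.DistributedDataParallel keeps the wrapped model under .module and does not forward arbitrary attributes such as config. A minimal sketch, where the Toy module and its config attribute are made-up stand-ins for the Flux transformer:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup so DDP can be constructed on CPU with the gloo backend.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

class Toy(torch.nn.Module):
    """Stand-in for the Flux transformer; `config` is a plain attribute here."""
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)
        self.config = type("Config", (), {"guidance_embeds": True})()

model = DDP(Toy())
print(hasattr(model, "config"))             # False: DDP does not proxy the attribute
print(model.module.config.guidance_embeds)  # True: the wrapped model still has it

dist.destroy_process_group()
```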

System Info

diffusers from source
accelerate==0.33.0
transformers==4.44.1

training on A100s

Who can help?

No response

neuron-party added the bug (Something isn't working) label on Aug 12, 2024
@tolgacangoz
Contributor

What happens if you unwrap it:

if accelerator.unwrap_model(transformer).config.guidance_embeds:
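
For reference, a sketch of how the guarded read could look inside the training loop. transformer, accelerator, args, and model_input are the names the script uses; the surrounding guidance logic is an assumption, not a verbatim excerpt:

```python
# Sketch: read the config from the underlying model, not the DDP wrapper.
# accelerator.unwrap_model() returns the original module that DDP wraps.
if accelerator.unwrap_model(transformer).config.guidance_embeds:
    guidance = torch.full((model_input.shape[0],), args.guidance_scale,
                          device=accelerator.device, dtype=model_input.dtype)
else:
    guidance = None
```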

@maziyarpanahi

Any progress on this? The full fine-tune runs OOM on 4x A100 80GB, and the LoRA run hits this error.

> What happens if you unwrap it:
>
> if accelerator.unwrap_model(transformer).config.guidance_embeds:

The same issue exists in the full fine-tune, and there is no config.guidance_embeds inside the train_dreambooth_lora_flux.py file.

@Adenialzz
Contributor

> What happens if you unwrap it:
>
> if accelerator.unwrap_model(transformer).config.guidance_embeds:

This works in my case.

But I got another PyTorch OOM error in log_validation, even on an A800 80GB. How can I fix this?

@tolgacangoz
Contributor

tolgacangoz commented Aug 20, 2024

For OOM, see https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md. Did you add --gradient_checkpointing, --use_8bit_adam, and --gradient_accumulation_steps=4 or 8? Is it possible for you to try without --validation_prompt? Also, could you try one of the latest versions of PyTorch?
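
As a sketch, the launch command from the reproduction with those README flags added; the config path is a placeholder and the step count is one of the suggested values, not a verified setting:

```
accelerate launch --config_file <accelerate_config.yaml> train_dreambooth_lora_flux.py \
  --gradient_checkpointing \
  --use_8bit_adam \
  --gradient_accumulation_steps=4 \
  ...  # remaining flags as in the reproduction above
```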

@maziyarpanahi Could you elaborate? Isn't config.guidance_embeds in train_dreambooth_lora_flux.py:

if transformer.config.guidance_embeds:

@maziyarpanahi

> @maziyarpanahi Could you elaborate? Isn't config.guidance_embeds in train_dreambooth_lora_flux.py:

Sorry about that, I was looking in the wrong file. It does indeed exist in the LoRA file as well.

@Adenialzz
Contributor

> For OOM, see https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_flux.md. Did you add --gradient_checkpointing, --use_8bit_adam, and --gradient_accumulation_steps=4 or 8? Is it possible for you to try without --validation_prompt? Also, could you try one of the latest versions of PyTorch?
>
> @maziyarpanahi Could you elaborate? Isn't config.guidance_embeds in train_dreambooth_lora_flux.py:
>
> if transformer.config.guidance_embeds:

Thanks. I missed this OOM guidance before. It helps a lot.

github-actions bot

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot added the stale (Issues that haven't received updates) label on Sep 15, 2024