
Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. #6751

Merged: 2 commits into huggingface:main on Jan 30, 2024

Conversation

@sajadn (Contributor) commented on Jan 29, 2024

What does this PR do?

Fixes #6442
Part of #6552
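
For context, the ValueError: Attempting to unscale FP16 gradients from #6442 is raised when the trainable LoRA parameters are themselves stored in fp16, since the AMP grad scaler can only unscale fp32 gradients. The change therefore keeps the frozen base weights in fp16 while upcasting only the parameters that require gradients. A minimal sketch of the idea (not the exact diff; unet, text_encoder_one, text_encoder_two and args come from the training script):

import torch

# Base weights stay in fp16 to save memory; only the LoRA parameters that
# actually receive gradients are upcast, so GradScaler.unscale_() works.
models = [unet]
if args.train_text_encoder:
    models.extend([text_encoder_one, text_encoder_two])

for model in models:
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.to(torch.float32)

Recent diffusers versions also expose a cast_training_params helper in diffusers.training_utils that performs the same upcast, if available in your version.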

To test:

training:

CUDA_VISIBLE_DEVICES=0 accelerate launch train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \
  --dataset_name="lambdalabs/pokemon-blip-captions" \
  --mixed_precision="fp16" \
  --output_dir="text_to_image_example" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 --gradient_checkpointing \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=8 --checkpointing_steps=2 --checkpoints_total_limit=2 \
  --use_8bit_adam \
  --seed="0" 

resuming:

CUDA_VISIBLE_DEVICES=0 accelerate launch train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" \
  --pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix \
  --dataset_name="lambdalabs/pokemon-blip-captions" \
  --mixed_precision="fp16" \
  --output_dir="text_to_image_example" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 --gradient_checkpointing \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=8 --checkpointing_steps=2 --checkpoints_total_limit=2 \
  --use_8bit_adam \
  --seed="0" \
  --resume_from_checkpoint="latest"
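
Note on --resume_from_checkpoint="latest": the diffusers example scripts typically resolve "latest" by scanning the output directory for checkpoint-<step> folders and taking the highest step. A rough sketch of that lookup (assuming the common pattern from the example scripts; args.output_dir is the script argument):

import os

# Find the newest checkpoint-<step> directory, e.g. checkpoint-8 when
# training ran with --checkpointing_steps=2 --max_train_steps=8.
dirs = [d for d in os.listdir(args.output_dir) if d.startswith("checkpoint")]
dirs = sorted(dirs, key=lambda d: int(d.split("-")[1]))
resume_path = dirs[-1] if len(dirs) > 0 else None  # None -> start a fresh run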

Who can review?

@sayakpaul

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul (Member) commented on Jan 29, 2024

Wow. I just ran it and it went smoothly. Thanks much!

@sayakpaul (Member) commented on this hunk in train_text_to_image_lora_sdxl.py:

    text_encoder_2_state_dict, network_alphas=network_alphas, text_encoder=text_encoder_two_
    )
    _set_state_dict_into_text_encoder(
        lora_state_dict, prefix="text_encoder_2.", text_encoder=text_encoder_one_

Shouldn't this be text_encoder_two_?
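
The follow-up commit ("fix text_encoder_two bug.") addresses exactly this: each prefix should load into its matching encoder. Roughly, assuming the companion call for the first encoder follows the same pattern:

# Load each text encoder's LoRA weights into the corresponding model.
_set_state_dict_into_text_encoder(
    lora_state_dict, prefix="text_encoder.", text_encoder=text_encoder_one_
)
_set_state_dict_into_text_encoder(
    lora_state_dict, prefix="text_encoder_2.", text_encoder=text_encoder_two_
)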

@sayakpaul (Member) left a review:

Just a single comment.

@sajadn (Contributor, Author) commented on Jan 29, 2024

You're right! That was a terrible mistake! Review again please.

@sayakpaul (Member) left a review:

Super!

@sayakpaul sayakpaul merged commit 058b475 into huggingface:main Jan 30, 2024
@loboere commented on Feb 6, 2024

I'm still getting the error ValueError: Attempting to unscale FP16 gradients.

Name: diffusers
Version: 0.26.2

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request on Apr 26, 2024:

Fix mixed precision fine-tuning for text-to-image-lora-sdxl example. (huggingface#6751)

* Fix mixed precision fine-tuning for text-to-image-lora-sdxl example.
* fix text_encoder_two bug.

Co-authored-by: Sajad Norouzi <sajadn@dev-dsk-sajadn-2a-87239470.us-west-2.amazon.com>
Successfully merging this pull request may close these issues.

ValueError: Attempting to unscale FP16 gradients.