
[Core] fuse_qkv_projection() to Flux #9185

Merged: sayakpaul merged 11 commits into main from fuse-flux on Aug 23, 2024
Conversation

@sayakpaul (Member) commented Aug 15, 2024

What does this PR do?

Adds fuse_qkv_projection() support to Flux.

Will report the performance improvements soon.

Batch size 1 (see footnote):

With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)

As a reminder, refer to https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834 to understand the scope of when fusion is ideal.

Footnote:

This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes; it will be made public in a day's time.
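
For context, here is a minimal usage sketch (not taken from this PR's diff). It assumes Flux exposes the same pipeline-level `fuse_qkv_projections()` helper that other diffusers pipelines provide; the model id and prompt are illustrative placeholders.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Fuse the separate Q, K, and V linear layers into a single projection,
# turning three smaller matmuls per attention block into one larger one.
pipe.fuse_qkv_projections()

image = pipe("a tiny astronaut hatching from an egg on the moon").images[0]
image.save("flux_fused.png")
```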

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul requested a review from DN6 August 16, 2024 07:13
@sayakpaul sayakpaul marked this pull request as ready for review August 16, 2024 07:13
@sayakpaul sayakpaul requested a review from yiyixuxu August 18, 2024 03:06
@yiyixuxu (Collaborator)

Awesome, but I think we will have to update this once the refactor PR is in, since I combined the attention processors there (#9074).

@sayakpaul (Member, Author)

100 percent right. I will repurpose once your PR is in :)

@sayakpaul (Member, Author)

@yiyixuxu could you give this a look? I have adjusted it accordingly with #9074.

@yiyixuxu (Collaborator) left a review comment:

PR looks good to me.
Can we run an actual test to see the improvement before merging? Feel free to merge once that's done.

@sayakpaul (Member, Author)

Check the PR description:

Batch size 1 (see footnote):

With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)

As a reminder, refer to https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834 to understand the scope of when fusion is ideal.

Footnote:

This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes; it will be made public in a day's time.
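
For readers who want to reproduce the setup, here is a hedged sketch of the quantization step mentioned in the footnote. It assumes torchao's top-level `autoquant()` entry point and the compile-then-autoquant wrapping shown in the torchao README; none of this is taken from the PR itself.

```python
import torch
import torchao
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.fuse_qkv_projections()

# autoquant benchmarks candidate quantized kernels per linear layer and
# keeps the fastest one; compiling first lets the fused kernels be traced.
pipe.transformer = torchao.autoquant(
    torch.compile(pipe.transformer, mode="max-autotune")
)
```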

@yiyixuxu (Collaborator)

@sayakpaul ahh I missed it! sorry! very nice!

@sayakpaul sayakpaul merged commit 2d9ccf3 into main Aug 23, 2024
18 checks passed
@sayakpaul sayakpaul deleted the fuse-flux branch August 23, 2024 05:24
@ngaloppo commented Oct 15, 2024

@sayakpaul This feature doesn't seem to work together with torchao's quantize_(transformer, int8_weight_only()) quantization. Is that expected? I get an error from torchao:

File "/Users/sysperf/miniforge3/envs/flux/lib/python3.11/site-packages/torchao/utils.py", line 389, in _dispatch__torch_dispatch__
    raise NotImplementedError(f"{cls.__name__} dispatch: attempting to run unimplemented operator/function: {func}")
NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: aten.cat.default
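
The traceback shows torch.cat (aten.cat.default) being dispatched on torchao's AffineQuantizedTensor, which has no implementation for that op; fusion concatenates the Q/K/V weight matrices. One hedged workaround (an assumption, not confirmed in this thread) is to fuse before quantizing, so the concatenation runs on regular tensors:

```python
import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_, int8_weight_only

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

pipe.transformer.fuse_qkv_projections()          # torch.cat on plain bf16 weights
quantize_(pipe.transformer, int8_weight_only())  # quantize the fused module afterwards
```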

@sayakpaul (Member, Author)

Please redirect the issue to https://github.com/sayakpaul/diffusers-torchao

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* start fusing flux.

* test

* finish fusion

* fix-copies