
[Core] fuse_qkv_projection() to Flux #9185

Merged: sayakpaul merged 11 commits into main from fuse-flux on Aug 23, 2024
Conversation

@sayakpaul (Member) commented Aug 15, 2024

What does this PR do?

Adds fuse_qkv_projection() support to Flux.

Will report the performance improvements soon.

Batch size 1 (see footnote):

With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)

As a reminder, refer to https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834 to understand the scope of when fusion is ideal.

Footnote:

This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes; it will be made public in a day's time.
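
For context, here is a minimal usage sketch (not taken from this PR's diff). It assumes Flux exposes the same pipeline-level `fuse_qkv_projections()` helper that other diffusers pipelines provide; the model id and prompt are illustrative placeholders.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Fuse the separate Q, K, and V linear layers into a single projection,
# turning three smaller matmuls per attention block into one larger one.
pipe.fuse_qkv_projections()

image = pipe("a tiny astronaut hatching from an egg on the moon").images[0]
image.save("flux_fused.png")
```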

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul requested a review from DN6 August 16, 2024 07:13
@sayakpaul sayakpaul marked this pull request as ready for review August 16, 2024 07:13
@sayakpaul sayakpaul requested a review from yiyixuxu August 18, 2024 03:06
@yiyixuxu (Collaborator)

Awesome, but I think we will have to update this once the refactor PR is in, since I combined the attention processors there (#9074).

@sayakpaul (Member, Author)

100 percent right. I will repurpose once your PR is in :)

@sayakpaul (Member, Author)

@yiyixuxu could you give this a look? I have adjusted it accordingly with #9074.

@yiyixuxu (Collaborator) left a review comment:

PR looks good to me.
Can we run an actual test to see the improvement before merging? Feel free to merge once that's done.

@sayakpaul (Member, Author)

Check the PR description:

Batch size 1 (see footnote):

With fusion: 8.456 seconds (memory: 25.25 GB)
Without fusion: 11.492 seconds (memory: 35.455 GB)

As a reminder, refer to https://github.com/huggingface/diffusers/pull/8829/#issuecomment-2236254834 to understand the scope of when fusion is ideal.

Footnote:

This was run on an A100. For quantization, we use "autoquant" from [torchao](https://github.com/pytorch/ao/). We are working on a repository to show the full-blown recipes; it will be made public in a day's time.
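
For readers who want to reproduce the setup, here is a hedged sketch of the quantization step mentioned in the footnote. It assumes torchao's top-level `autoquant()` entry point and the compile-then-autoquant wrapping shown in the torchao README; none of this is taken from the PR itself.

```python
import torch
import torchao
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.fuse_qkv_projections()

# autoquant benchmarks candidate quantized kernels per linear layer and
# keeps the fastest one; compiling first lets the fused kernels be traced.
pipe.transformer = torchao.autoquant(
    torch.compile(pipe.transformer, mode="max-autotune")
)
```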

@yiyixuxu (Collaborator)

@sayakpaul ahh I missed it! sorry! very nice!

@sayakpaul sayakpaul merged commit 2d9ccf3 into main Aug 23, 2024
18 checks passed
@sayakpaul sayakpaul deleted the fuse-flux branch August 23, 2024 05:24
@ngaloppo commented Oct 15, 2024

@sayakpaul This feature doesn't seem to work together with torchao's quantize_(transformer, int8_weight_only()) quantization. Is that expected? I get an error from torchao:

File "/Users/sysperf/miniforge3/envs/flux/lib/python3.11/site-packages/torchao/utils.py", line 389, in _dispatch__torch_dispatch__
    raise NotImplementedError(f"{cls.__name__} dispatch: attempting to run unimplemented operator/function: {func}")
NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: aten.cat.default
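
The traceback shows torch.cat (aten.cat.default) being dispatched on torchao's AffineQuantizedTensor, which has no implementation for that op; fusion concatenates the Q/K/V weight matrices. One hedged workaround (an assumption, not confirmed in this thread) is to fuse before quantizing, so the concatenation runs on regular tensors:

```python
import torch
from diffusers import FluxPipeline
from torchao.quantization import quantize_, int8_weight_only

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

pipe.transformer.fuse_qkv_projections()          # torch.cat on plain bf16 weights
quantize_(pipe.transformer, int8_weight_only())  # quantize the fused module afterwards
```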

@sayakpaul (Member, Author)

Please redirect the issue to https://github.com/sayakpaul/diffusers-torchao

sayakpaul added a commit that referenced this pull request Dec 23, 2024
* start fusing flux.

* test

* finish fusion

* fix-copies