add more benchmark numbers #2900

DerekLiu35 · 2025-06-10T02:09:05Z

sayakpaul

Thanks for the updates. I think for torchao, we could additionally mention the FP8 numbers on RTX 4090. WDYT?

sayakpaul · 2025-06-10T02:15:01Z

diffusers-quantization.md

@@ -450,7 +450,7 @@ For more information check out the [Layerwise casting docs](https://huggingface.

 Most of these quantization backends can be combined with the memory optimization techniques offered in Diffusers. Let's explore CPU offloading, group offloading, and `torch.compile`. You can learn more about these techniques in the [Diffusers documentation](https://huggingface.co/docs/diffusers/main/en/optimization/memory).

-> **Note:** At the time of writing, bnb + `torch.compile` also works if bnb is installed from source and using pytorch nightly or with fullgraph=False.
+> **Note:** At the time of writing, bnb + `torch.compile` works if bnb is installed from source and using pytorch nightly or with fullgraph=False.


The 0.46.0 release (https://github.com/bitsandbytes-foundation/bitsandbytes/releases/tag/0.46.0) is out, so `bitsandbytes==0.46.0` should work with PyTorch nightly.
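For readers who want to check their environment before trying bnb + `torch.compile`, here is a minimal sketch of a version gate. The thresholds (bitsandbytes 0.46.0, PyTorch 2.8) are taken from this thread and are assumptions rather than officially documented minimums. The helper names `parse_release` and `bnb_compile_ready` are hypothetical and not part of either library.

```python
import re


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release from a version string.

    Suffixes such as '.dev20250609' or '+cu126' are ignored, so a nightly
    build is treated like the release it precedes (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    release = tuple(int(part) for part in match.group().split("."))
    # Pad to three components so '2.8' compares equal to '2.8.0'.
    return release + (0,) * (3 - len(release))


def bnb_compile_ready(
    bnb_version: str,
    torch_version: str,
    min_bnb: str = "0.46.0",   # assumption: release mentioned in this thread
    min_torch: str = "2.8.0",  # assumption: nightly series mentioned in this thread
) -> bool:
    """Return True if both versions meet the assumed minimums."""
    return (
        parse_release(bnb_version) >= parse_release(min_bnb)
        and parse_release(torch_version) >= parse_release(min_torch)
    )
```

The installed versions can be read with `importlib.metadata.version("bitsandbytes")` and `importlib.metadata.version("torch")` and passed straight into `bnb_compile_ready`.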

diffusers-quantization.md

sayakpaul · 2025-06-10T02:18:13Z

diffusers-quantization.md

@@ -565,6 +592,13 @@ pipe = FluxPipeline.from_pretrained(
 | int8_weight_only              | 17.020 GB            | 22.473 GB   | 8 seconds     | ~851 seconds          |
 | float8_weight_only            | 17.016 GB            | 22.115 GB   | 8 seconds     | ~545 seconds          |

+**bitsandbytes + `torch.compile`**: **Note:** To enable compatibility with torch.compile, make sure you're using the latest version of bitsandbytes and PyTorch nightlies (2.8)


Mention the hardware this was obtained on.

I'm not sure what hardware this was obtained on. Was it an A100?
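One way to avoid this ambiguity in future benchmark runs is to log the device alongside the numbers. The sketch below is an illustration only, not the script used for the table in this PR. `describe_hardware` is a hypothetical helper that falls back to a plain message when `torch` or a CUDA device is unavailable.

```python
import platform


def describe_hardware() -> str:
    """Return a one-line description of the benchmark device."""
    try:
        import torch
    except ImportError:
        # torch is not installed, so only report the host CPU.
        cpu = platform.processor() or platform.machine()
        return f"CPU only ({cpu}), torch not installed"

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        total_gb = props.total_memory / 1024**3
        return f"{props.name}, {total_gb:.1f} GB, torch {torch.__version__}"

    return f"No CUDA device visible, torch {torch.__version__}"


if __name__ == "__main__":
    print(describe_hardware())
```

Printing this line at the top of each benchmark log makes it straightforward to state the hardware next to the table.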

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

DerekLiu35 · 2025-06-10T03:01:19Z

> Thanks for the updates. I think for torchao, we could additionally mention the FP8 numbers on RTX 4090. WDYT?

Hmm, I'm not sure. The RTX 4090 is probably more common among developers, but adding those numbers could make the blog post more complex.

sayakpaul · 2025-06-10T03:08:46Z

In the diffusion world, RTX 4090 is more common than A100, H100, etc., actually. But okay without.

add bnb + torch.compile

3698f83

sayakpaul reviewed Jun 10, 2025


DerekLiu35 and others added 3 commits June 9, 2025 22:30

Update diffusers-quantization.md

2f3ec52

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

apply suggestions from code review

4a54143

fix

b8dd51b
