[tests] tests for compilation + quantization (bnb) #11672

Open: wants to merge 8 commits into base: main

sayakpaul (Member) commented Jun 6, 2025

What does this PR do?

Adds tests for the following combinations:

  • quantization + compilation
  • quantization + compilation + model CPU offloading
  • quantization + compilation + group offloading

For now, the tests cover only the bitsandbytes backend.
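For readers unfamiliar with how these pieces fit together, here is a minimal sketch of the three combinations. It is not the test code from this PR. The helper name `build_compiled_bnb_pipeline`, the `OFFLOAD_MODES` tuple, the choice of `FluxTransformer2DModel`, the 4-bit NF4 settings, and the `leaf_level` offload type are all assumptions made for illustration, and `model_id` is left to the caller.

```python
# Sketch only: the model class, dtype, and offload settings below are
# assumptions for illustration, not values taken from this PR's tests.
OFFLOAD_MODES = (None, "model_cpu", "group")


def build_compiled_bnb_pipeline(model_id, offload=None):
    """Load a 4-bit bitsandbytes transformer, apply optional offloading, then compile it."""
    if offload not in OFFLOAD_MODES:
        raise ValueError(f"offload must be one of {OFFLOAD_MODES}, got {offload!r}")

    # Heavy imports are deferred so the argument check above works without a GPU stack.
    import torch
    from diffusers import BitsAndBytesConfig, DiffusionPipeline, FluxTransformer2DModel

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    transformer = FluxTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=quant_config,
        torch_dtype=torch.bfloat16,
    )
    pipe = DiffusionPipeline.from_pretrained(
        model_id, transformer=transformer, torch_dtype=torch.bfloat16
    )

    if offload is None:
        # quantization + compilation: everything stays on the GPU
        pipe.to("cuda")
    elif offload == "model_cpu":
        # quantization + compilation + model CPU offloading
        pipe.enable_model_cpu_offload()
    else:
        # quantization + compilation + group offloading (transformer only)
        pipe.transformer.enable_group_offload(
            onload_device=torch.device("cuda"),
            offload_device=torch.device("cpu"),
            offload_type="leaf_level",
        )
        # The remaining torch modules are placed on the GPU directly.
        for name, component in pipe.components.items():
            if name != "transformer" and isinstance(component, torch.nn.Module):
                component.to("cuda")

    # Compile last, so the compiled module sees the final device/offload setup.
    pipe.transformer = torch.compile(pipe.transformer)
    return pipe
```

Calling the helper once per entry in `OFFLOAD_MODES` and running a short inference would exercise the same three combinations listed above. Whether each one compiles cleanly depends on the installed torch, diffusers, and bitsandbytes versions.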

sayakpaul added the quantization, performance, and torch.compile labels Jun 6, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul requested a review from matthewdouglas June 6, 2025 09:36
sayakpaul changed the title from "[wip][tests] start adding tests for compilation + quantization" to "[tests] tests for compilation + quantization (bnb)" Jun 7, 2025
sayakpaul marked this pull request as ready for review June 7, 2025 04:45
sayakpaul requested review from DN6 and matthewdouglas June 7, 2025 04:45
sayakpaul (Member, Author) commented

@DN6 LMK what you think of the test suite. The combinations target consumer GPUs, where quantization is beneficial (cc: @asomoza). Would you be able to add this for GGUF, too?

Also cc: @stevhliu I think we should document these combinations of optimizations in an easy-to-follow way now that we know they work (I can help get latency and memory numbers).

4 participants