
[ Docs ] Conceptual Guides #18

Open · wants to merge 22 commits into main
Conversation

robertgshaw2-neuralmagic
Collaborator

@robertgshaw2-neuralmagic robertgshaw2-neuralmagic commented Jul 8, 2024

SUMMARY:

  • explanation of why quantization is useful
  • explanation of various quantization schemes
  • benchmarking utilities

TEST PLAN:

  • N/A

@robertgshaw2-neuralmagic robertgshaw2-neuralmagic changed the title Rs/concepts [ Docs ] Conceptual Guides - Inference Acceleration from Quantization Jul 8, 2024
@robertgshaw2-neuralmagic robertgshaw2-neuralmagic changed the title [ Docs ] Conceptual Guides - Inference Acceleration from Quantization [ Docs ] Conceptual Guides Jul 8, 2024

## Theory

Performing quantization to go from `float16` to `int8` (or lower) is tricky. `int8` can represent only 256 distinct values, while `float16` can represent a very wide range of values. The goal is to find the best way to project our range `[a, b]` of `float16` values onto the `int8` space.
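As a concrete sketch of this projection, the snippet below shows one common way to do it: affine (scale + zero-point) quantization, where the observed range `[a, b]` is mapped linearly onto the 256 `int8` levels. The helper names here are hypothetical, chosen for illustration; real quantization libraries differ in details such as rounding mode and symmetric vs. asymmetric ranges.

```python
import numpy as np

def quantize_affine(x, num_bits=8):
    """Map float values in [x.min(), x.max()] onto the signed int grid.

    Illustrative helper, not any specific library's API.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1  # -128, 127
    a, b = float(x.min()), float(x.max())
    scale = (b - a) / (qmax - qmin)        # float step size per integer level
    zero_point = round(qmin - a / scale)   # integer that represents 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    # Recover an approximation of the original floats.
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.5, 0.0, 0.3, 2.7], dtype=np.float32)
q, s, z = quantize_affine(x)
x_hat = dequantize_affine(q, s, z)  # close to x, within one quantization step
```

The round trip loses at most half a quantization step per value, which is exactly the trade-off the range-selection problem above is trying to minimize.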
Contributor
good space for a diagram (bucketing the weight distribution to 256 buckets)

docs/conceptual_guides/quantization_schemes.md Outdated Show resolved Hide resolved

* **Static quantization**: the range for each activation is computed in advance at quantization time, typically by passing representative "calibration" data through the model and recording the activation values. In practice, we run a number of forward passes on a calibration dataset and compute the ranges from the observed activations.
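The calibration step above can be sketched as a simple min/max observer that accumulates activation ranges over forward passes and then freezes them into quantization parameters. This is a toy illustration under assumed names (`MinMaxObserver`, `observe`, `qparams`); it is not any particular library's interface, and random arrays stand in for real activations produced by calibration data.

```python
import numpy as np

class MinMaxObserver:
    """Toy observer: tracks the running min/max of activations seen
    during calibration, then derives fixed quantization parameters."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, x):
        # Called once per forward pass on the calibration data.
        self.min_val = min(self.min_val, float(x.min()))
        self.max_val = max(self.max_val, float(x.max()))

    def qparams(self, num_bits=8):
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        scale = (self.max_val - self.min_val) / (qmax - qmin)
        zero_point = round(qmin - self.min_val / scale)
        return scale, zero_point

rng = np.random.default_rng(0)
obs = MinMaxObserver()
# Stand-in for forward passes over a calibration set.
for _ in range(8):
    obs.observe(rng.normal(size=(4, 16)).astype(np.float32))

# Fixed at quantization time and reused unchanged at inference.
scale, zero_point = obs.qparams()
```

Because the parameters are frozen after calibration, inference pays no cost for range computation, at the price of depending on how representative the calibration data was.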

In general, it is best practice to start your experiments with:
Contributor

why?

Collaborator Author

Why is it best practice?

robertgshaw2-neuralmagic and others added 5 commits July 15, 2024 17:55
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
markmc pushed a commit to markmc/llm-compressor that referenced this pull request Nov 13, 2024