
[Feature Request] SpQR quantisation #1802

Closed

nivibilla opened this issue Jun 11, 2023 · 2 comments

@nivibilla
Contributor

Hey, I wanted to know if we could possibly integrate SpQR quantisation, described in this paper: https://arxiv.org/abs/2306.03078.

SpQR works by identifying and isolating outlier weights, which cause particularly large quantization errors, and storing them in higher precision while compressing all other weights to 3-4 bits. It achieves relative accuracy losses of less than 1% in perplexity for highly accurate LLaMA and Falcon LLMs. This makes it possible to run a 33B-parameter LLM on a single 24 GB consumer GPU without performance degradation, and with a 15% speedup, making powerful LLMs available to consumers without any downsides.
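Roughly, the outlier-isolation idea looks like this. A minimal sketch, not the paper's exact algorithm; the error threshold, group handling, and storage format here are made up for illustration:

```python
import numpy as np

def quantize_group(w, bits=3, outlier_tau=3.0):
    # Symmetric low-bit quantization of a 1-D float32 weight group.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax + 1e-12
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    err = np.abs(w - q * scale)
    # Weights whose individual error is far above the group average are
    # treated as outliers: keep their original value in fp16 in a sparse
    # side table instead of forcing them onto the low-bit grid.
    mask = err > outlier_tau * err.mean()
    outliers = {int(i): np.float16(w[i]) for i in np.flatnonzero(mask)}
    # Re-derive the scale from the remaining weights so the grid is not
    # stretched by the outliers it no longer needs to represent.
    inliers = w[~mask]
    if inliers.size:
        scale = np.abs(inliers).max() / qmax + 1e-12
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale, outliers

def dequantize_group(q, scale, outliers):
    w = q.astype(np.float32) * scale
    for i, v in outliers.items():
        w[i] = np.float32(v)  # restore isolated outliers at higher precision
    return w

# Quick check with one artificial outlier injected into a random group:
w = np.random.randn(64).astype(np.float32)
w[7] = 12.0
q, s, o = quantize_group(w)
print(np.abs(w - dequantize_group(q, s, o)).max(), o)
```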

This would keep all the benefits of quantising while not losing performance.
Apologies if it has already been implemented or is already being worked on. Please point me to the PR if it is. This repo gets work done so fast!

@nivibilla nivibilla changed the title [User] Insert summary of your issue or enhancement.. [Feature Request] SpQR quantisation Jun 11, 2023
@cmp-nct
Contributor

cmp-nct commented Jun 12, 2023

Related to this: #1256

I also think that the best way for LLMs to run economically is to fit them on the GPU at hand (or within X% of it when using partial GPU offload), with dynamically adjusted precision: the most important weights receive the most precision, as much as possible within the RAM constraints.
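That could look like a greedy allocator over a byte budget. A toy sketch; the layer names, importance scores, and bit-width steps are made up, and a real scheme would derive importance from something like per-tensor quantization error or activation statistics:

```python
def allocate_bits(tensors, budget_bytes, widths=(3, 4, 8, 16)):
    """tensors: list of (name, n_params, importance); returns name -> bits."""
    # Start every tensor at the lowest available precision.
    bits = {name: widths[0] for name, _, _ in tensors}
    used = sum(n * widths[0] // 8 for _, n, _ in tensors)  # approx. bytes
    # Spend the remaining budget on the most important tensors first,
    # upgrading one width step at a time while it still fits.
    for name, n, _ in sorted(tensors, key=lambda t: -t[2]):
        for lo, hi in zip(widths, widths[1:]):
            extra = n * (hi - lo) // 8
            if used + extra > budget_bytes:
                break
            bits[name], used = hi, used + extra
    return bits

# Example with hypothetical layer names and importance scores:
tensors = [("attn_q", 4096 * 4096, 0.9),
           ("ffn_up", 4096 * 11008, 0.4),
           ("tok_emb", 32000 * 4096, 0.2)]
print(allocate_bits(tensors, budget_bytes=120 * 1024 ** 2))
```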

@Green-Sky
Collaborator

Closing in favor of #2061.
