Can i quantize a 4Bit model more? #762

Closed
Shreyas-ITB opened this issue Apr 4, 2023 · 3 comments

Comments

@Shreyas-ITB

Hi, I want to quantize a model that is already quantized to 4-bit (q4_1), but I want to make it compute faster, so I'd like to know what command quantizes the already-quantized model. I tried the command from the README once, but that didn't work. Can anyone help me?

@howard0su
Collaborator

What command did you run? I think you can convert ggml to pth, then pth to ggml with q4_0.
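
Roughly, the re-quantization flow would look like the sketch below, assuming the original PyTorch checkpoint or the f16 GGML file is still on disk and using a 7B model directory as a placeholder; exact script names and quantize arguments can differ between llama.cpp versions:

```sh
# Recreate the f16 GGML file from the original PyTorch checkpoint
# (skip this step if the f16 .bin from the first conversion is still around).
python3 convert-pth-to-ggml.py models/7B/ 1

# Quantize the f16 file to q4_0 with the quantize tool built alongside llama.cpp.
# Older builds take the numeric type id (2 = q4_0) instead of the name "q4_0".
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
```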

@sgoll

sgoll commented Apr 4, 2023

At the moment, I believe only 4-bit quantization has been implemented and is natively supported. You can find discussions about possibly supporting 2-bit quantization here (and 3-bit as a side note):

There is a fork that has implemented 2-bit quantization with some success, see #456 (comment).

@prusnak
Collaborator

prusnak commented Apr 4, 2023

The comment above explains the situation. One note: if 2-bit or 3-bit quantization becomes available, it should always be performed from the original f16 or f32 file.
