Can i quantize a 4Bit model more? #762

Closed
Shreyas-ITB opened this issue Apr 4, 2023 · 3 comments

Comments

@Shreyas-ITB

Hi, I want to quantize a model that is already quantized to 4-bit (q4_1), but I want to make it compute faster, so I'd like to know what command quantizes the already-quantized model. I tried the command from the README once, but that didn't work. Can anyone help me?

@howard0su
Collaborator

What command did you run? I think you can convert ggml to pth, then pth to ggml with q4_0.
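
Roughly, the re-quantization flow would look like the sketch below, assuming the original PyTorch checkpoint or the f16 GGML file is still on disk and using a 7B model directory as a placeholder; exact script names and quantize arguments can differ between llama.cpp versions:

```sh
# Recreate the f16 GGML file from the original PyTorch checkpoint
# (skip this step if the f16 .bin from the first conversion is still around).
python3 convert-pth-to-ggml.py models/7B/ 1

# Quantize the f16 file to q4_0 with the quantize tool built alongside llama.cpp.
# Older builds take the numeric type id (2 = q4_0) instead of the name "q4_0".
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
```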

@sgoll

sgoll commented Apr 4, 2023

At the moment, I believe only 4-bit quantization has been implemented and is natively supported. You can find discussions about possibly supporting 2-bit quantization here (and 3-bit as a side note):

There is a fork that has implemented 2-bit quantization with some success, see #456 (comment).

@prusnak
Collaborator

prusnak commented Apr 4, 2023

The comment above explains the situation. One note: if 2-bit or 3-bit quantization becomes available, it should always be performed from the original f16 or f32 file.
