about precision loss #52

Closed · sinoaidi opened this issue Sep 26, 2024 · 3 comments
Labels
question Further information is requested

Comments

@sinoaidi

Compared with llama.cpp, does T-MAC lose precision when running quantized models, or does it give the same results? I am running Qwen1.5 4-bit (https://huggingface.co/Qwen/Qwen1.5-4B-Chat-GPTQ-Int4) now, and I found that the answers given by the model are sometimes wrong, especially in English, like this:
[Screenshot attached: 微信截图_20240926091000]

@BarfingLemurs

Yes, but their GPTQ model uses a group size of 128:
https://huggingface.co/Qwen/Qwen1.5-4B-Chat-GPTQ-Int4/blob/ff03f8a9647d68587c4bc621eeafd61c9df4487b/config.json#L29

The understanding is that a group size of 32 would be better.

From the source: ggml-org/llama.cpp#1684

In the existing ggml quantization types we have "type-0" (Q4_0, Q5_0) and "type-1" (Q4_1, Q5_1). In "type-0", weights w are obtained from quants q using w = d * q, where d is the block scale. In "type-1", weights are given by w = d * q + m, where m is the block minimum.

• Q4_K_M carries a mixture of these quantization types:

  • GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
  • GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
  • GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
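
A minimal sketch of the two dequantization formulas quoted above, assuming plain NumPy round-to-nearest on a single 32-weight block (this is only an illustration, not llama.cpp's actual packed block layouts or kernels):

```python
import numpy as np

# One toy block of 32 float weights.
rng = np.random.default_rng(0)
w = rng.normal(size=32).astype(np.float32)

# "type-0": w = d * q, symmetric signed quants in [-8, 7], only a block scale d.
d0 = np.abs(w).max() / 7.0
q0 = np.clip(np.round(w / d0), -8, 7)
w_type0 = d0 * q0

# "type-1": w = d * q + m, unsigned quants in [0, 15], block scale d and block minimum m.
m = w.min()
d1 = (w.max() - m) / 15.0
q1 = np.clip(np.round((w - m) / d1), 0, 15)
w_type1 = d1 * q1 + m

print("type-0 mean abs error:", np.abs(w - w_type0).mean())
print("type-1 mean abs error:", np.abs(w - w_type1).mean())
```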

kaleid-liner added the question (Further information is requested) label on Sep 26, 2024
@kaleid-liner
Collaborator

There are several ways to improve GPTQ model quality, including using w4g64 instead of w4g128, or doing QAT such as EfficientQAT. Another cause is the quality of prompt engineering: some models can output random results without the correct prompt.
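
A minimal sketch of why the smaller group size (w4g64 vs w4g128) helps, assuming plain round-to-nearest symmetric quantization in NumPy on a random toy matrix; this is only an illustration of the group-size effect, not the actual GPTQ algorithm:

```python
import numpy as np

def fake_quant(w, bits=4, group_size=128):
    """Quantize-dequantize with one scale per group (round-to-nearest, symmetric)."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit signed
    groups = w.reshape(-1, group_size)
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(4096, 128)).astype(np.float32)  # toy weight matrix

for g in (32, 64, 128):                              # w4g32 / w4g64 / w4g128
    err = np.abs(w - fake_quant(w, group_size=g)).mean()
    print(f"group size {g:4d}: mean abs error {err:.5f}")
```

With this toy setup the mean error shrinks as the group size shrinks, which is the same direction the w4g64-over-w4g128 suggestion points in.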

From our experience, Qwen2 GPTQ w4g128 already performs well enough. However, we are still working on merging the latest llama.cpp to support Qwen2. Track the progress through #46.

@sinoaidi
Author

@kaleid-liner @BarfingLemurs thank you very much.
