Need help to understand q4_0, q4_1, q4_2, q4_3 quantization #1114

santapo · 2023-04-22T02:12:50Z

Is there any source that explains the q4_0, q4_1, q4_2, and q4_3 methods in detail? I tried to read the C++ code, but it's hard for me to understand how they work and how they differ.

Folko-Ven · 2023-04-22T04:42:19Z

Hi. You can read more about the different types of quantization in #406. In short: q4_0 has worse accuracy but higher speed, while q4_1 is more accurate but slower. q4_2 and q4_3 are like new generations of q4_0 and q4_1: q4_2 should be more accurate than q4_0 and just as fast, and q4_3 should be similarly more accurate than q4_1.
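To make the accuracy trade-off concrete, here is a minimal Python sketch of the two basic ideas as I understand them: a q4_0-style scheme that stores one scale per block, and a q4_1-style scheme that stores a scale plus a minimum per block. This is not the ggml code. The function names, the block size of 32, and the rounding choices are assumptions for illustration, and the real implementation also packs two 4-bit values into each byte.

```python
def quantize_symmetric(block):
    """q4_0-style sketch: one scale per block, integers in [-8, 7]."""
    amax = max(abs(x) for x in block)
    d = amax / 7 if amax else 1.0  # scale so the largest magnitude maps to 7
    q = [max(-8, min(7, round(x / d))) for x in block]
    return d, q


def dequantize_symmetric(d, q):
    return [d * v for v in q]


def quantize_affine(block):
    """q4_1-style sketch: scale plus minimum per block, integers in [0, 15]."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15 if hi > lo else 1.0  # spread 16 levels over [lo, hi]
    q = [max(0, min(15, round((x - lo) / d))) for x in block]
    return d, lo, q


def dequantize_affine(d, lo, q):
    return [d * v + lo for v in q]


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    # A block whose values all sit in [1, 2]: the symmetric scheme spends
    # half of its levels on negative numbers that never occur.
    block = [1.0 + i / 31 for i in range(32)]

    d0, q0 = quantize_symmetric(block)
    d1, m1, q1 = quantize_affine(block)

    print("symmetric max error:", max_error(block, dequantize_symmetric(d0, q0)))
    print("affine max error:   ", max_error(block, dequantize_affine(d1, m1, q1)))
```

The affine variant fits the 16 levels to the actual value range of the block, so it usually reconstructs the weights more closely, but it has to store and apply an extra number per block, which is one plausible reason for the "more accurate but slower" behavior described above.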

ggml-org locked and limited conversation to collaborators Apr 22, 2023

sw converted this issue into discussion #1121 Apr 22, 2023

This issue was moved to a discussion.
