This issue was moved to a discussion.

Need help to understand q4_0, q4_1, q4_2, q4_3 quantization #1114

Closed
santapo opened this issue Apr 22, 2023 · 1 comment
Comments

@santapo

santapo commented Apr 22, 2023

Is there any source that explains the q4_0, q4_1, q4_2, and q4_3 methods in detail? I tried to read the C++ code, but it's hard for me to understand how they work and how they differ.

@Folko-Ven
Contributor

Hi. You can read more about the different quantization types in #406. In short, q4_0 is less accurate but faster, while q4_1 is more accurate but slower. q4_2 and q4_3 are newer generations of q4_0 and q4_1: q4_2 should be more accurate than q4_0 while being just as fast, and q4_3 should similarly be more accurate than q4_1.
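
To make the accuracy difference concrete, here is a minimal Python sketch of the two basic ideas, not the actual ggml C++ code. It assumes that a q4_0-style format stores one scale per block of weights and maps values to 4-bit integers around a fixed offset, while a q4_1-style format stores a scale plus a minimum per block. The function names, the block size of 32, and the rounding details are illustrative assumptions. As far as I know, q4_2 and q4_3 apply the same two schemes to smaller blocks, but check the source to confirm.

```python
# Simplified sketch of two 4-bit block quantization schemes.
# All names here are hypothetical; this is not the ggml implementation.

def quantize_symmetric(block):
    """q4_0-style (assumed): one scale d per block, x ~ d * (q - 8)."""
    amax = max(abs(x) for x in block)
    d = amax / 7 if amax > 0 else 1.0
    # Shift by 8 so the signed range fits in an unsigned 4-bit value (0..15).
    q = [min(15, max(0, round(x / d) + 8)) for x in block]
    return d, q

def dequantize_symmetric(d, q):
    return [d * (v - 8) for v in q]

def quantize_affine(block):
    """q4_1-style (assumed): scale d and minimum m per block, x ~ d * q + m."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15 if hi > lo else 1.0
    q = [min(15, max(0, round((x - lo) / d))) for x in block]
    return d, lo, q

def dequantize_affine(d, m, q):
    return [d * v + m for v in q]

def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))

# A block whose values are all positive: the symmetric scheme wastes
# half of its range on negative values that never occur.
block = [0.5 + 0.01 * i for i in range(32)]

d0, q0 = quantize_symmetric(block)
d1, m1, q1 = quantize_affine(block)

err_symmetric = max_error(block, dequantize_symmetric(d0, q0))
err_affine = max_error(block, dequantize_affine(d1, m1, q1))
print(err_symmetric, err_affine)
```

On a block like this, the affine variant uses all 16 levels across the actual value range, so its reconstruction error is smaller. The cost is one extra stored number per block and an extra addition when dequantizing, which matches the "more accurate but slower" trade-off described above.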

@ggml-org ggml-org locked and limited conversation to collaborators Apr 22, 2023
@sw sw converted this issue into discussion #1121 Apr 22, 2023
