
iq3_xxs: guards for the no-imatrix situation #5334

Merged (1 commit) Feb 5, 2024

Conversation

ikawrakow
Contributor

IQ3_XXS can give a very bad quantization when used without an importance matrix (imatrix), see #5332.

Instead of adding a warning or even disallowing IQ3_XXS quantization without an imatrix, this PR prevents a bad outcome by using Q3_K for the attn_v tensors, and a mix of Q4_K and Q3_K for the ffn_down tensors when no imatrix has been supplied. This results in a somewhat larger quantized model (e.g., 2.61 GiB vs 2.5 GiB for 7B LLaMAs) but a more reasonable PPL (e.g., 5.4923 for LLaMA-v2-7B and a context of 4096 vs 100+).

@ggerganov ggerganov changed the title iq3_xxs: quards for the no-imatrix situation iq3_xxs: guards for the no-imatrix situation Feb 5, 2024
@ikawrakow ikawrakow merged commit 89503dc into master Feb 5, 2024
@ikawrakow ikawrakow deleted the ik/iq3xxs_noimatrix_guard branch February 5, 2024 10:32
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
3 participants