Incorrect command for model quantization in README.md #1199

Closed
jayyaali95 opened this issue Apr 26, 2023 · 2 comments

Comments

@jayyaali95

Lines 206 and 207 of README.md (Prepare Data & Run) do not show the correct command for model quantization. To quantize the model, use the following command instead:

# quantize the model to 4-bits (using method 2 = q4_0)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
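For readers wondering where the trailing 2 comes from: it selects the quantization type. A minimal sketch of the mapping, with values taken from llama.cpp's llama_ftype enum around the time of this issue (illustrative only; the set of types and codes has grown in later versions):

```python
# Integer codes accepted as the last argument of ./quantize, per the
# llama_ftype enum in llama.cpp at the time of this issue (assumed here;
# check llama.h in your checkout for the authoritative list).
QUANT_TYPES = {
    "f32": 0,   # all tensors kept as 32-bit floats
    "f16": 1,   # mostly 16-bit floats
    "q4_0": 2,  # 4-bit quantization, method q4_0
    "q4_1": 3,  # 4-bit quantization, method q4_1
}

def quantize_command(model_dir: str, qtype: str) -> str:
    """Build the ./quantize invocation for a model directory and type name."""
    code = QUANT_TYPES[qtype]
    return (f"./quantize {model_dir}/ggml-model-f16.bin "
            f"{model_dir}/ggml-model-{qtype}.bin {code}")
```

For example, `quantize_command("./models/7B", "q4_0")` reproduces the command above, ending in the method code 2.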
@slaren
Member

slaren commented Apr 26, 2023

If this isn't working for you, update to the current master; see #1191, which was merged today.

@slaren closed this as not planned Apr 26, 2023
@jayyaali95
Author

Thanks @slaren for mentioning that!
