
Cannot load 2 bit quantized ggml model on Windows #1018


Closed
dillfrescott opened this issue Apr 17, 2023 · 1 comment

Comments

@dillfrescott

C:\Users\micro\Downloads>main -m ggml-model-q2_0.bin
main: seed = 1681700481
llama.cpp: loading model from ggml-model-q2_0.bin
error loading model: unrecognized tensor type 5

llama_init_from_file: failed to load model
main: error: failed to load model 'ggml-model-q2_0.bin'
@rabidcopy
Contributor

#1004 is not merged yet and still a WIP. You will need to use that PR to load models quantized to 2-bit.
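Since the 2-bit quantization work only exists in that pull request, one way to try it locally is to fetch the PR branch and rebuild. This is a rough sketch, assuming a standard git setup; the `pr-1004-q2` branch name is made up here, and GitHub's `refs/pull/<number>/head` convention is what makes the fetch work:

```shell
# Clone llama.cpp and fetch the WIP 2-bit quantization branch from PR #1004.
# GitHub exposes every pull request at pull/<number>/head.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git fetch origin pull/1004/head:pr-1004-q2
git checkout pr-1004-q2

# Rebuild so the binary recognizes the Q2 tensor type
# (use the usual CMake steps instead of make on Windows),
# then point main at the 2-bit model as before.
make
./main -m ggml-model-q2_0.bin
```

Once #1004 is merged into master, a normal pull and rebuild should be enough and this workaround is no longer needed.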
