Latest release crashes on start #903
Can confirm, I get the exact same error. Rolling back to the linked release works.
Same issue, introduced in #709.

```c
static const char *llama_ftype_name(enum llama_ftype ftype) {
    switch (ftype) {
        case LLAMA_FTYPE_ALL_F32:     return "all F32";
        case LLAMA_FTYPE_MOSTLY_F16:  return "mostly F16";
        case LLAMA_FTYPE_MOSTLY_Q4_0: return "mostly Q4_0";
        case LLAMA_FTYPE_MOSTLY_Q4_1: return "mostly Q4_1";
        default: LLAMA_ASSERT(false);
    }
}
```

The ftype for my q4_1 model is 4 when this function is called. This is a GPTQ model converted to q4_1, and interestingly, the convert-gptq-to-ggml.py script does write that ftype value.

Ah, so #801 removed the check for GPTQ models. For the actual fix, I guess another llama_ftype could be added?

Temp fix for anyone waiting:

```diff
 static const char *llama_ftype_name(enum llama_ftype ftype) {
     switch (ftype) {
         case LLAMA_FTYPE_ALL_F32:     return "all F32";
         case LLAMA_FTYPE_MOSTLY_F16:  return "mostly F16";
         case LLAMA_FTYPE_MOSTLY_Q4_0: return "mostly Q4_0";
         case LLAMA_FTYPE_MOSTLY_Q4_1: return "mostly Q4_1";
+        case 4:                       return "mostly Q4_1 and some f16";
         default: LLAMA_ASSERT(false);
     }
 }
```

There is no negative effect from just bypassing this assertion; the f16/ftype hparam isn't used anymore.
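For reference, a rough sketch of what "another llama_ftype" could look like, assuming the enum numbering in llama.h at the time; the `LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16` name is illustrative, not necessarily what was merged:

```c
// llama.h -- sketch: give ftype 4 a proper enumerator instead of a bare magic number
enum llama_ftype {
    LLAMA_FTYPE_ALL_F32              = 0,
    LLAMA_FTYPE_MOSTLY_F16           = 1,
    LLAMA_FTYPE_MOSTLY_Q4_0          = 2,
    LLAMA_FTYPE_MOSTLY_Q4_1          = 3,
    LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16 = 4, // value written for GPTQ models converted to q4_1 (per this thread)
};

// llama.cpp -- llama_ftype_name() would then get a matching case instead of the raw "case 4":
//     case LLAMA_FTYPE_MOSTLY_Q4_1_SOME_F16: return "mostly Q4_1, some F16";
```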
Yes, I am having this issue as well with GPTQ models.
If you comment out the `LLAMA_ASSERT(false);` line, the crash goes away as well.
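A minimal sketch of that workaround; the fallback return is my addition so the function still returns a string once the assert is disabled:

```c
static const char *llama_ftype_name(enum llama_ftype ftype) {
    switch (ftype) {
        case LLAMA_FTYPE_ALL_F32:     return "all F32";
        case LLAMA_FTYPE_MOSTLY_F16:  return "mostly F16";
        case LLAMA_FTYPE_MOSTLY_Q4_0: return "mostly Q4_0";
        case LLAMA_FTYPE_MOSTLY_Q4_1: return "mostly Q4_1";
        default:
            // LLAMA_ASSERT(false); // the assert that fires for ftype == 4
            return "unknown";        // fallback so printing the ftype still works
    }
}
```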
For now I just rolled back to the commit before this change.
My apologies, I assumed that the "4" format was no longer supported by the new loader code in #801; that's why I didn't add a value for it in the `llama_ftype` enum.
I'm experiencing this error too. Does anyone know what the issue is? I think this is a bug, since one of the previous releases that doesn't have this problem is master-2663d2c.