[Bug] dequantize_row_q4_0 segfaults #791
Comments
Thread 1 "main" received signal SIGSEGV, Segmentation fault. And without AVX2 the crash is here:
You cannot eval with a vocab-only model.
Where can I get a proper model?
I cannot help you with that, but there are some details in the official repository: https://github.com/facebookresearch/llama/
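The "WARN no tensors loaded" line in the log below shows the loader already detects this case, so a guard at eval time could turn the crash into a clean error. A minimal sketch of that idea, assuming a loader-side tensor counter like the `n_loaded` suggested by the warning message; this is hypothetical illustration, not the actual llama.cpp fix:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state: the llama.cpp loader tracks how many tensors it
 * actually read from the file, and the "WARN no tensors loaded" message in
 * the log below fires when that count is zero. */
struct model_state {
    int n_loaded;  /* tensors read from the model file */
};

/* Sketch of a guard that refuses to eval a vocab-only model instead of
 * letting ggml_graph_compute dereference tensor data that was never
 * loaded (which is what segfaults inside dequantize_row_q4_0). */
static bool can_eval(const struct model_state *m) {
    if (m->n_loaded == 0) {
        fprintf(stderr, "error: model file contains no tensors (vocab-only?); refusing to eval\n");
        return false;
    }
    return true;
}

int main(void) {
    struct model_state vocab_only = { .n_loaded = 0 };
    return can_eval(&vocab_only) ? 0 : 1;
}
```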
Environment and Context
Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
g++ (Debian 10.2.1-6) 10.2.1 20210110
GNU Make 4.3
Failure Information (for bugs)
main segfaults at dequantize_row_q4_0+48
Steps to Reproduce
./main -m models/ggml-vocab-q4_0.bin
~/s/llama.cpp ❯❯❯ gdb main
(gdb) r -m models/ggml-vocab-q4_0.bin
Starting program: /home/sha0/soft/llama.cpp/main -m models/ggml-vocab-q4_0.bin
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main: seed = 1680724006
llama_model_load: loading model from 'models/ggml-vocab-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 0.41 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 1792.49 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'models/ggml-vocab-q4_0.bin'
llama_model_load: model size = 0.00 MB / num tensors = 0
llama_model_load: WARN no tensors loaded from model file - assuming empty model for testing
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0
[New Thread 0x7fff77560700 (LWP 142639)]
[New Thread 0x7fff76d5f700 (LWP 142640)]
[New Thread 0x7fff7655e700 (LWP 142641)]
[New Thread 0x7fff75d5d700 (LWP 142642)]
[New Thread 0x7fff7555c700 (LWP 142643)]
[New Thread 0x7fff74d5b700 (LWP 142644)]
[New Thread 0x7fff7455a700 (LWP 142645)]
[New Thread 0x7fff73d59700 (LWP 142646)]
[New Thread 0x7fff73558700 (LWP 142647)]
[New Thread 0x7fff72d57700 (LWP 142648)]
[New Thread 0x7fff72556700 (LWP 142649)]
[New Thread 0x7fff71d55700 (LWP 142650)]
[New Thread 0x7fff71554700 (LWP 142651)]
[New Thread 0x7fff70d53700 (LWP 142652)]
[New Thread 0x7fff70552700 (LWP 142653)]
Thread 1 "main" received signal SIGSEGV, Segmentation fault.
0x000055555555e430 in dequantize_row_q4_0 ()
(gdb) bt
#0 0x000055555555e430 in dequantize_row_q4_0 ()
#1 0x0000555555567585 in ggml_compute_forward_get_rows ()
#2 0x000055555556fba3 in ggml_graph_compute ()
#3 0x0000555555578eca in llama_eval_internal(llama_context&, int const*, int, int, int) ()
#4 0x000055555557919f in llama_eval ()
#5 0x000055555555c1aa in main ()
(gdb) x/i $pc
=> 0x55555555e430 <dequantize_row_q4_0+48>: vpmovzxbw 0x4(%rdi),%ymm1
(gdb) i r rdi
rdi 0xa00 2560
(gdb) i r ymm1
ymm1 {v16_bfloat16 = {0x180, 0x0, 0x0, 0x0, 0x180, 0x0 <repeats 11 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xc0, 0x43, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0, 0x43, 0x0 <repeats 22 times>}, v16_int16 = {0x43c0, 0x0, 0x0, 0x0, 0x43c0, 0x0 <repeats 11 times>}, v8_int32 = {0x43c0, 0x0, 0x43c0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x43c0, 0x43c0, 0x0, 0x0}, v2_int128 = {0x43c000000000000043c0, 0x0}}
(gdb)
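Reading the dump: `model size = 0.00 MB / num tensors = 0` means the vocab-only file carries no tensor data, so the data pointer that `ggml_compute_forward_get_rows` hands to `dequantize_row_q4_0` was presumably never pointed at mapped memory. `rdi` holds 0xa00 (2560), a tiny unmapped address, and the first 16-byte load of packed quants at offset 4 (`vpmovzxbw 0x4(%rdi),%ymm1`) faults. For reference, a simplified scalar sketch of the q4_0 layout and dequantization of that era, assuming QK = 32 and a 4-byte float scale, which matches the offset-4 access in the faulting instruction:

```c
#include <stdint.h>
#include <stdio.h>

#define QK 32  /* values per q4_0 block in ggml at the time of this issue */

/* Simplified q4_0 block: a float scale at offset 0, then 16 bytes of packed
 * 4-bit quants at offset 4 -- consistent with vpmovzxbw 0x4(%rdi),%ymm1,
 * which loads the nibble bytes right past the scale. */
typedef struct {
    float   d;           /* scale ("delta") */
    uint8_t qs[QK / 2];  /* 32 packed 4-bit values, two per byte */
} block_q4_0;

/* Scalar sketch of what dequantize_row_q4_0 computes: each nibble is shifted
 * to the signed range [-8, 7] and scaled by d. With x == (block_q4_0 *)0xa00,
 * the very first read of x[0].d touches an unmapped page and segfaults. */
static void dequantize_row_q4_0_ref(const block_q4_0 *x, float *y, int k) {
    const int nb = k / QK;
    for (int i = 0; i < nb; i++) {
        const float d = x[i].d;
        for (int l = 0; l < QK; l += 2) {
            const uint8_t v = x[i].qs[l / 2];
            y[i * QK + l + 0] = ((v & 0x0F) - 8) * d;  /* low nibble  */
            y[i * QK + l + 1] = ((v >> 4)   - 8) * d;  /* high nibble */
        }
    }
}

int main(void) {
    block_q4_0 blk = { .d = 0.5f };
    for (int j = 0; j < QK / 2; j++) blk.qs[j] = 0x98;  /* nibbles 8, 9 -> 0.0f, 0.5f */
    float out[QK];
    dequantize_row_q4_0_ref(&blk, out, QK);
    printf("out[0]=%f out[1]=%f\n", out[0], out[1]);
    return 0;
}
```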