[Bug] dequantize_row_q4_0 segfaults #791

Closed
sha0coder opened this issue Apr 5, 2023 · 5 comments

@sha0coder

Environment and Context

Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
g++ (Debian 10.2.1-6) 10.2.1 20210110
GNU Make 4.3

Failure Information (for bugs)

main segfaults at dequantize_row_q4_0+48

Steps to Reproduce

./main -m models/ggml-vocab-q4_0.bin

~/s/llama.cpp ❯❯❯ gdb main
(gdb) r -m models/ggml-vocab-q4_0.bin
Starting program: /home/sha0/soft/llama.cpp/main -m models/ggml-vocab-q4_0.bin
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main: seed = 1680724006
llama_model_load: loading model from 'models/ggml-vocab-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 0.41 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 1792.49 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'models/ggml-vocab-q4_0.bin'
llama_model_load: model size = 0.00 MB / num tensors = 0
llama_model_load: WARN no tensors loaded from model file - assuming empty model for testing
llama_init_from_file: kv self size = 256.00 MB

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0

[New Thread 0x7fff77560700 (LWP 142639)]
[New Thread 0x7fff76d5f700 (LWP 142640)]
[New Thread 0x7fff7655e700 (LWP 142641)]
[New Thread 0x7fff75d5d700 (LWP 142642)]
[New Thread 0x7fff7555c700 (LWP 142643)]
[New Thread 0x7fff74d5b700 (LWP 142644)]
[New Thread 0x7fff7455a700 (LWP 142645)]
[New Thread 0x7fff73d59700 (LWP 142646)]
[New Thread 0x7fff73558700 (LWP 142647)]
[New Thread 0x7fff72d57700 (LWP 142648)]
[New Thread 0x7fff72556700 (LWP 142649)]
[New Thread 0x7fff71d55700 (LWP 142650)]
[New Thread 0x7fff71554700 (LWP 142651)]
[New Thread 0x7fff70d53700 (LWP 142652)]
[New Thread 0x7fff70552700 (LWP 142653)]

Thread 1 "main" received signal SIGSEGV, Segmentation fault.
0x000055555555e430 in dequantize_row_q4_0 ()
(gdb) bt
#0 0x000055555555e430 in dequantize_row_q4_0 ()
#1 0x0000555555567585 in ggml_compute_forward_get_rows ()
#2 0x000055555556fba3 in ggml_graph_compute ()
#3 0x0000555555578eca in llama_eval_internal(llama_context&, int const*, int, int, int) ()
#4 0x000055555557919f in llama_eval ()
#5 0x000055555555c1aa in main ()
(gdb) x/i $pc
=> 0x55555555e430 <dequantize_row_q4_0+48>: vpmovzxbw 0x4(%rdi),%ymm1
(gdb) i r rdi
rdi 0xa00 2560
(gdb) i r ymm1
ymm1 {v16_bfloat16 = {0x180, 0x0, 0x0, 0x0, 0x180, 0x0 <repeats 11 times>}, v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xc0, 0x43, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0, 0x43, 0x0 <repeats 22 times>}, v16_int16 = {0x43c0, 0x0, 0x0, 0x0, 0x43c0, 0x0 <repeats 11 times>}, v8_int32 = {0x43c0, 0x0, 0x43c0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x43c0, 0x43c0, 0x0, 0x0}, v2_int128 = {0x43c000000000000043c0, 0x0}}
(gdb)

@sha0coder (Author)

With debug symbols, the crash is at:

Thread 1 "main" received signal SIGSEGV, Segmentation fault.
dequantize_row_q4_0 (vx=, y=, k=k@entry=4096) at ggml.c:987
987 __m256i vx8 = bytesFromNibbles(pp+l/2);

And without AVX2, the crash is here:
Thread 1 "main" received signal SIGSEGV, Segmentation fault.
dequantize_row_q4_0 (vx=, y=, k=k@entry=4096) at ggml.c:1067
1067 const float d = x[i].d;
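
For context, here is a minimal sketch of the scalar q4_0 dequantization path. The struct layout and QK value follow the q4_0 code in ggml.c from around this time, but treat the exact names and values as assumptions; the point is that with a bogus vx (here rdi = 0xa00), the very first read of x[i].d is the faulting access:

```c
/* Minimal sketch of scalar q4_0 dequantization (layout per ggml.c circa
 * April 2023; exact struct name and QK value are assumptions). */
#include <stdint.h>

#define QK 32

typedef struct {
    float   d;           /* block scale factor */
    uint8_t qs[QK / 2];  /* 32 quants, 4 bits each, packed two per byte */
} block_q4_0;

static void dequantize_row_q4_0_ref(const void *vx, float *y, int k) {
    const block_q4_0 *x = vx;   /* if vx is garbage (e.g. 0xa00)... */
    const int nb = k / QK;
    for (int i = 0; i < nb; i++) {
        const float d = x[i].d;            /* ...this read is the SIGSEGV site (ggml.c:1067) */
        const uint8_t *pp = x[i].qs;
        for (int l = 0; l < QK; l += 2) {
            const uint8_t vi = pp[l / 2];  /* the AVX2 path faults on this same region */
            y[i * QK + l + 0] = ((vi & 0x0F) - 8) * d;
            y[i * QK + l + 1] = ((vi >> 4) - 8) * d;
        }
    }
}
```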

@sha0coder (Author)

The vx pointer has a bogus value:

(screenshot showing the bad vx value)

@slaren (Member) commented Apr 5, 2023

llama_model_load: WARN no tensors loaded from model file - assuming empty model for testing

You cannot eval with a vocab-only model.
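
In other words, the vocab-only file contributes zero tensors, so the weights that the eval path dereferences were never allocated. A hypothetical guard along these lines would turn the crash into a clean error (the counter name and message are illustrative, not llama.cpp's actual code):

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical guard (illustrative only, not actual llama.cpp code):
 * refuse to eval a model whose file contributed no tensor data, instead
 * of letting dequantize_row_q4_0 chase an uninitialized pointer. */
static bool model_can_eval(int n_tensors_loaded) {
    if (n_tensors_loaded == 0) {
        fprintf(stderr, "eval: no tensors loaded (vocab-only model?)\n");
        return false;
    }
    return true;
}
```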

@sha0coder (Author)

Where can I get a proper model?

@slaren (Member) commented Apr 5, 2023

I cannot help you with that, but there are some details in the official repository: https://github.com/facebookresearch/llama/
