
Misc. bug: llama-cli crash on ubuntu with GGML-VULKAN=ON #11823


Closed
gaykawadpk opened this issue Feb 12, 2025 · 2 comments

@gaykawadpk

Name and Version

./llama-cli --version
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA T1000 8GB (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | matrix cores: none
version: 4534 (955a6c2)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

gdb llama-cli 
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from llama-cli...

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/bin/llama-cli
(No debugging symbols found in llama-cli)                                                                                                                                                                                                                                                                                                                                                     
(gdb) run -m ../../../../models/unsloth-llama3.2-1b-finetune-function-calling-v3.Q4_K_M.gguf 

ggml_vulkan: Found 1 Vulkan devices:                                                                                                                                                                                                                                                                                                                                                          
ggml_vulkan: 0 = NVIDIA T1000 8GB (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | matrix cores: none
[New Thread 0x7ffff6a006c0 (LWP 21554)]
build: 4534 (955a6c2d) with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device Vulkan0 (NVIDIA T1000 8GB) - 8192 MiB free
llama_model_loader: loaded meta data with 30 key-value pairs and 147 tensors from ../../../../models/unsloth-llama3.2-1b-finetune-function-calling-v3.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Llama 3.2 1b Instruct Bnb 4bit
llama_model_loader: - kv   3:                       general.organization str              = Unsloth
llama_model_loader: - kv   4:                           general.finetune str              = instruct-bnb-4bit
llama_model_loader: - kv   5:                           general.basename str              = llama-3.2
llama_model_loader: - kv   6:                         general.size_label str              = 1B
llama_model_loader: - kv   7:                          llama.block_count u32              = 16
llama_model_loader: - kv   8:                       llama.context_length u32              = 131072
llama_model_loader: - kv   9:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv  10:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv  11:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  12:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  13:                       llama.rope.freq_base f32              = 500000.000000
llama_model_loader: - kv  14:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  15:                 llama.attention.key_length u32              = 64
llama_model_loader: - kv  16:               llama.attention.value_length u32              = 64
llama_model_loader: - kv  17:                          general.file_type u32              = 15
llama_model_loader: - kv  18:                           llama.vocab_size u32              = 128256
llama_model_loader: - kv  19:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = llama-bpe
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,128256]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,128256]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:                      tokenizer.ggml.merges arr[str,280147]  = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  25:                tokenizer.ggml.bos_token_id u32              = 128000
llama_model_loader: - kv  26:                tokenizer.ggml.eos_token_id u32              = 128009
llama_model_loader: - kv  27:            tokenizer.ggml.padding_token_id u32              = 128004
llama_model_loader: - kv  28:                    tokenizer.chat_template str              = {{- bos_token }}\n{%- if custom_tools ...
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   34 tensors
llama_model_loader: - type q4_K:   96 tensors
llama_model_loader: - type q6_K:   17 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 762.81 MiB (5.18 BPW) 
load: special tokens cache size = 256
load: token to piece cache size = 0.7999 MB
print_info: arch             = llama
print_info: vocab_only       = 0
print_info: n_ctx_train      = 131072
print_info: n_embd           = 2048
print_info: n_layer          = 16
print_info: n_head           = 32
print_info: n_head_kv        = 8
print_info: n_rot            = 64
print_info: n_swa            = 0
print_info: n_embd_head_k    = 64
print_info: n_embd_head_v    = 64
print_info: n_gqa            = 4
print_info: n_embd_k_gqa     = 512
print_info: n_embd_v_gqa     = 512
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: n_ff             = 8192
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 500000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 131072
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 1B
print_info: model params     = 1.24 B
print_info: general.name     = Llama 3.2 1b Instruct Bnb 4bit
print_info: vocab type       = BPE
print_info: n_vocab          = 128256
print_info: n_merges         = 280147
print_info: BOS token        = 128000 '<|begin_of_text|>'
print_info: EOS token        = 128009 '<|eot_id|>'
print_info: EOT token        = 128009 '<|eot_id|>'
print_info: EOM token        = 128008 '<|eom_id|>'
print_info: PAD token        = 128004 '<|finetune_right_pad_id|>'
print_info: LF token         = 128 'Ä'
print_info: EOG token        = 128008 '<|eom_id|>'
print_info: EOG token        = 128009 '<|eot_id|>'
print_info: max token length = 256

Thread 1 "llama-cli" received signal SIGSEGV, Segmentation fault.
Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/strlen-evex-base.S.
__strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex-base.S:81
warning: 81	../sysdeps/x86_64/multiarch/strlen-evex-base.S: No such file or directory
(gdb) bt
#0  __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex-base.S:81
#1  0x00007ffff6ca2b0c in ggml_vk_get_device(unsigned long) () from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/ggml/src/ggml-vulkan/libggml-vulkan.so
#2  0x00007ffff6ca3d27 in ggml_backend_vk_host_buffer_type () from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/ggml/src/ggml-vulkan/libggml-vulkan.so
#3  0x00007ffff7f035d7 in llama_model::load_tensors(llama_model_loader&) () from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/src/libllama.so
#4  0x00007ffff7e8c53b in llama_model_load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, llama_model&, llama_model_params&) ()
   from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/src/libllama.so
#5  0x00007ffff7e91d3c in llama_model_load_from_file_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, llama_model_params) ()
   from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/src/libllama.so
#6  0x00007ffff7e9200a in llama_model_load_from_file () from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/src/libllama.so
#7  0x00005555555cc2fc in common_init_from_params(common_params&) ()
#8  0x000055555557458c in main ()

Problem description & steps to reproduce

I am trying to run llama-cli on an Ubuntu 24.04 machine, but with -DGGML_VULKAN=ON it crashes. I am using Vulkan SDK 1.3.296. The backtrace is included in the relevant log output below.

Steps:
1. Compiled llama.cpp with GGML_VULKAN enabled (build commands sketched below).
2. Set up the environment using Vulkan SDK 1.3.296.
3. Ran llama-cli (or another example such as llama-simple-chat) and observed the crash.
4. Without Vulkan, everything works fine.
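
For reference, a rough sketch of the build and run commands (the build directory and model path are taken from the paths above; exact flags may differ):

# configure and build llama.cpp with the Vulkan backend enabled
cmake -B buildx86_vulkan -DGGML_VULKAN=ON
cmake --build buildx86_vulkan --config Release -j

# run llama-cli with the model that triggers the crash
./buildx86_vulkan/bin/llama-cli -m ../../../../models/unsloth-llama3.2-1b-finetune-function-calling-v3.Q4_K_M.gguf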

First Bad Commit

No response

Relevant log output

Thread 1 "llama-cli" received signal SIGSEGV, Segmentation fault.
Download failed: Invalid argument.  Continuing without source file ./string/../sysdeps/x86_64/multiarch/strlen-evex-base.S.
__strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex-base.S:81
warning: 81	../sysdeps/x86_64/multiarch/strlen-evex-base.S: No such file or directory
(gdb) bt
#0  __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex-base.S:81
#1  0x00007ffff6ca2b0c in ggml_vk_get_device(unsigned long) () from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/ggml/src/ggml-vulkan/libggml-vulkan.so
#2  0x00007ffff6ca3d27 in ggml_backend_vk_host_buffer_type () from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/ggml/src/ggml-vulkan/libggml-vulkan.so
#3  0x00007ffff7f035d7 in llama_model::load_tensors(llama_model_loader&) () from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/src/libllama.so
#4  0x00007ffff7e8c53b in llama_model_load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, llama_model&, llama_model_params&) ()
   from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/src/libllama.so
#5  0x00007ffff7e91d3c in llama_model_load_from_file_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, llama_model_params) ()
   from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/src/libllama.so
#6  0x00007ffff7e9200a in llama_model_load_from_file () from /home/dcaimlpune/ashwini_wp/Chat/externals/llama.cpp/buildx86_vulkan/src/libllama.so
#7  0x00005555555cc2fc in common_init_from_params(common_params&) ()
#8  0x000055555557458c in main ()

0cc4m commented Feb 12, 2025

Can you compile in debug mode (for CMake: -DCMAKE_BUILD_TYPE=Debug)? Then reproduce the crash with gdb again; it should show the specific line that crashed.
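
Roughly, assuming a CMake build tree like the one in the paths above (a sketch; adjust paths as needed):

# reconfigure the Vulkan build as a debug build
cmake -B buildx86_vulkan -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Debug
cmake --build buildx86_vulkan -j

# rerun under gdb; after the SIGSEGV, `bt` should then show file and line numbers
gdb --args ./buildx86_vulkan/bin/llama-cli -m ../../../../models/unsloth-llama3.2-1b-finetune-function-calling-v3.Q4_K_M.gguf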

github-actions bot added the stale label Mar 15, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.
