
CLBlast build failing on q3 model #1725


Closed
TheTerrasque opened this issue Jun 6, 2023 · 2 comments


TheTerrasque commented Jun 6, 2023

When trying to run wizardlm-30b.ggmlv3.q3_K_M.bin from https://huggingface.co/TheBloke/WizardLM-30B-GGML using the CLBlast build, it fails with GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-opencl.cpp:1009: to_fp32_cl != nullptr.

PS H:\Files\Downloads\llama-master-2d7bf11-bin-win-clblast-x64> .\main.exe -m C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin -ngl 20
main: build = 631 (2d7bf11)
main: seed  = 1686095068
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 3080'
ggml_opencl: device FP16 support: false
llama.cpp: loading model from C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 12 (mostly Q3_K - Medium)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 12303.88 MB (+ 3124.00 MB per state)
llama_model_load_internal: offloading 20 layers to GPU
llama_model_load_internal: total VRAM used: 4913 MB
..................................
llama_init_from_file: kv self size  =  780.00 MB

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-opencl.cpp:1009: to_fp32_cl != nullptr

PS H:\Files\Downloads\llama-master-2d7bf11-bin-win-clblast-x64> certutil -hashfile C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin SHA256
SHA256 hash of C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin:
65e3770689b388c50bf39406484cd5755854b57d57d802380bedfb4d31a63e8b
CertUtil: -hashfile command completed successfully.

Running without GPU layers works
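
For context: the assert means the OpenCL backend found no dequantize-to-FP32 kernel for the quantization type of a tensor being offloaded, which suggests the CLBlast/OpenCL path did not yet have kernels for the k-quant formats (Q3_K, Q4_K, ...) at the time of this report. Below is a minimal, self-contained sketch of that dispatch pattern, reduced to plain C++; the type names and functions are hypothetical stand-ins for illustration, not the actual ggml-opencl.cpp source:

```cpp
// Illustrative sketch only: names are hypothetical stand-ins for the
// kernel-dispatch pattern in ggml-opencl.cpp, reduced to plain C++.
#include <cassert>
#include <cstdio>

enum ggml_type { TYPE_F16, TYPE_Q4_0, TYPE_Q8_0, TYPE_Q3_K, TYPE_Q4_K };

// Stand-in for an OpenCL kernel handle (cl_kernel in the real backend).
using to_fp32_fn = void (*)();

void convert_f16()     { std::puts("f16 -> f32"); }
void dequantize_q4_0() { std::puts("q4_0 -> f32"); }
void dequantize_q8_0() { std::puts("q8_0 -> f32"); }

// One dequantization kernel per supported type; types the backend does
// not know about map to nullptr, which the GGML_ASSERT then catches.
to_fp32_fn get_to_fp32_cl(ggml_type type) {
    switch (type) {
        case TYPE_F16:  return convert_f16;
        case TYPE_Q4_0: return dequantize_q4_0;
        case TYPE_Q8_0: return dequantize_q8_0;
        default:        return nullptr; // k-quants land here in this sketch
    }
}

int main() {
    // Offloading a q3_K tensor selects no kernel and trips the assert,
    // mirroring "GGML_ASSERT: to_fp32_cl != nullptr" from the log above.
    to_fp32_fn to_fp32_cl = get_to_fp32_cl(TYPE_Q3_K);
    assert(to_fp32_cl != nullptr);
}
```

This also explains why CPU-only inference still works: with no layers offloaded, the OpenCL dequantization path is never reached.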


arch-btw commented Jun 7, 2023

Same issue with guanaco-7B.ggmlv3.q4_K_S.bin and guanaco-7B.ggmlv3.q4_K_M.bin.

Running without the GPU indeed works.


This issue was closed because it has been inactive for 14 days since being marked as stale.
