
CLBlast build failing on q3 model #1725


Closed
TheTerrasque opened this issue Jun 6, 2023 · 2 comments


TheTerrasque commented Jun 6, 2023

When trying to run wizardlm-30b.ggmlv3.q3_K_M.bin from https://huggingface.co/TheBloke/WizardLM-30B-GGML using the CLBlast build, it fails with GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-opencl.cpp:1009: to_fp32_cl != nullptr.

PS H:\Files\Downloads\llama-master-2d7bf11-bin-win-clblast-x64> .\main.exe -m C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin -ngl 20
main: build = 631 (2d7bf11)
main: seed  = 1686095068
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 3080'
ggml_opencl: device FP16 support: false
llama.cpp: loading model from C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32001
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 12 (mostly Q3_K - Medium)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size =    0.13 MB
llama_model_load_internal: using OpenCL for GPU acceleration
llama_model_load_internal: mem required  = 12303.88 MB (+ 3124.00 MB per state)
llama_model_load_internal: offloading 20 layers to GPU
llama_model_load_internal: total VRAM used: 4913 MB
..................................
llama_init_from_file: kv self size  =  780.00 MB

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 GGML_ASSERT: D:\a\llama.cpp\llama.cpp\ggml-opencl.cpp:1009: to_fp32_cl != nullptr

PS H:\Files\Downloads\llama-master-2d7bf11-bin-win-clblast-x64> certutil -hashfile C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin SHA256
SHA256 hash of C:\temp\models\wizardlm-30b.ggmlv3.q3_K_M.bin:
65e3770689b388c50bf39406484cd5755854b57d57d802380bedfb4d31a63e8b
CertUtil: -hashfile command completed successfully.

Running without GPU layers works
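
For context: the assert means the OpenCL backend found no dequantize-to-FP32 kernel for the quantization type of a tensor being offloaded, which suggests the CLBlast/OpenCL path did not yet have kernels for the k-quant formats (Q3_K, Q4_K, ...) at the time of this report. Below is a minimal, self-contained sketch of that dispatch pattern, reduced to plain C++; the type names and functions are hypothetical stand-ins for illustration, not the actual ggml-opencl.cpp source:

```cpp
// Illustrative sketch only: names are hypothetical stand-ins for the
// kernel-dispatch pattern in ggml-opencl.cpp, reduced to plain C++.
#include <cassert>
#include <cstdio>

enum ggml_type { TYPE_F16, TYPE_Q4_0, TYPE_Q8_0, TYPE_Q3_K, TYPE_Q4_K };

// Stand-in for an OpenCL kernel handle (cl_kernel in the real backend).
using to_fp32_fn = void (*)();

void convert_f16()     { std::puts("f16 -> f32"); }
void dequantize_q4_0() { std::puts("q4_0 -> f32"); }
void dequantize_q8_0() { std::puts("q8_0 -> f32"); }

// One dequantization kernel per supported type; types the backend does
// not know about map to nullptr, which the GGML_ASSERT then catches.
to_fp32_fn get_to_fp32_cl(ggml_type type) {
    switch (type) {
        case TYPE_F16:  return convert_f16;
        case TYPE_Q4_0: return dequantize_q4_0;
        case TYPE_Q8_0: return dequantize_q8_0;
        default:        return nullptr; // k-quants land here in this sketch
    }
}

int main() {
    // Offloading a q3_K tensor selects no kernel and trips the assert,
    // mirroring "GGML_ASSERT: to_fp32_cl != nullptr" from the log above.
    to_fp32_fn to_fp32_cl = get_to_fp32_cl(TYPE_Q3_K);
    assert(to_fp32_cl != nullptr);
}
```

This also explains why CPU-only inference still works: with no layers offloaded, the OpenCL dequantization path is never reached.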


arch-btw commented Jun 7, 2023

Same issue with guanaco-7B.ggmlv3.q4_K_S.bin and guanaco-7B.ggmlv3.q4_K_M.bin.

Running without the GPU indeed works.


This issue was closed because it has been inactive for 14 days since being marked as stale.
