Cannot load Bloom-7b1 ggml model in GPU #3697
Comments
I am also not able to load the llava model (`ggml-model-q4_k.gguf` with mmproj `mmproj-model-f16.gguf`) into the GPU. (main) llama v2 works fine. `clip_model_load: text_encoder: 0 prompt: 'describe the image'`
Is there any solution for this? I found that models with alibi all seem to have this issue on NVIDIA GPUs; they run successfully on Metal.
As a temporary workaround, you can add …
Fixed in #3921
I used the `convert-bloom-hf-to-gguf.py` file to convert the Huggingface `bigscience/bloom-7b1` model to a ggml model with `f16` successfully. This gives me a model `ggml-model-f16.gguf` that correctly loads and runs on the CPU. However, when I try to offload a layer to the GPU, I get the following error:
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
`Linux nemo 5.4.0-165-generic #182-Ubuntu SMP Mon Oct 2 19:43:28 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux`
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Use `convert-bloom-hf-to-gguf.py` to convert to f16 ggml, then load the resulting model with GPU offload enabled (see the command sketch above).

Failure Logs