runtime error in example/server #1557
Comments
Same error in gpt2_ggml_model when running it.
Same error on Linux Mint (Ubuntu-based) with ggml-alpaca-7b-native-q4.bin.
Same here on several models, but gpt4-alpaca-lora still works.
I hit a similar issue. Mine was caused by 2d5db48. The delta field in each quantize block was changed from fp32 to fp16, so the model file fails to load. There is a file version bump, and the file version is checked, but the check happens too late: once some tensor data has been loaded with an incorrect size, the following length-related fields get read incorrectly.
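(As a quick way to see which format a given file is in, here is a minimal sketch, assuming the ggjt layout of a 4-byte magic followed by a 4-byte little-endian file version at the start of the file; the model path is illustrative.)
xxd -l 8 models/7B/ggml-model-q4_0.bin
# first 4 bytes: file magic; next 4 bytes: file version (little-endian)
# a version older than the current one means the model must be re-quantized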
Thanks for that. I overlooked the breaking change. Looks like it comes from this.
Hello, I am in charge of implementing the server example in llama.cpp. That error occurs because you need to re-quantize the model when using the latest version of the project. |
Hi guys, I confirmed it's working with the latest models like airoboros-13B.q5_1.bin. @FSSRepo there's some config that is not available at runtime, so will it be initialized using the node call in the example?
And thanks again for your hard work @FSSRepo.
So I needed to re-quantize. It works, thank you.
Can you describe or point me to a link that shows how you re-quantized?
Thank you.
@mconsidine just download the new version of the model on HF (Hugging Face).
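(As an illustration only, fetching an already re-quantized model file from Hugging Face might look like this; the repository path is a placeholder, not a real URL from this thread.)
wget https://huggingface.co/<user>/<repo>/resolve/main/airoboros-13B.q5_1.bin -P models/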
@mconsidine I don't know what you are missing; write out all your steps.
git pull origin master
make clean
make quantize
and from README.md:
# convert the 7B model to ggml FP16 format
python3 convert.py models/7B/
# quantize the model to 4-bits (using q4_0 method)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
Just re-quantize your model.
May I close this issue?
Thank you. I'm squared away. mconsidine
I still experience a similar issue, with the same error during quantization. My steps so far have been the ones above. Maybe it's where I am downloading the models from? I have tried the different approaches from the README; they all fail the same way.
I was able to solve it for gpt4all by doing convert + quantization. Although, looking at the output of convert, it should have been doing that already.
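(As a sketch of that convert + quantize sequence, with an assumed directory name; adjust the paths to your local layout.)
python3 convert.py models/gpt4all-7B/
./quantize ./models/gpt4all-7B/ggml-model-f16.bin ./models/gpt4all-7B/ggml-model-q4_0.bin q4_0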
To build and run the just-released example/server executable, I built the server with a CMake build (adding the option -DLLAMA_BUILD_SERVER=ON), then followed the README.md and ran the build commands (roughly as sketched below).
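(A minimal sketch, assuming the standard CMake flow and an illustrative model path; not necessarily the exact commands used.)
mkdir build && cd build
cmake .. -DLLAMA_BUILD_SERVER=ON
cmake --build . --config Release
# launch the server example with a re-quantized model (path is an assumption)
./bin/server -m ../models/7B/ggml-model-q4_0.bin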
And the following error occurred, on Mac as well as on Ubuntu (with cuBLAS): the same runtime error in both cases. What more do I need?