
runtime error in example/server #1557


Closed

DDANGEUN opened this issue May 22, 2023 · 15 comments

@DDANGEUN

To build and run the just-released example/server, I built the server executable with CMake (adding the option -DLLAMA_BUILD_SERVER=ON).

Then I followed the README.md and ran the following command:

./build/bin/server -m models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin --ctx_size 2048

The following error occurred.

On macOS:

main: seed = 1684723159
llama.cpp: loading model from models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin
libc++abi: terminating due to uncaught exception of type std::runtime_error: unexpectedly reached end of file
zsh: abort      ./build/bin/server -m models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin

On Ubuntu (with cuBLAS):

main: seed = 1684728245
llama.cpp: loading model from models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin
terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

The same runtime error in both cases.
What more do I need to do?

@vicwer

vicwer commented May 22, 2023

Same error with a gpt2 ggml model when running ./quantize ./gpt2_13b/ggml-model-f16.bin ./gpt2_13b/ggml-model-f16.bin:

terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

@mconsidine

Same error on Linux Mint (Ubuntu) with ggml-alpaca-7b-native-q4.bin.

@adamierymenko

Same here on several models, but gpt4-alpaca-lora still works.

@Jason0214

Same error with a gpt2 ggml model when running ./quantize ./gpt2_13b/ggml-model-f16.bin ./gpt2_13b/ggml-model-f16.bin:

terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

I hit a similar issue. Mine was caused by 2d5db48. The delta field in each quantized block changed from fp32 to fp16, so older model files fail to load. There is a file version bump and a check of the file version, but the check happens too late: once some tensor data has been read with the wrong size, subsequent length-related fields such as name_len read corrupted values, which causes the unexpected end of file.
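
A minimal sketch of this failure mode, assuming a simplified block layout, a hypothetical read_exact helper, and a placeholder file name (this is not the actual llama.cpp loader code). If the reader still assumes a 4-byte fp32 delta per block while the file stores a 2-byte fp16 delta, every block read advances the file cursor too far, so a later length field contains garbage and a following read runs past the end of the file:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

// simplified, illustrative block layouts (not the real ggml definitions)
struct block_old { float    d; uint8_t qs[16]; };  // old format: fp32 delta
struct block_new { uint16_t d; uint8_t qs[16]; };  // new format: fp16 delta

// read exactly `size` bytes or throw, mirroring the error seen in this issue
static void read_exact(std::FILE * f, void * dst, std::size_t size) {
    if (std::fread(dst, 1, size, f) != size) {
        throw std::runtime_error("unexpectedly reached end of file");
    }
}

int main() {
    // hypothetical model file written in the new (fp16-delta) format
    std::FILE * f = std::fopen("model.bin", "rb");
    if (!f) return 1;

    // reading with the old struct consumes 2 extra bytes per block,
    // so the cursor drifts past where the next field really starts
    std::vector<block_old> blocks(1024);
    read_exact(f, blocks.data(), blocks.size() * sizeof(block_old));

    uint32_t name_len = 0;
    read_exact(f, &name_len, sizeof(name_len)); // now reads garbage bytes
    std::vector<char> name(name_len);
    read_exact(f, name.data(), name_len);       // bogus length, typically throws here

    std::fclose(f);
    return 0;
}

This is also why checking the file magic/version before any tensor data is read would turn the confusing end-of-file error into a clear "unsupported file version" message.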

@mconsidine

Thanks for that. I had overlooked the breaking change. Judging by
#1405 (comment)
redone models are already available.

@FSSRepo
Collaborator

FSSRepo commented May 22, 2023

Hello, I am in charge of implementing the server example in llama.cpp. That error occurs because you need to re-quantize the model when using the latest version of the project.

@x4080

x4080 commented May 22, 2023

Hi guys, I can confirm it works with a recent model such as airoboros-13B.q5_1.bin.

@FSSRepo there are some options that don't seem to be available at runtime, such as:
-t 6
-n 2048
--repeat_penalty 1.0
-f prompts/chat.txt
-r "User:"

So will these be initialized through the Node call, as in the example below?

const axios = require("axios");

const prompt = `Building a website can be done in 10 simple steps:`;

async function Test() {
    let result = await axios.post("http://127.0.0.1:8080/completion", {
        prompt,
        batch_size: 128,
        n_predict: 512,
    });

    // the full response is returned once generation has finished
    console.log(result.data.content);
}

Test();

And thanks again for your hard work @FSSRepo

@DDANGEUN
Author

Hello, I am in charge of implementing the server example in llama.cpp. That error occurs because you need to re-quantize the model when using the latest version of the project.

I see, I needed to re-quantize. It works now, thank you.

@mconsidine

mconsidine commented May 23, 2023 via email

@x4080

x4080 commented May 23, 2023

@mconsidine just download the new version of the model from HF (Hugging Face).

@DDANGEUN
Author

@mconsidine I don't know what you are missing, so here are all the steps:

git pull origin master
make clean
make quantize

and from the README.md:

# convert the 7B model to ggml FP16 format
python3 convert.py models/7B/

# quantize the model to 4-bits (using q4_0 method)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

just re-quantize your model

@DDANGEUN
Author

May I close this issue?

@mconsidine

mconsidine commented May 24, 2023 via email

@simonepontz

I still experience a similar issue, with the following error during quantization:

$ ./quantize models/7B/ggml-model-q4_0.bin models/7B/ggml-model-q4_0.bin.quantized 2
main: build = 588 (ac7876a)
main: quantizing 'models/7B/ggml-model-q4_0.bin' to 'models/7B/ggml-model-q4_0.bin.quantized' as q4_0
llama.cpp: loading model from models/7B/ggml-model-q4_0.bin
libc++abi: terminating with uncaught exception of type std::runtime_error: unexpectedly reached end of file
[1]    76573 abort      ./quantize models/7B/ggml-model-q4_0.bin  2

The steps so far have been:

git pull
make clean
make quantize

python3 convert.py models/7B/  # which creates ggml-model-q4_0.bin

./quantize models/7B/ggml-model-q4_0.bin models/7B/ggml-model-q4_0.bin.quantized q4_0

Maybe it's where I'm downloading the models from? I have tried different approaches from the README and they all fail the same way.

@simonepontz

I was able to solve it for gpt4all by doing convert + quantize:

python3 convert.py models/gpt4all-7B/gpt4all-lora-quantized.bin --outtype f16

./quantize models/gpt4all-7B/ggml-model-f16.bin models/gpt4all-7B/ggml-model-q4_0.bin q4_0

Looking at the output of convert, though, it should have been doing that already.
