
runtime error in example/server #1557


Closed

DDANGEUN opened this issue May 22, 2023 · 15 comments

@DDANGEUN

To build and run the just-released example/server, I built the server executable with CMake (adding the option -DLLAMA_BUILD_SERVER=ON).

Then I followed the README.md and ran the following command:

./build/bin/server -m models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin --ctx_size 2048

The following error occurred.

On macOS:

main: seed = 1684723159
llama.cpp: loading model from models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin
libc++abi: terminating due to uncaught exception of type std::runtime_error: unexpectedly reached end of file
zsh: abort      ./build/bin/server -m models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin

On Ubuntu (with cuBLAS):

main: seed = 1684728245
llama.cpp: loading model from models/ggml-vicuna-13b-1.1/ggml-vicuna-13b-1.1-q4_1.bin
terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

The same runtime error in both cases.
What more do I need to do?

@vicwer

vicwer commented May 22, 2023

Same error with a gpt2 ggml model when running ./quantize ./gpt2_13b/ggml-model-f16.bin ./gpt2_13b/ggml-model-f16.bin:

terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

@mconsidine

Same error on Linux Mint (Ubuntu) with ggml-alpaca-7b-native-q4.bin.

@adamierymenko

Same here on several models, but gpt4-alpaca-lora still works.

@Jason0214

Same error with a gpt2 ggml model when running ./quantize ./gpt2_13b/ggml-model-f16.bin ./gpt2_13b/ggml-model-f16.bin:

terminate called after throwing an instance of 'std::runtime_error'
  what():  unexpectedly reached end of file
Aborted (core dumped)

I hit a similar issue. Mine was caused by 2d5db48. The delta field in each quantized block changed from fp32 to fp16, so older model files fail to load. There is a file version bump and a check of the file version, but the check happens too late: once some tensor data has been read with the wrong size, subsequent length-related fields such as name_len read corrupted values, which causes the unexpected end of file.
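
A minimal sketch of this failure mode, assuming a simplified block layout, a hypothetical read_exact helper, and a placeholder file name (this is not the actual llama.cpp loader code). If the reader still assumes a 4-byte fp32 delta per block while the file stores a 2-byte fp16 delta, every block read advances the file cursor too far, so a later length field contains garbage and a following read runs past the end of the file:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

// simplified, illustrative block layouts (not the real ggml definitions)
struct block_old { float    d; uint8_t qs[16]; };  // old format: fp32 delta
struct block_new { uint16_t d; uint8_t qs[16]; };  // new format: fp16 delta

// read exactly `size` bytes or throw, mirroring the error seen in this issue
static void read_exact(std::FILE * f, void * dst, std::size_t size) {
    if (std::fread(dst, 1, size, f) != size) {
        throw std::runtime_error("unexpectedly reached end of file");
    }
}

int main() {
    // hypothetical model file written in the new (fp16-delta) format
    std::FILE * f = std::fopen("model.bin", "rb");
    if (!f) return 1;

    // reading with the old struct consumes 2 extra bytes per block,
    // so the cursor drifts past where the next field really starts
    std::vector<block_old> blocks(1024);
    read_exact(f, blocks.data(), blocks.size() * sizeof(block_old));

    uint32_t name_len = 0;
    read_exact(f, &name_len, sizeof(name_len)); // now reads garbage bytes
    std::vector<char> name(name_len);
    read_exact(f, name.data(), name_len);       // bogus length, typically throws here

    std::fclose(f);
    return 0;
}

This is also why checking the file magic/version before any tensor data is read would turn the confusing end-of-file error into a clear "unsupported file version" message.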

@mconsidine

Thanks for that. I had overlooked the breaking change. Judging by
#1405 (comment)
redone models are already available.

@FSSRepo
Collaborator

FSSRepo commented May 22, 2023

Hello, I am in charge of implementing the server example in llama.cpp. That error occurs because you need to re-quantize the model when using the latest version of the project.

@x4080

x4080 commented May 22, 2023

Hi guys, I can confirm it works with a recent model such as airoboros-13B.q5_1.bin.

@FSSRepo there are some options that don't seem to be available at runtime, such as:
-t 6
-n 2048
--repeat_penalty 1.0
-f prompts/chat.txt
-r "User:"

So will these be initialized through the Node call, as in the example below?

const axios = require("axios");

const prompt = `Building a website can be done in 10 simple steps:`;

async function Test() {
    let result = await axios.post("http://127.0.0.1:8080/completion", {
        prompt,
        batch_size: 128,
        n_predict: 512,
    });

    // the full response is returned once generation has finished
    console.log(result.data.content);
}

Test();

And thanks again for your hard work @FSSRepo

@DDANGEUN
Author

Hello, I am in charge of implementing the server example in llama.cpp. That error occurs because you need to re-quantize the model when using the latest version of the project.

I see, I needed to re-quantize. It works now, thank you.

@mconsidine

mconsidine commented May 23, 2023 via email

@x4080

x4080 commented May 23, 2023

@mconsidine just download the new version of the model from HF (Hugging Face).

@DDANGEUN
Author

@mconsidine I don't know what you are missing, so here are all the steps:

git pull origin master
make clean
make quantize

and from the README.md:

# convert the 7B model to ggml FP16 format
python3 convert.py models/7B/

# quantize the model to 4-bits (using q4_0 method)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

just re-quantize your model

@DDANGEUN
Author

May I close this issue?

@mconsidine

mconsidine commented May 24, 2023 via email

@simonepontz

I still experience a similar issue, with the following error during quantization:

$ ./quantize models/7B/ggml-model-q4_0.bin models/7B/ggml-model-q4_0.bin.quantized 2
main: build = 588 (ac7876a)
main: quantizing 'models/7B/ggml-model-q4_0.bin' to 'models/7B/ggml-model-q4_0.bin.quantized' as q4_0
llama.cpp: loading model from models/7B/ggml-model-q4_0.bin
libc++abi: terminating with uncaught exception of type std::runtime_error: unexpectedly reached end of file
[1]    76573 abort      ./quantize models/7B/ggml-model-q4_0.bin  2

The steps so far have been:

git pull
make clean
make quantize

python3 convert.py models/7B/  # which creates ggml-model-q4_0.bin

./quantize models/7B/ggml-model-q4_0.bin models/7B/ggml-model-q4_0.bin.quantized q4_0

Maybe it's where I'm downloading the models from? I have tried different approaches from the README and they all fail the same way.

@simonepontz

I was able to solve it for gpt4all by doing convert + quantize:

python3 convert.py models/gpt4all-7B/gpt4all-lora-quantized.bin --outtype f16

./quantize models/gpt4all-7B/ggml-model-f16.bin models/gpt4all-7B/ggml-model-q4_0.bin q4_0

Looking at the output of convert, though, it should have been doing that already.
