
Error: Invalid model file when using converted GPT4ALL model after following provided instructions #655


Closed
gaceladri opened this issue Mar 31, 2023 · 11 comments

Comments

@gaceladri

Hello,

I have followed the instructions provided for using the GPT-4ALL model. I used the convert-gpt4all-to-ggml.py script to convert the gpt4all-lora-quantized.bin model, as instructed. However, I encountered an error related to an invalid model file when running the example.

Here are the steps I followed, as described in the instructions:

  1. Convert the model using the convert-gpt4all-to-ggml.py script:
python3 convert-gpt4all-to-ggml.py models/gpt4all/gpt4all-lora-quantized.bin ./models/tokenizer.model
  2. Run the interactive mode example with the newly generated gpt4all-lora-quantized.bin model:
./main -m ./models/gpt4all/gpt4all-lora-quantized.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

However, I encountered the following error:

./models/gpt4all/gpt4all-lora-quantized.bin: invalid model file (bad magic [got 0x67676d66 want 0x67676a74])
you most likely need to regenerate your ggml files
the benefit is you'll get 10-100x faster load times
see https://github.com/ggerganov/llama.cpp/issues/91
use convert-pth-to-ggml.py to regenerate from original pth
use migrate-ggml-2023-03-30-pr613.py if you deleted originals
llama_init_from_file: failed to load model
main: error: failed to load model './models/gpt4all/gpt4all-lora-quantized.bin'

Please let me know how to resolve this issue and correctly convert and use the GPT-4ALL model with the interactive mode example.

Thank you.
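
For context on the "bad magic" numbers above: 0x67676d66 is the ASCII bytes "ggmf" (the older versioned ggml container) and 0x67676a74 is "ggjt" (the newer container that the current main expects). Below is a minimal sketch for checking which container a model file carries, assuming the magic is stored as a little-endian 32-bit integer at the very start of the file; check_magic is a hypothetical helper, not part of llama.cpp.

import struct
import sys

# Hypothetical helper, not part of llama.cpp: report which ggml container
# magic a model file starts with. 0x67676d66 ("ggmf") and 0x67676a74 ("ggjt")
# come from the error message above; 0x67676d6c ("ggml") is the original
# unversioned container. The magic is assumed to be a little-endian uint32.
MAGICS = {
    0x67676d6c: "ggml (unversioned)",
    0x67676d66: "ggmf (versioned, pre-mmap)",
    0x67676a74: "ggjt (what the current main expects)",
}

def check_magic(path):
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    print("%s: 0x%08x -> %s" % (path, magic, MAGICS.get(magic, "unknown / not a ggml file")))

if __name__ == "__main__":
    check_magic(sys.argv[1])

If it reports ggmf, migrate-ggml-2023-03-30-pr613.py is the script the error message points at for rewriting the file to ggjt, which is what the replies below end up doing.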

@gaceladri gaceladri changed the title Error: Invalid model file when using converted GPT-4ALL model after following provided instructions Error: Invalid model file when using converted GPT4ALL model after following provided instructions Mar 31, 2023
@gaceladri
Author

I could run it with the previous version https://github.com/ggerganov/llama.cpp/tree/master-ed3c680

@DonIsaac

I could run it with the previous version https://github.com/ggerganov/llama.cpp/tree/master-ed3c680

After building from this tag, I'm getting a segfault. What OS are you using?

  • Using macOS 13.2 on an M1 chip
  • commit: ed3c680bcd0e8ce6e574573ba95880b694449878
  • output after running ./main -m g4a/gpt4all-lora-quantized.bin -p "hi there" -n 512:
main: seed = 1680284326
llama_model_load: loading model from 'g4a/gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml ctx size = 4273.35 MB
llama_model_load: mem required  = 6065.35 MB (+ 1026.00 MB per state)
llama_model_load: loading model part 1/1 from 'g4a/gpt4all-lora-quantized.bin'
llama_model_load: [1]    28303 segmentation fault  ./main -m g4a/gpt4all-lora-quantized.bin -p "hi there" -n 512

@rabidcopy
Contributor

use migrate-ggml-2023-03-30-pr613.py

@gaceladri
Author

I solved the issue by running the command:

python migrate-ggml-2023-03-30-pr613.py models/gpt4all/gpt4all-lora-quantized.bin models/gpt4all/gpt4all-lora-converted.bin

after first executing:

python3 convert-gpt4all-to-ggml.py models/gpt4all-lora-quantized.bin ./models/tokenizer.model

and now I'm interacting with GPT4All with:

./main -m ./models/gpt4all/gpt4all-lora-converted.bin -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

@scottjmaddox

Would it be worth updating the README section with this information?

@ROBOKiTTY

ROBOKiTTY commented Apr 1, 2023

After running convert-gpt4all-to-ggml.py and migrate-ggml-2023-03-30-pr613.py, main segfaults with a failed ggml assertion.

GGML_ASSERT: H:\llama.cpp\ggml.c:3192: ((uintptr_t) (result->data))%GGML_MEM_ALIGN == 0

Full logs:

H:\llama.cpp\bin>main -m models/gpt4all-lora-quantized-v2.bin -n 248
main: seed = 1680331950
llama_model_load: loading model from 'models/gpt4all-lora-quantized-v2.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'models/gpt4all-lora-quantized-v2.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 248, n_keep = 0


 5GGML_ASSERT: H:\llama.cpp\ggml.c:3192: ((uintptr_t) (result->data))%GGML_MEM_ALIGN == 0

@BoQsc

BoQsc commented Apr 1, 2023

These are all the steps I did:

  1. Downloaded gpt4all-lora-quantized.bin via torrent from https://github.com/nomic-ai/gpt4all#try-it-yourself

  2. python -m pip install torch numpy sentencepiece

  3. Downloaded tokenizer.model from https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model

python convert-gpt4all-to-ggml.py ./models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model 

python migrate-ggml-2023-03-30-pr613.py models/gpt4all/gpt4all-lora-quantized.bin models/gpt4all/gpt4all-lora-converted.bin.orig

main -m ./llama.cpp/models/gpt4all/gpt4all-lora-converted.bin.orig -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt


However, it writes nonsense and does not let me interact in interactive mode. Maybe something is wrong.

@ROBOKiTTY

After running convert-gpt4all-to-ggml.py and migrate-ggml-2023-03-30-pr613.py, main segfaults with a failed ggml assertion.

GGML_ASSERT: H:\llama.cpp\ggml.c:3192: ((uintptr_t) (result->data))%GGML_MEM_ALIGN == 0

I commented out this line in ggml.c and recompiled to see what would happen, and it just worked. That was unexpected, but I won't complain.

@clxyder

clxyder commented Apr 2, 2023

These are all the steps I did:

  1. Downloaded gpt4all-lora-quantized.bin via torrent from https://github.com/nomic-ai/gpt4all#try-it-yourself
  2. python -m pip install torch numpy sentencepiece
  3. Downloaded tokenizer.model from https://huggingface.co/decapoda-research/llama-7b-hf/blob/main/tokenizer.model
python convert-gpt4all-to-ggml.py ./models/gpt4all-7B/gpt4all-lora-quantized.bin ./models/tokenizer.model 

python migrate-ggml-2023-03-30-pr613.py models/gpt4all/gpt4all-lora-quantized.bin models/gpt4all/gpt4all-lora-converted.bin.orig

main -m ./llama.cpp/models/gpt4all/gpt4all-lora-converted.bin.orig -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

However, it writes nonsense and does not let me interact in interactive mode. Maybe something is wrong.

Can anyone confirm if decapoda-research/llama-7b-hf's tokenizer.model is adequate to use in this case?
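
One quick way to sanity-check a tokenizer.model, assuming sentencepiece is installed (it is pulled in at step 2 above): load it and compare the piece count against the n_vocab value the loader prints (32001 in the logs above). A rough sketch, not an authoritative answer to whether that particular tokenizer is the right one:

import sys
import sentencepiece as spm

# Rough sanity check: print how many pieces the tokenizer defines so it can be
# compared against the n_vocab reported by llama_model_load. Any large
# mismatch would point at the wrong tokenizer.model.
sp = spm.SentencePieceProcessor()
sp.Load(sys.argv[1])  # e.g. ./models/tokenizer.model
print("tokenizer pieces:", sp.GetPieceSize())

Run as, for example, python check_tokenizer.py ./models/tokenizer.model (the script name is just illustrative).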

@ggerganov
Member

After running convert-gpt4all-to-ggml.py and migrate-ggml-2023-03-30-pr613.py, main segfaults with a failed ggml assertion.
GGML_ASSERT: H:\llama.cpp\ggml.c:3192: ((uintptr_t) (result->data))%GGML_MEM_ALIGN == 0

I commented out this line in ggml.c and recompiled to see what would happen, and it just worked. That was unexpected, but I won't complain.

This is strange. It's expected that it works after commenting out this line, since we don't really need the buffer to be aligned, but I wonder why it is no longer the case. It seems to be related to the mmap change.
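
To visualize what that assertion is checking: GGML_ASSERT fires when a tensor's data pointer is not a multiple of GGML_MEM_ALIGN. With an mmap-backed model the mapping base is page-aligned, so base + offset is aligned exactly when the tensor's offset inside the file is itself a multiple of GGML_MEM_ALIGN. A rough illustration of that arithmetic, not llama.cpp code, with GGML_MEM_ALIGN assumed to be 16 here:

import ctypes
import mmap

# Illustration only, not llama.cpp code; GGML_MEM_ALIGN is assumed to be 16.
GGML_MEM_ALIGN = 16

# An anonymous mapping stands in for the mmap'd model file; mmap() hands back
# a page-aligned base address, just like the real mapping of the .bin file.
mm = mmap.mmap(-1, mmap.PAGESIZE)
base = ctypes.addressof(ctypes.c_char.from_buffer(mm))

# A tensor pointer is base plus the tensor's data offset in the file, so it is
# aligned exactly when that file offset is a multiple of GGML_MEM_ALIGN.
for offset in (0, 16, 20, 100):
    status = "aligned" if (base + offset) % GGML_MEM_ALIGN == 0 else "UNALIGNED -> GGML_ASSERT fires"
    print("offset %4d: %s" % (offset, status))

So a converted file whose tensor data does not land on aligned offsets would trip the check, even though, as noted above, the buffer does not strictly need to be aligned for loading to work.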

@d0rc

d0rc commented Jun 2, 2023

It happened to me when trying to use --prompt-cache on a custom model.
