Can't convert 4-bit safetensors file #1347

Closed
trollkotze opened this issue May 7, 2023 · 2 comments

@trollkotze

I downloaded the 4bit.safetensors file, all the .json files, and the tokenizer model from this HuggingFace repo into the same directory:
https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b/tree/main
First I tried to convert it to ggml in q4_0 quantization with the convert.py script and got this error message:

python3 convert.py /path/to/4bit.safetensors --outtype q4_0 --outfile /path/to/alpacadente.bin

Loading model file /path/to/4bit.safetensors
Loading vocab file /path/to/tokenizer.model
Traceback (most recent call last):
  File "convert.py", line 1165, in <module>
    main()
  File "convert.py", line 1157, in main
    model = convert_to_output_type(model, output_type)
  File "convert.py", line 1007, in convert_to_output_type
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "convert.py", line 1007, in <dictcomp>
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "convert.py", line 503, in astype
    self.validate_conversion_to(data_type)
  File "convert.py", line 514, in validate_conversion_to
    raise Exception(f"Can't turn an unquantized tensor into a quantized type ({data_type})")
Exception: Can't turn an unquantized tensor into a quantized type (QuantizedDataType(groupsize=32, have_addends=False, have_g_idx=False))

But this is a 4-bit-quantized safetensors file. So why does the script claim it's unquantized and refuse to convert it to quantized ggml?
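
One way to check what convert.py is actually seeing is to list the tensor names and dtypes stored in the safetensors file. A minimal sketch, assuming the safetensors Python package is installed (the path is a placeholder); GPTQ checkpoints typically store packed int32 qweight/qzeros tensors plus float scales, a layout convert.py may not recognize as quantized:

from safetensors import safe_open

# Open the file lazily and print each tensor's name, dtype, and shape.
with safe_open("/path/to/4bit.safetensors", framework="numpy") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        print(name, tensor.dtype, tensor.shape)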

Next I tried it without specifying output quantization and got a different error about a supposed vocab size mismatch:

python3 convert.py /path/to/4bit.safetensors --outfile /modelspace/alpacadente.bin
Loading model file /path/to/4bit.safetensors
Loading vocab file /path/to/tokenizer.model
Traceback (most recent call last):
  File "convert.py", line 1165, in <module>
    main()
  File "convert.py", line 1160, in main
    OutputFile.write_all(outfile, params, model, vocab)
  File "convert.py", line 958, in write_all
    check_vocab_size(params, vocab)
  File "convert.py", line 912, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has 32016, but /modelspace/alpastadente/tokenizer.model combined with /modelspace/alpastadente/added_tokens.json has 32005).

No idea how to deal with that. 11 of 32016 tokens missing? This is probably less a problem with llama.cpp's script than with the files in the HF repo, but is there a way to tweak something and fix it if it's just 11 tokens that I have to put in somewhere?
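
For what it's worth, the two numbers in the message can be reproduced by counting the entries convert.py compares. A minimal sketch, assuming the sentencepiece package is installed (paths are placeholders):

import json
import sentencepiece as spm

# Count pieces in the SentencePiece model plus entries in added_tokens.json;
# convert.py compares this combined total against the model's vocab size.
sp = spm.SentencePieceProcessor(model_file="/path/to/tokenizer.model")
with open("/path/to/added_tokens.json") as f:
    added = json.load(f)

print("tokenizer.model:", sp.vocab_size())
print("added_tokens.json:", len(added))
print("combined:", sp.vocab_size() + len(added))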

Expected Behavior

convert.py should be able to convert a 4-bit safetensors file to q4_0 ggml. Support for q4_1, ... q4_3, q5_0, q5_1 as well would be cool.

Current Behavior

Throws the errors described above.

Environment and Context

CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
RAM: 64 GB
OS: Linux Mint 20 or 21 or something, kernel 5.4.0-125-generic

$ python3 --version
Python 3.8.10
$ make --version
GNU Make 4.2.1
$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
@Green-Sky
Collaborator

The GPTQ 4-bit quantization that the 4bit.safetensors file uses is only accidentally compatible, in some ways.
Please use the full PyTorch model instead; it will result in better-quality model files.
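
For reference, the flow that avoids this is to convert the full-precision checkpoint first and quantize afterwards. A rough sketch (paths are placeholders; older quantize builds take a numeric type id such as 2 for q4_0 instead of the name, so check ./quantize --help for your build):

python3 convert.py /path/to/full-model-dir --outfile /path/to/model-f16.bin
./quantize /path/to/model-f16.bin /path/to/model-q4_0.bin q4_0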

@trollkotze
Author

@Green-Sky Okay. Thanks for the info. I guess I can close this issue then.
