I downloaded 4bit.safetensors, all the .json files, and the tokenizer model from this HuggingFace repo into the same directory: https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b/tree/main
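(For reference, a sketch of that download step with huggingface_hub; the repo id comes from the link above, the local path is a placeholder, and the local_dir argument is assumed to be available in the installed version:)

from huggingface_hub import snapshot_download

# Pull the weights, configs, and tokenizer into one directory.
snapshot_download(repo_id="Aeala/GPT4-x-AlpacaDente2-30b",
                  local_dir="/path/to/model")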
First I tried to convert it to ggml in q4_0 quantization with the convert.py script and got this error message:
python3 convert.py /path/to/4bit.safetensors --outtype q4_0 --outfile /path/to/alpacadente.bin
Loading model file /path/to/4bit.safetensors
Loading vocab file /path/to/tokenizer.model
Traceback (most recent call last):
  File "convert.py", line 1165, in <module>
    main()
  File "convert.py", line 1157, in main
    model = convert_to_output_type(model, output_type)
  File "convert.py", line 1007, in convert_to_output_type
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "convert.py", line 1007, in <dictcomp>
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "convert.py", line 503, in astype
    self.validate_conversion_to(data_type)
  File "convert.py", line 514, in validate_conversion_to
    raise Exception(f"Can't turn an unquantized tensor into a quantized type ({data_type})")
Exception: Can't turn an unquantized tensor into a quantized type (QuantizedDataType(groupsize=32, have_addends=False, have_g_idx=False))
But this is a 4-bit-quantized safetensors file. So why does the script claim it's unquantized and refuse to convert it to quantized ggml?
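One way to see why the script says that is to list what's actually inside the file. A minimal sketch, assuming the safetensors and torch packages are installed and the path is adjusted:

from safetensors import safe_open

# Print every tensor's name, dtype, and shape. A GPTQ checkpoint mixes
# packed integer tensors (qweight, scales, and friends) with plain
# float tensors such as embeddings and norm weights.
with safe_open("/path/to/4bit.safetensors", framework="pt") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        print(name, t.dtype, tuple(t.shape))

If I read the traceback right, convert.py hits one of those plain float tensors and refuses to requantize it to q4_0, treating requantization as the job of the separate quantize tool rather than the converter.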
Next I tried it without specifying output quantization and got a different error about a supposed vocab size mismatch:
python3 convert.py /path/to/4bit.safetensors --outfile /modelspace/alpacadente.bin
Loading model file /path/to/4bit.safetensors
Loading vocab file /path/to/tokenizer.model
Traceback (most recent call last):
  File "convert.py", line 1165, in <module>
    main()
  File "convert.py", line 1160, in main
    OutputFile.write_all(outfile, params, model, vocab)
  File "convert.py", line 958, in write_all
    check_vocab_size(params, vocab)
  File "convert.py", line 912, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has 32016, but /modelspace/alpastadente/tokenizer.model combined with /modelspace/alpastadente/added_tokens.json has 32005).
No idea how to deal with that. 11 of 32016 tokens missing? I guess this is less likely a problem with llama.cpp's script and more likely a problem with the files in the HF repo, but is there a way to tweak and fix something like this if it's just 11 tokens I have to put in somewhere?
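For what it's worth, the converter's count can be reproduced by hand. A minimal sketch, assuming sentencepiece is installed and the paths point at the downloaded files; the padding workaround at the end is untested here and the <pad_N> token names are hypothetical:

import json
from sentencepiece import SentencePieceProcessor

sp = SentencePieceProcessor()
sp.Load("/path/to/tokenizer.model")
base = sp.vocab_size()                    # SentencePiece base vocab

with open("/path/to/added_tokens.json") as f:
    added = json.load(f)                  # extra tokens from the repo

# convert.py compares this sum (32005 here) against the model's
# embedding row count (32016).
print(base, len(added), base + len(added))

# Untested workaround sometimes suggested for this mismatch: pad
# added_tokens.json with dummy placeholder tokens until the counts match.
for i in range(base + len(added), 32016):
    added[f"<pad_{i}>"] = i
with open("/path/to/added_tokens.json", "w") as f:
    json.dump(added, f, indent=2)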
Expected Behavior
Converting 4-bit safetensors to q4_0 ggml should work. Support for q4_1, ..., q4_3, q5_0, and q5_1 as well would be cool.
Current Behavior
The script throws the errors described above.
Environment and Context
CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
RAM: 64 GB
OS: Linux Mint 20.x (Ubuntu 20.04 base, judging from the g++ version below), kernel 5.4.0-125-generic
$ python3 --version
Python 3.8.10
$ make --version
GNU Make 4.2.1
$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
The GPTQ 4-bit quantization that 4bit.safetensors uses is only accidentally compatible, in some ways.
Please use the full PyTorch model instead; it will result in better-quality model files.
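In practice that means a two-step flow, sketched here with placeholder paths and assuming the fp16 PyTorch shards have been downloaded and the quantize tool has been built:

python3 convert.py /path/to/full-model-dir --outtype f16 --outfile /path/to/alpacadente-f16.bin
./quantize /path/to/alpacadente-f16.bin /path/to/alpacadente-q4_0.bin q4_0

This keeps convert.py out of the requantization business: it only translates the fp16 weights to ggml, and the 4- and 5-bit formats (q4_0, q4_1, q5_0, q5_1, ...) are produced by quantize.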