
GGML_ASSERT: llama.cpp:3817: unicode_cpts_from_utf8(word).size() > 0 #6132

Closed
maziyarpanahi opened this issue Mar 18, 2024 · 6 comments

@maziyarpanahi

Hi,

I am trying to convert and quantize this model: https://huggingface.co/saltlux/luxia-21.4b-alignment-v1.0/

python llama.cpp/convert.py ~/.cache/huggingface/hub/models--saltlux--luxia-21.4b-alignment-v1.0/ --outtype f16 --outfile luxia-21.4b-alignment-v1.0.fp16.gguf

But I get this error when I use it for inference:

llama.cpp/main -m luxia-21.4b-alignment-v1.0.fp16.gguf -p "I need to create a persisted volume on Kubernetes and attach it to my application. Give me these two yaml files:" -n 400 -e


GGML_ASSERT: llama.cpp:3817: unicode_cpts_from_utf8(word).size() > 0
llama.cpp/main -m quantized/saltlux/luxia-21.4b-alignment-v1.0/luxia-21.4b-alignment-v1.0.fp16.gguf -p "I need to create a persisted volume on Kubernetes and attach it to my application. Give me these two yaml files:" -n 400 -e
Log start
main: build = 2442 (d84c4850)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1710758277
llama_model_loader: loaded meta data with 23 key-value pairs and 471 tensors from quantized/saltlux/luxia-21.4b-alignment-v1.0/luxia-21.4b-alignment-v1.0.fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = hub
llama_model_loader: - kv   2:                           llama.vocab_size u32              = 92544
llama_model_loader: - kv   3:                       llama.context_length u32              = 32768
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 6144
llama_model_loader: - kv   5:                          llama.block_count u32              = 52
llama_model_loader: - kv   6:                  llama.feed_forward_length u32              = 16384
llama_model_loader: - kv   7:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 48
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  10:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  11:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  12:                          general.file_type u32              = 1
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,92544]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  15:                      tokenizer.ggml.scores arr[f32,92544]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,92544]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 2
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - type  f32:  105 tensors
llama_model_loader: - type  f16:  366 tensors
GGML_ASSERT: llama.cpp:3817: unicode_cpts_from_utf8(word).size() > 0
Aborted (core dumped)

I've never seen this error before, and I can't find anything remotely similar to this issue. What could be causing it?
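
For context, the failing assertion fires while llama.cpp loads the vocabulary: every token string must decode to at least one Unicode codepoint. Below is a minimal pre-flight sketch of the same check in Python, run against the source model's tokenizer. The tokenizer.json filename and vocab layout are assumptions about the upstream repo, and flagging NUL pieces reflects a suspicion (raised later in this thread) that a bare \u0000 piece ends up with no decodable codepoints by the time llama.cpp sees it.

# Hypothetical pre-flight check: flag vocab entries that could decode to
# zero codepoints, which is what the GGML_ASSERT above rejects.
import json

with open("tokenizer.json", encoding="utf-8") as f:  # assumed filename/layout
    vocab = json.load(f)["model"]["vocab"]

for piece, token_id in vocab.items():
    # Empty pieces, or pieces containing a NUL byte, are the suspects here.
    if piece == "" or "\x00" in piece:
        print(f"suspicious token {token_id}: {piece!r}")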

@akumaburn

Still an issue.

github-actions bot removed the stale label Apr 24, 2024
@Speedway1

Confirming that we're also seeing the exact same issue.

@Speedway1

Two earlier GitHub issues identified the problem in the code, but both were automatically closed due to inactivity:
#5112
#4360

It looks like the bug is in the handling of token 354 (\u0000).
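
A quick way to confirm this against the source model, assuming it ships a sentencepiece tokenizer.model alongside the weights (the filename here is an assumption):

# Hypothetical spot-check of the piece behind token id 354.
from sentencepiece import SentencePieceProcessor  # pip install sentencepiece

sp = SentencePieceProcessor(model_file="tokenizer.model")  # assumed path
piece = sp.id_to_piece(354)
print(repr(piece), [hex(ord(c)) for c in piece])  # a bare NUL would print '\x00'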

@bartowski1182 (Contributor)

Seeing this with https://huggingface.co/fblgit/UNA-ThePitbull-21.4-v1, which has the same \u0000 token.

I wonder if the code needs a specific catch for it.
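
One shape such a catch could take on the conversion side, purely as an illustration (the function name and placement are invented, not llama.cpp's actual code), is to rewrite a NUL-only piece into the <0xNN> byte-token spelling already visible in the dumped vocab above:

# Hypothetical sanitizer applied to each sentencepiece piece before it is
# written into the GGUF vocab.
def sanitize_piece(piece: str) -> str:
    if piece in ("", "\x00"):
        # Reuse the byte-token convention ("<0x00>", ...) so the entry
        # still decodes to at least one codepoint downstream.
        return "<0x00>"
    return piece

Whether that substitution round-trips cleanly through detokenization would still need to be verified.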

github-actions bot added the stale label Jun 26, 2024
@github-actions

This issue was closed because it has been inactive for 14 days since being marked as stale.

@silverjam

I added some naive handling of the \u0000 token (to basically ignore it) but this wasn't sufficient, so obviously something more comprehensive is needed.
