Tokenization Example #1193


Closed
rozek opened this issue Apr 26, 2023 · 2 comments

@rozek

rozek commented Apr 26, 2023

First of all: thank you very much for the continuing work on llama.cpp - I'm using it every day with various models.

For proper context management, however, I often need to know how many tokens prompts and responses contain. There is an "embedding" example, but none for "tokenization".

That's why I made my own (see my fork of llama.cpp).

It seems to work, but since I am not a C++ programmer and not really an AI expert, I hesitate to open a pull request.

Perhaps somebody else could have a look at it or create a better example for the public...

Thanks for all your effort!

@SlyEcho
Collaborator

SlyEcho commented Apr 27, 2023

I think test-tokenizer-0.cpp is a good example of a minimal tokenizer.

github-actions bot added the stale label Mar 25, 2024
Contributor

github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
