Add support for canonical encoding #15

Open
Saibo-creator opened this issue Feb 29, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@Saibo-creator
Collaborator

Currently we allow all encodings that satisfy the constraints, but the tokenizer actually has a canonical encoding, which is generally the shortest one and the one the tokenizer prioritizes.
For example,
"aaaa" can be tokenized as ["aa", "aa"], ["aaa", "a"], ["a", "aaa"], or in other ways.
Our constraints currently allow all of them, but only one is canonical.
A flag could be added to enforce a strict mode that accepts only the canonical encoding.
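
To make the idea concrete, here is a minimal sketch of what a strict check could look like. Everything in it is an assumption for illustration: the toy vocabulary, the function names (`all_encodings`, `canonical_encoding`, `is_allowed`), the `strict` flag, and in particular the rule "fewest tokens, ties broken by vocabulary order". A real tokenizer decides its canonical encoding by its own algorithm (for example BPE merge rules), so in practice one would compare against the tokenizer's actual output rather than this rule.

```python
from typing import List

# Hypothetical toy vocabulary; list order stands in for tokenizer priority.
TOY_VOCAB = ["aaa", "aa", "a"]


def all_encodings(text: str, vocab: List[str]) -> List[List[str]]:
    """Enumerate every way to split `text` into tokens from `vocab`."""
    if not text:
        return [[]]
    results = []
    for token in vocab:
        if text.startswith(token):
            for rest in all_encodings(text[len(token):], vocab):
                results.append([token] + rest)
    return results


def canonical_encoding(text: str, vocab: List[str]) -> List[str]:
    """Assumed rule: fewest tokens, ties broken by earlier vocab entries."""
    rank = {tok: i for i, tok in enumerate(vocab)}
    return min(
        all_encodings(text, vocab),
        key=lambda enc: (len(enc), [rank[t] for t in enc]),
    )


def is_allowed(encoding: List[str], text: str, vocab: List[str],
               strict: bool = False) -> bool:
    """Non-strict: any valid split passes. Strict: only the canonical one."""
    if "".join(encoding) != text or any(t not in vocab for t in encoding):
        return False
    return not strict or encoding == canonical_encoding(text, vocab)


print(len(all_encodings("aaaa", TOY_VOCAB)))              # 7 valid splits
print(is_allowed(["aa", "aa"], "aaaa", TOY_VOCAB))        # True
print(is_allowed(["aa", "aa"], "aaaa", TOY_VOCAB, True))  # False
```

Under this toy rule the canonical encoding of "aaaa" is ["aaa", "a"]; with a real tokenizer the canonical split may differ, which is why the rule is isolated in `canonical_encoding`.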
