Issues: huggingface/tokenizers
Issues list
Allow users to select/write encoding strategies
Feature Request
#1655
opened Oct 16, 2024 by
pietrolesci
Inconsistent behaviour of PreTrainedTokenizerFasts on diacritics marked texts
bug
#1663
opened Oct 11, 2024 by
sven-nm
2 of 4 tasks
Disable pretty-print when saving tokenizer.json files
Feature Request
#1656
opened Oct 7, 2024 by
xenova
How to build a custom tokenizer on top of an existing Llama 3.2 tokenizer?
training
#1644
opened Oct 5, 2024 by
yakhyo
NormalizedString.clear() broken?
bug
#1636
opened Sep 25, 2024 by
lkurlandski
Adding many AddedTokens makes loading a tokenizer extremely slow.
#1635
opened Sep 25, 2024 by
stephantul
Rust: How to handle models with precompiled_charsmap = null
Feature Request
#1627
opened Sep 4, 2024 by
kallebysantos
Special token gets tokenized while training tokenizer from scratch
#1624
opened Sep 2, 2024 by
LalchandPandia
ModuleNotFoundError: No module named 'tokenizers.tokenizers'
#1619
opened Aug 25, 2024 by
jpferraro1
Space after unnormalized token is added when use_fast=True for Llama tokenizer
#1613
opened Aug 14, 2024 by
Butanium
Support for Golang now or support a cli for other languages?
#1601
opened Aug 7, 2024 by
xuxiaoxia96