Remove chunk_word_count and chunk approximations #429

Open
bbrowning opened this issue Dec 5, 2024 · 0 comments

Comments

@bbrowning
Contributor

Now that we have the teacher model's tokenizer, we can stop approximating chunk sizes with `chunk_word_count`, `_num_tokens_from_words`, `_num_chars_from_tokens`, etc. We can always express chunk sizes in tokens and never need to convert to and from "words".
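As a rough illustration of what token-based sizing could look like, here is a minimal sketch. It assumes a tokenizer object exposing `encode(text) -> list[int]` and `decode(ids) -> str`. The `TokenizerLike` protocol, the `ToyTokenizer` stand-in, and the `chunk_by_tokens` helper are all hypothetical names for illustration, not code from this project.

```python
from typing import Dict, List, Protocol


class TokenizerLike(Protocol):
    """Assumed interface; a real teacher-model tokenizer may differ."""

    def encode(self, text: str) -> List[int]: ...
    def decode(self, ids: List[int]) -> str: ...


class ToyTokenizer:
    """Hypothetical stand-in: one token per whitespace-separated piece."""

    def __init__(self) -> None:
        self._vocab: List[str] = []
        self._ids: Dict[str, int] = {}

    def encode(self, text: str) -> List[int]:
        ids = []
        for piece in text.split():
            if piece not in self._ids:
                # Assign the next free id to a piece we have not seen yet.
                self._ids[piece] = len(self._vocab)
                self._vocab.append(piece)
            ids.append(self._ids[piece])
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self._vocab[i] for i in ids)


def chunk_by_tokens(text: str, tokenizer: TokenizerLike, max_tokens: int) -> List[str]:
    """Split text into chunks of at most max_tokens tokens each.

    The size limit is measured directly in tokens, so no word or
    character approximation is involved.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    ids = tokenizer.encode(text)
    return [
        tokenizer.decode(ids[start:start + max_tokens])
        for start in range(0, len(ids), max_tokens)
    ]


chunks = chunk_by_tokens("one two three four five", ToyTokenizer(), max_tokens=2)
print(chunks)
```

With a real subword tokenizer, decoding a slice of ids may not reproduce the original text exactly at chunk boundaries, so a production version would likely need to handle that case.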
