Unusable and does not match with token output from GPT-3 #9
Comments
Hmm, maybe someone else fixed that, but it seems to work fine in the latest version; see the added test.
That's because this encoder is actually for the older models. It doesn't match up with gpt-3.5-turbo or gpt-4.
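As a rough illustration of that difference, here is a sketch (not from this thread): the npm package name @dqbd/tiktoken, the get_encoding API, and the encoding names are assumptions based on the tiktoken port linked below.

```ts
// Sketch: the same string yields different token ids under the GPT-3-era
// r50k_base encoding and the cl100k_base encoding used by gpt-3.5-turbo / gpt-4.
// Package name and API are assumptions based on the tiktoken port linked below.
import { get_encoding } from "@dqbd/tiktoken";

const text = "“wrote jack a letter”";

const r50k = get_encoding("r50k_base");     // GPT-3 (davinci-era) models
const cl100k = get_encoding("cl100k_base"); // gpt-3.5-turbo / gpt-4

console.log([...r50k.encode(text)]);   // ids from the older encoding
console.log([...cl100k.encode(text)]); // different ids for the newer models

// The WASM-backed encoders must be freed explicitly.
r50k.free();
cl100k.free();
```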
@niieani wdyt about this one? https://github.com/dqbd/tiktoken
Hmm, I can't reproduce the build. Looks promising though; notable pros, no TS :)
@seyfer Tiktoken JS looks good too. My gpt-tokenizer has a few extra features, though, that might be useful to you (like checking whether a given text is within the token limit or not).
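For anyone looking for that check, a minimal sketch follows; the isWithinTokenLimit name and its false-or-token-count return value are my reading of gpt-tokenizer's README, so verify against the installed version.

```ts
// Sketch: reject a prompt that exceeds a token budget without a full encode.
// isWithinTokenLimit is assumed to return false when the text is over the
// limit and a truthy token count otherwise (per gpt-tokenizer's README).
import { isWithinTokenLimit } from "gpt-tokenizer";

const prompt = "“wrote jack a letter”";
const tokenLimit = 4096;

const withinLimit = isWithinTokenLimit(prompt, tokenLimit);

if (!withinLimit) {
  throw new Error("Prompt exceeds the token limit");
}
```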
When a character like “ is used, it gives back a faulty output, as shown below.

```js
encode('“wrote jack a letter”');
// [null, 222, 250, 42910, 14509, 257, 3850, null, 222, 251]
```

Whereas OpenAI gives the output:

```js
[447, 250, 42910, 14509, 257, 3850, 447, 251]
```

This can be triggered by other characters like █ and many more.
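For reference, a small repro sketch comparing this package's output against the tiktoken port mentioned above on the same input; the "gpt2" encoding name is an assumption about which tokenizer produced the OpenAI ids quoted above, and the package names are assumptions as well.

```ts
// Sketch: compare encode() from this package with the @dqbd/tiktoken WASM port
// on the same input. "gpt2" (r50k-style BPE) is an assumption about which
// encoding produced the OpenAI ids quoted above.
import { encode } from "gpt-tokenizer";
import { get_encoding } from "@dqbd/tiktoken";

const text = "“wrote jack a letter”";

const actual = encode(text);               // currently contains nulls for “ and ”
const reference = get_encoding("gpt2");
const expected = [...reference.encode(text)];
reference.free();

console.log(actual);
console.log(expected); // expected to match [447, 250, 42910, 14509, 257, 3850, 447, 251]
```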