Token to piece (#1) #4106
base: master
Conversation
* Update common.h
* Update common.cpp
* Update llama.h
* Update llama.cpp
* Update llama.h
* Update common.cpp
* Update llama.cpp
```diff
@@ -550,7 +550,8 @@ extern "C" {
     const struct llama_model * model,
                  llama_token   token,
                         char * buf,
-                          int   length);
+                          int   length,
+                         bool   print_all_types);
```
I skimmed through the patch. From a logic standpoint, it doesn't look like your change will break anything. However, I think this is a breaking API change, since you can't set a default value for the new parameter.
I don't know whether or not that's a serious problem, but if you wanted to avoid it, you could add a new API function instead and have the original llama_token_to_piece call it with your new flag set to false. That way, callers could opt into the new behavior by using the new function.
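For illustration, a minimal sketch of that approach — the `_special` name is invented here, and only the `print_all_types` flag comes from the patch:

```c
// Sketch of the backward-compatible variant: a new entry point carries the
// extra flag, and the existing function keeps its old signature by
// delegating with the flag set to false.
LLAMA_API int llama_token_to_piece_special(
        const struct llama_model * model,
                     llama_token   token,
                            char * buf,
                             int   length,
                            bool   print_all_types);

// Unchanged public signature, unchanged default behavior.
int llama_token_to_piece(
        const struct llama_model * model,
        llama_token token,
        char * buf,
        int length) {
    return llama_token_to_piece_special(model, token, buf, length, false);
}
```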
The other thing I noticed is that the missing space after the comma in code like `result.size(),special)` doesn't conform to the existing style. Also, `print_all_types=false)` should probably have spaces around the equals sign.
You are right about the spaces; I manually typed the changes on GitHub from what I did in my local IDE and didn't think about that.
I considered creating just a `_special` version of the function, but all the other API functions use a `bool special` or a similar boolean when it comes to switching special-token behavior, so a separate function would also break the code style a bit.
But I can change it that way, of course.
I'm not really qualified to say what's worth breaking the API for, so I was just suggesting a way to avoid that if you wanted to.
I'm not sure what to make of the Swift check that failed. Regarding the API: it might even be useful to consider changing the bool flag to an OR-able flag that can combine multiple llama_token_type enum values. I believe there is no way around that if user-defined tokens are to be printable as well.
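A rough sketch of that idea — the flag names below are invented, not part of the patch:

```c
// Hypothetical bitmask: each token type gets one bit, and callers OR
// together the types they want rendered as text.
enum llama_token_piece_flags {
    LLAMA_PIECE_NORMAL       = 1 << 0,
    LLAMA_PIECE_CONTROL      = 1 << 1, // control tokens such as BOS/EOS
    LLAMA_PIECE_USER_DEFINED = 1 << 2, // user-defined special tokens
};

// The bool parameter would then become a flags word, e.g.:
// int llama_token_to_piece(const struct llama_model * model, llama_token token,
//                          char * buf, int length, uint32_t piece_flags);
// llama_token_to_piece(model, token, buf, len,
//                      LLAMA_PIECE_NORMAL | LLAMA_PIECE_USER_DEFINED);
```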
If you're going that far, you might as well switch to using a struct to pass in settings. Then you can support multiple parameters, even ones that aren't boolean. It's also more future-proof.
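Something along these lines, as a sketch (all names here are invented):

```c
// Hypothetical settings struct: new fields can be appended later without
// changing the function signature again.
struct llama_token_to_piece_params {
    bool print_control;      // render control tokens such as EOS
    bool print_user_defined; // render user-defined tokens
    // future non-boolean options could go here
};

// int llama_token_to_piece_ex(const struct llama_model * model,
//                             llama_token token, char * buf, int length,
//                             const struct llama_token_to_piece_params * params);
```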
This adds the option to un-tokenize special tokens, something I recently needed and found no way to get from the exposed functions.
Any special token, such as EOS ("</s>" or "<|end-of-text|>"), would previously return an empty piece instead.
This adds a boolean that defaults to false; when set to true, special tokens are rendered like normal tokens.
Example:
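The example was left blank in the description; as an illustration of the intended behavior — buffer handling simplified, `model` assumed to be an initialized `struct llama_model *`, and using the `print_all_types` signature from the diff:

```c
char buf[64];

// Old behavior (flag false): a special token comes back as an empty piece.
int n0 = llama_token_to_piece(model, llama_token_eos(model), buf, sizeof(buf), false);
// n0 == 0, buf left empty

// New behavior (flag true): the special token is rendered as text.
int n1 = llama_token_to_piece(model, llama_token_eos(model), buf, sizeof(buf), true);
// n1 > 0, buf now holds e.g. "</s>"
```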
Please have a second, closer look; it appears to be fine on my end, but this is a central API function.