
Add llama_chat_apply_antiprompt #6378


Closed

Conversation

danemadsen
Contributor

This function is used to get the antiprompt for each respective template. I have added a test case to the test-chat-template test.

Collaborator

@ngxson ngxson left a comment


Thanks for the contribution, but I'm sorry to say I'm not very convinced by this approach.

A better approach would be to modify llama_chat_apply_template to return the antiprompt alongside the formatted template.

I understand that this change is required to properly support chat templates other than chatml in main.cpp. But even so, the current approach will definitely not work. For example, llama2 in fact does not require an antiprompt at all, since it uses the EOS token to stop generation. The same goes for monarch and gemma.

Also, the term "antiprompt" may not reflect what we need here, since "antiprompt" refers to the user prompt used in prompt-based templates, which were widely used by alpaca (and the very first version of chatgpt) before chat models became popular. To me, "antiprompt" is a hacky solution and should be forgotten. A better term may be "stop_token" or "stop_sequence", which most modern models map directly to a single token (for example, <|im_end|> in chatml is one single token).

Overall, I think this subject deserves more research to make it really clear before actually writing code. I'll have a look next week when I have more time.


To clarify what I mean by saying that antiprompt is a hacky solution, let's look at an example of the llama2 template:

<s>[INST]Hello[/INST]Hi, I'm assistant</s>

Generation stops at the EOS token </s>. If we tell the model to stop at the antiprompt </s><s>[INST], for example, the problem is that we cannot make sure the model actually produces <s>[INST] after </s>. In short, in many cases the model will just produce irrelevant text after EOS and generation never stops.

Now you may ask: why not simply stop at EOS, add <s>[INST] to the next user prompt, and then end the user prompt with [/INST]? In that case you would need to add two more functions instead of just one: one to get the antiprompt <s>[INST] and one to get the prompt postfix [/INST]. And worse: all of this complexity is already implemented inside llama_chat_apply_template, so this would introduce a lot of duplication into the code.

For that reason, I don't think relying on antiprompts is a good option.
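
To make the alternative concrete, here is a minimal, self-contained sketch (illustrative names only, not the actual llama.cpp API) of how the stop sequence could be produced together with the formatted prompt, with llama2-style templates returning an empty sequence because they stop on EOS:

#include <string>

// Sketch: produce the stop sequence next to the formatted prompt, so callers
// never have to repeat the per-template string matching themselves.
struct chat_format_result {
    std::string formatted;     // prompt text produced by the template
    std::string stop_sequence; // empty => rely on the EOS token only
};

// Hypothetical helper, not part of llama.h: pick the stop sequence that
// matches the detected template.
static std::string stop_sequence_for_template(const std::string & tmpl) {
    if (tmpl == "chatml" || tmpl.find("<|im_start|>") != std::string::npos) {
        return "<|im_end|>"; // a single token in chatml-style vocabularies
    }
    // llama2, monarch, gemma, ...: the model emits EOS, no extra sequence needed
    return "";
}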

llama.cpp Outdated

std::string antiprompt;

if (tmpl_str == "chatml" || tmpl_str.find("<|im_start|>") != std::string::npos) {
Collaborator


Probably we need to assign one enum value to each template instead of having to duplicate all this code from llama_chat_apply_template.
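
As a rough illustration of that idea (the enum and detection function names are made up here, not an existing llama.cpp API), the template could be detected once and the enum reused everywhere else:

#include <string>

// Sketch: resolve the template to an enum once, then branch on the enum in
// both the formatting code and any future stop-sequence lookup.
enum chat_template_type {
    CHAT_TEMPLATE_UNKNOWN,
    CHAT_TEMPLATE_CHATML,
    CHAT_TEMPLATE_LLAMA2,
    CHAT_TEMPLATE_GEMMA,
};

static chat_template_type detect_chat_template(const std::string & tmpl) {
    if (tmpl == "chatml" || tmpl.find("<|im_start|>") != std::string::npos) {
        return CHAT_TEMPLATE_CHATML;
    }
    if (tmpl == "llama2" || tmpl.find("[INST]") != std::string::npos) {
        return CHAT_TEMPLATE_LLAMA2;
    }
    if (tmpl == "gemma" || tmpl.find("<start_of_turn>") != std::string::npos) {
        return CHAT_TEMPLATE_GEMMA;
    }
    return CHAT_TEMPLATE_UNKNOWN;
}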

llama.h Outdated
@@ -783,6 +783,15 @@ extern "C" {
char * buf,
int32_t length);

/// Get antiprompts from either the provided template or the default template of the model
/// @param tmpl A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
/// @param buf A buffer to hold the output antiprompt. Since the length of all antiprompts is known, it is assumed the allocated size is at least 22.
Collaborator

@ngxson ngxson Mar 29, 2024


We cannot hard-code the length like this. The convention for all functions in llama.h is to always ask the user for the maximum length of the output.
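
For reference, a sketch of a declaration following that convention (the function name is hypothetical, taken from this PR's idea, not an existing llama.h symbol): the caller passes the buffer and its size, and the function reports how many bytes it wrote or needs.

#include <stdint.h>

struct llama_model; // opaque handle, as in llama.h

// Sketch only: caller-provided buffer plus its maximum length; the return
// value is the number of bytes written (or required, so the caller can retry
// with a larger buffer).
int32_t llama_chat_get_antiprompt(
        const struct llama_model * model,
                      const char * tmpl,    // nullptr => use the model's default chat template
                            char * buf,     // caller-allocated output buffer
                         int32_t   length); // size of buf, chosen by the caller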

llama.cpp Outdated
if (tmpl_str == "chatml" || tmpl_str.find("<|im_start|>") != std::string::npos) {
antiprompt = "<|im_start|>user";
} else if (tmpl_str == "llama2" || tmpl_str.find("[INST]") != std::string::npos) {
antiprompt = "[INST] user";
Collaborator

@ngxson ngxson Mar 29, 2024


This won't work with llama2, because there is no "user" in the antiprompt. Also, llama2 uses the EOS token to stop generation, so no antiprompt is needed.
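
For comparison, a small sketch of the llama2 stopping condition, assuming the existing llama_token_eos accessor from llama.h (the sampling itself is elided): no string antiprompt is matched, the loop simply ends when the sampled token equals EOS.

#include "llama.h"

// Sketch: llama2-style templates terminate a turn with EOS, so the generation
// loop only needs a token comparison, not a text antiprompt match.
static bool generation_should_stop(const struct llama_model * model, llama_token sampled) {
    return sampled == llama_token_eos(model);
}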
