
Add llama_chat_apply_antiprompt #6378


Closed

Conversation

danemadsen
Contributor

This function is used to get the antiprompt for each respective template. I have added a test case to the test-chat-template test.

Collaborator

@ngxson ngxson left a comment


Thanks for the contribution, but I'm sorry to say I'm not very convinced by this approach.

A better approach would be to modify llama_chat_apply_template to return the antiprompt alongside the formatted template.

I understand that this change is required to properly support chat templates other than chatml in main.cpp. But even so, the current approach will definitely not work. For example, llama2 in fact does not require an antiprompt at all, since it uses the EOS token to stop generation. The same goes for monarch and gemma.

Also, the term "antiprompt" may not reflect what we need here, since "antiprompt" refers to the user prompt used in prompt-based templates, which were widely used by alpaca (and the very first version of chatgpt) before chat models became popular. To me, "antiprompt" is a hacky solution and should be forgotten. A better term may be "stop_token" or "stop_sequence", which most modern models map directly to a single token (for example, <|im_end|> in chatml is one single token).

Overall, I think this subject deserves more research to make it really clear before actually writing code. I'll have a look next week when I have more time.


To clarify what I mean by saying that antiprompt is a hacky solution, let's look at an example of the llama2 template:

<s>[INST]Hello[/INST]Hi, I'm assistant</s>

Generation stops at the EOS token </s>. If we tell the model to stop at the antiprompt </s><s>[INST], for example, the problem is that we cannot make sure the model actually produces <s>[INST] after </s>. In short, in many cases the model will just produce irrelevant text after EOS and generation never stops.

Now you may ask: why not simply stop at EOS, add <s>[INST] to the next user prompt, and then end the user prompt with [/INST]? In that case you would need to add two more functions instead of just one: one to get the antiprompt <s>[INST] and one to get the prompt postfix [/INST]. And worse: all of this complexity is already implemented inside llama_chat_apply_template, so this would introduce a lot of duplication into the code.

For that reason, I don't think relying on antiprompts is a good option.
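
To make the alternative concrete, here is a minimal, self-contained sketch (illustrative names only, not the actual llama.cpp API) of how the stop sequence could be produced together with the formatted prompt, with llama2-style templates returning an empty sequence because they stop on EOS:

#include <string>

// Sketch: produce the stop sequence next to the formatted prompt, so callers
// never have to repeat the per-template string matching themselves.
struct chat_format_result {
    std::string formatted;     // prompt text produced by the template
    std::string stop_sequence; // empty => rely on the EOS token only
};

// Hypothetical helper, not part of llama.h: pick the stop sequence that
// matches the detected template.
static std::string stop_sequence_for_template(const std::string & tmpl) {
    if (tmpl == "chatml" || tmpl.find("<|im_start|>") != std::string::npos) {
        return "<|im_end|>"; // a single token in chatml-style vocabularies
    }
    // llama2, monarch, gemma, ...: the model emits EOS, no extra sequence needed
    return "";
}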

llama.cpp Outdated

std::string antiprompt;

if (tmpl_str == "chatml" || tmpl_str.find("<|im_start|>") != std::string::npos) {
Collaborator


Probably we need to assign one enum value to each template instead of having to duplicate all this code from llama_chat_apply_template.
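
As a rough illustration of that idea (the enum and detection function names are made up here, not an existing llama.cpp API), the template could be detected once and the enum reused everywhere else:

#include <string>

// Sketch: resolve the template to an enum once, then branch on the enum in
// both the formatting code and any future stop-sequence lookup.
enum chat_template_type {
    CHAT_TEMPLATE_UNKNOWN,
    CHAT_TEMPLATE_CHATML,
    CHAT_TEMPLATE_LLAMA2,
    CHAT_TEMPLATE_GEMMA,
};

static chat_template_type detect_chat_template(const std::string & tmpl) {
    if (tmpl == "chatml" || tmpl.find("<|im_start|>") != std::string::npos) {
        return CHAT_TEMPLATE_CHATML;
    }
    if (tmpl == "llama2" || tmpl.find("[INST]") != std::string::npos) {
        return CHAT_TEMPLATE_LLAMA2;
    }
    if (tmpl == "gemma" || tmpl.find("<start_of_turn>") != std::string::npos) {
        return CHAT_TEMPLATE_GEMMA;
    }
    return CHAT_TEMPLATE_UNKNOWN;
}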

llama.h Outdated
@@ -783,6 +783,15 @@ extern "C" {
char * buf,
int32_t length);

/// Get antiprompts from either the provided template or the default template of the model
/// @param tmpl A Jinja template to use for this chat. If this is nullptr, the model’s default chat template will be used instead.
/// @param buf A buffer to hold the output antiprompt. Since the length of all antiprompts is known, it is assumed the allocated size is at least 22.
Collaborator

@ngxson ngxson Mar 29, 2024


We cannot hard-code the length like this. The convention for all functions in llama.h is to always ask the user for the maximum length of the output.
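
For reference, a sketch of a declaration following that convention (the function name is hypothetical, taken from this PR's idea, not an existing llama.h symbol): the caller passes the buffer and its size, and the function reports how many bytes it wrote or needs.

#include <stdint.h>

struct llama_model; // opaque handle, as in llama.h

// Sketch only: caller-provided buffer plus its maximum length; the return
// value is the number of bytes written (or required, so the caller can retry
// with a larger buffer).
int32_t llama_chat_get_antiprompt(
        const struct llama_model * model,
                      const char * tmpl,    // nullptr => use the model's default chat template
                            char * buf,     // caller-allocated output buffer
                         int32_t   length); // size of buf, chosen by the caller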

llama.cpp Outdated
if (tmpl_str == "chatml" || tmpl_str.find("<|im_start|>") != std::string::npos) {
antiprompt = "<|im_start|>user";
} else if (tmpl_str == "llama2" || tmpl_str.find("[INST]") != std::string::npos) {
antiprompt = "[INST] user";
Collaborator

@ngxson ngxson Mar 29, 2024


This won't work with llama2, because there is no "user" in the antiprompt. Also, llama2 uses the EOS token to stop generation, so no antiprompt is needed.
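
For comparison, a small sketch of the llama2 stopping condition, assuming the existing llama_token_eos accessor from llama.h (the sampling itself is elided): no string antiprompt is matched, the loop simply ends when the sampled token equals EOS.

#include "llama.h"

// Sketch: llama2-style templates terminate a turn with EOS, so the generation
// loop only needs a token comparison, not a text antiprompt match.
static bool generation_should_stop(const struct llama_model * model, llama_token sampled) {
    return sampled == llama_token_eos(model);
}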
