
Server: use llama_chat_apply_template #5593

Merged: 5 commits merged into ggml-org:master on Feb 20, 2024

Conversation

ngxson (Collaborator) commented on Feb 19, 2024

Closes #5575

This PR changes the usage of --chat-template introduced in #5425. The parameter now accepts a Jinja template instead of a type name.

If --chat-template is not specified, the default template (taken from the model metadata) is used instead.

This PR also fixes an issue where llama_chat_apply_template did not read the metadata correctly.

CC @ggerganov and @cebtenzzre for review. Thank you!
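
For readers unfamiliar with the API, here is a minimal hedged sketch of how the server side can call llama_chat_apply_template. The names model, tmpl_override and the buffer size are placeholders, not code from this PR; passing nullptr as the template falls back to the template stored in the model metadata.

// Minimal sketch, assuming the llama_chat_apply_template signature from llama.h
// at the time of this PR (model, template override, message array, add_ass flag,
// output buffer). "model" and "tmpl_override" are placeholders.
std::vector<llama_chat_message> messages = {
    {"system", "You are a helpful assistant."},
    {"user",   "hi, how are you"},
};
std::vector<char> buf(4096);
// tmpl_override is the string given via --chat-template, or nullptr to use the
// template embedded in the model metadata
int32_t res = llama_chat_apply_template(model, tmpl_override,
                                        messages.data(), messages.size(),
                                        /*add_ass=*/true, buf.data(), (int32_t) buf.size());
// the return value is the number of bytes needed; if it exceeds the buffer size,
// the buffer has to be grown and the call repeated
if (res >= 0 && res <= (int32_t) buf.size()) {
    std::string formatted_chat(buf.data(), res);
}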

@@ -2390,12 +2391,13 @@ static void server_params_parse(int argc, char **argv, server_params &sparams,
break;
}
std::string value(argv[i]);
Member:

value seems unused now?

Collaborator Author:

Yeah, it's unused; I forgot to remove it. It's now removed.

std::ostringstream output;
bool is_inside_turn = false;
// Check if the template supplied via "--chat-template" is supported or not. Returns true if it's valid
inline bool verify_custom_template(std::string tmpl) {
Member:

Suggested change:
- inline bool verify_custom_template(std::string tmpl) {
+ inline bool verify_custom_template(const std::string & tmpl) {
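
As an aside, a hedged sketch of how such a check can be implemented on top of llama_chat_apply_template (an illustration, not necessarily the exact code merged in this PR): apply the candidate template to a dummy message and treat a negative return value as "unsupported".

inline bool verify_custom_template(const std::string & tmpl) {
    // format a single dummy message with the candidate template; the model
    // pointer can be null because an explicit template string is supplied
    llama_chat_message dummy[] = {{"user", "test"}};
    std::vector<char> buf(64);
    int32_t res = llama_chat_apply_template(nullptr, tmpl.c_str(), dummy, 1,
                                            /*add_ass=*/true, buf.data(), (int32_t) buf.size());
    // a negative result means the template was not recognized
    return res >= 0;
}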

Comment on lines 186 to 192
for (size_t i = 0; i < messages.size(); ++i) {
auto &curr_msg = messages[i];
str[i] = json_value(curr_msg, "role", std::string(""));
str[i + 1] = json_value(curr_msg, "content", std::string(""));
alloc_size += str[i + 1].length();
chat[i].role = str[i].c_str();
chat[i].content = str[i + 1].c_str();
Member:

There seems to be a bug here. Maybe change to str[2*i + 0] = ... and str[2*i + 1] = ...

Collaborator Author:

Thanks for noticing that. That explains why the bot's responses were quite weird when I tested this PR yesterday.

Fixed in c53b34d

Looking at the debug log, I can confirm that the formatted chat is correct:

{"timestamp":1708423580,"level":"VERBOSE","function":"format_chat","line":208,"message":"formatted_chat","text":"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nhi, how are you<|im_end|>\n<|im_start|>assistant\n"}

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@ngxson ngxson marked this pull request as draft February 20, 2024 14:13
ngxson (Collaborator Author) commented on Feb 20, 2024

I still see a weird bug: the chat is formatted correctly, but then \u0000 characters are appended when it's tokenized. I've converted the PR to draft while I investigate:

{"timestamp":1708438302,"level":"VERBOSE","function":"format_chat","line":208,"message":"formatted_chat","text":"[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\nhi, how are you [/INST]"}
{"timestamp":1708438302,"level":"VERBOSE","function":"start_loop","line":293,"message":"have new task"}
{"timestamp":1708438302,"level":"VERBOSE","function":"start_loop","line":305,"message":"callback_new_task"}
slot 0 is processing [task id: 0]
{"timestamp":1708438302,"level":"VERBOSE","function":"start_loop","line":308,"message":"callback_all_task_finished"}
slot 0 : kv cache rm - [0, end)
{"timestamp":1708438302,"level":"VERBOSE","function":"update_slots","line":1685,"message":"prompt ingested","n_past":0,"cached":"","to_eval":"[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\nhi, how are you [/INST]\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"}

Edit: found it! I forgot to call buf.resize after receiving the result from llama_chat_apply_template.

It works fine now (tested with both the chatml and llama2 templates).
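
For reference, a hedged sketch of the buffer handling that avoids those trailing \u0000 bytes (alloc_size and chat come from the formatting loop above; ptr_tmpl is a placeholder for the template override pointer): llama_chat_apply_template returns the number of bytes it needs, so the buffer is grown and the call retried if it was too small, then shrunk to the returned length before building the final string.

std::vector<char> buf(alloc_size * 2);
int32_t res = llama_chat_apply_template(model, ptr_tmpl, chat.data(), chat.size(),
                                        /*add_ass=*/true, buf.data(), (int32_t) buf.size());
// if the buffer was too small, grow it to the required size and format again
if (res > (int32_t) buf.size()) {
    buf.resize(res);
    res = llama_chat_apply_template(model, ptr_tmpl, chat.data(), chat.size(),
                                    /*add_ass=*/true, buf.data(), (int32_t) buf.size());
}
// shrink to the number of bytes actually written; otherwise the unused tail of
// the buffer ends up in the prompt as \u0000 characters when tokenized
buf.resize(res);
std::string formatted_chat(buf.begin(), buf.end());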

@ngxson ngxson marked this pull request as ready for review February 20, 2024 14:21
@ngxson ngxson merged commit 9c405c9 into ggml-org:master Feb 20, 2024
ibehnam (Contributor) commented on Feb 20, 2024

@ngxson

Since this is a breaking change, it'd be good to update the server README to mention that the --chat-template argument now works differently. An example would be nice too.

Also, I found the following message vague. What are the "common" templates?

--chat-template JINJA_TEMPLATE
                            set custom jinja chat template (default: template taken from model's metadata)
                            Note: only commonly used templates are accepted, since we don't have jinja parser

ngxson (Collaborator Author) commented on Feb 21, 2024

@ibehnam Yeah, I forgot about the docs. You're right; in fact, I was thinking about how to make it clear which templates we support when showing this help, but the problem is that it depends on llama_chat_apply_template. That function is the one that must be documented.

My idea is that we can add a section to the server's docs showing how to use --chat-template, and then include a link to llama_chat_apply_template where users can see the list of supported templates.
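
For context, the "common" templates are recognized heuristically: since there is no Jinja parser, llama_chat_apply_template looks for distinctive markers in the template string and returns a negative value for anything it does not recognize. A simplified sketch of that idea follows; detect_template_family is a hypothetical helper used only for illustration, and only two families are shown.

// Simplified sketch of the heuristic: "common" templates are identified by
// checking the Jinja string for distinctive markers rather than parsing it.
static std::string detect_template_family(const std::string & tmpl) {
    if (tmpl.find("<|im_start|>") != std::string::npos) return "chatml";
    if (tmpl.find("[INST]")       != std::string::npos) return "llama2";
    // further families are matched the same way; anything unrecognized makes
    // llama_chat_apply_template report an error via a negative return value
    return "unknown";
}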

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* server: use llama_chat_apply_template

* server: remove trailing space

* server: fix format_chat

* server: fix help message

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: fix formatted_chat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
Successfully merging this pull request may close these issues.

Server: use llama_chat_apply_template to format the chat