
[FR] Full llama.cpp integration local / remote #44


Open
fszontagh opened this issue Mar 20, 2025 · 9 comments

@fszontagh
Owner

A dedicated tab on the GUI for interacting with language models:

  • add a chat box with multiple-session handling
  • implement model management
@fszontagh fszontagh added enhancement New feature or request good first issue Good for newcomers labels Mar 20, 2025
@fszontagh fszontagh self-assigned this Mar 20, 2025
@fszontagh fszontagh moved this to Planning in Stable Diffusion GUI Mar 20, 2025
@iwr-redmond

Nomic's GPT4All desktop application is written in C++ with a Qt frontend. It's also MIT-licensed, which means that anything useful for this FR can be easily adopted.

@fszontagh fszontagh moved this from Planning to In progress in Stable Diffusion GUI Mar 22, 2025
@fszontagh
Owner Author

[image]

@iwr-redmond

iwr-redmond commented Mar 29, 2025

Record time! Make sure to provide a facility for setting up chat templates and system prompts.

GPT4All recently migrated to minja from its own simplified template format, which I reckon was easier to understand.

@fszontagh
Owner Author

There is template handling in llama.cpp:

const char* tmpl = llama_model_chat_template(model, /* name */ nullptr);

prev_len = llama_chat_apply_template(tmpl, messages.data(), messages.size(), false, nullptr, 0);

Currently this can only load the template embedded in the model; I need to investigate further.

There is still a lot of fine-tuning to do.
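
For reference, a minimal sketch (untested) of how this could fall back to a user-editable template from the GUI when the model does not embed one; format_chat and ui_fallback_template are hypothetical names for this example, only the llama.cpp calls are real:

    #include <string>
    #include <vector>
    #include "llama.h"

    // Sketch: prefer the model's embedded chat template, otherwise use a
    // template supplied by the user through the GUI settings.
    static std::string format_chat(const llama_model * model,
                                   const std::vector<llama_chat_message> & messages,
                                   const std::string & ui_fallback_template) {
        const char * tmpl = llama_model_chat_template(model, /* name */ nullptr);
        if (tmpl == nullptr && !ui_fallback_template.empty()) {
            tmpl = ui_fallback_template.c_str();   // older GGUFs may ship without a template
        }
        // Calling with a NULL buffer only returns the required length.
        const int32_t len = llama_chat_apply_template(tmpl, messages.data(), messages.size(),
                                                      /* add_ass */ true, nullptr, 0);
        if (len < 0) {
            return "";   // template could not be applied
        }
        std::vector<char> buf(len);
        llama_chat_apply_template(tmpl, messages.data(), messages.size(),
                                  true, buf.data(), (int32_t) buf.size());
        return std::string(buf.data(), buf.size());
    }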

@iwr-redmond

iwr-redmond commented Mar 29, 2025

IIRC older GGUF models don't have built-in templates. You can confirm this by loading the same file in the current GPT4All release.

EDIT: I compared the default prompt template for Zephyr-7B in GPT4All: the late-2023 GGUF shows no template, while the late-2024 GGUF shows a built-in Jinja2 template.

@fszontagh
Owner Author

fszontagh commented Mar 30, 2025

@fszontagh
Owner Author

fszontagh commented Apr 1, 2025

A small reminder

Steps for starting a chat session

  • The user selects a model from the list; the GUI sends a command to llama's extprocess (if it is ready and available) to load the selected model. [image]
  • Once the model is loaded into RAM/VRAM, llama's extprocess reads the metadata from the model file and fills in some settings (template, max. context size, etc., if they exist in the model). [image] [image]
  • When the user sends a prompt, llama's extprocess creates the context using the editable settings from the UI (batch size, context size, number of threads); see the sketch after this list.
  • The prompt template is only applied when the prompt is sent to the process. Settings related to the context or the model cannot be modified in an already started chat session (it would be possible by reloading the context or the model to apply the new config, but that is not implemented yet).
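
For reference, a rough sketch of the context-creation step with the llama.cpp API; UiChatSettings and its defaults are made up for this example, only the llama.cpp calls and fields are real:

    #include "llama.h"

    // Hypothetical container for the UI-editable settings mentioned above.
    struct UiChatSettings {
        int n_ctx     = 4096;   // context size
        int n_batch   = 512;    // batch size
        int n_threads = 8;      // number of threads (shared with the SD settings)
    };

    // Sketch: create the chat context from the UI settings once the user sends a prompt.
    static llama_context * create_chat_context(llama_model * model, const UiChatSettings & ui) {
        llama_context_params cparams = llama_context_default_params();
        cparams.n_ctx           = ui.n_ctx;
        cparams.n_batch         = ui.n_batch;
        cparams.n_threads       = ui.n_threads;
        cparams.n_threads_batch = ui.n_threads;
        // llama_init_from_model() is the current entry point; older releases
        // name it llama_new_context_with_model().
        return llama_init_from_model(model, cparams);
    }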

FYI:

  • the prompt template can be changed after the model has sent its response; the "history" is always re-formatted with the current template
  • the number of threads comes from the settings already used for Stable Diffusion

TODO:

  • add more fine-tuning settings to the GUI (sampler settings); see the sketch after this list:
    • temp
    • min p
    • top k
    • dist
  • implement a KV cache to store already used tokens (save / restore the chat history)
  • use one webview per chat session
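
A possible starting point for the sampler settings, using the llama.cpp sampler-chain API; the values passed in are only illustrative, not the project's actual defaults:

    #include "llama.h"

    // Sketch: build a sampler chain from the planned GUI settings (top k, min p, temp, dist).
    static llama_sampler * create_sampler_chain(int top_k, float min_p, float temp) {
        llama_sampler * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
        llama_sampler_chain_add(chain, llama_sampler_init_top_k(top_k));
        llama_sampler_chain_add(chain, llama_sampler_init_min_p(min_p, /* min_keep */ 1));
        llama_sampler_chain_add(chain, llama_sampler_init_temp(temp));
        // "dist" is the final sampler that actually picks a token from the distribution.
        llama_sampler_chain_add(chain, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
        return chain;
    }

For the KV-cache save/restore item, llama_state_save_file() / llama_state_load_file() could be a starting point for persisting a chat session.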

@fszontagh
Owner Author

Here is an all-in-one template (Llama-3.2-3B-Instruct)

{{- bos_token }}
{%- if custom_tools is defined %}
    {%- set tools = custom_tools %}
{%- endif %}
{%- if not tools_in_user_message is defined %}
    {%- set tools_in_user_message = true %}
{%- endif %}
{%- if not date_string is defined %}
    {%- if strftime_now is defined %}
        {%- set date_string = strftime_now("%d %b %Y") %}
    {%- else %}
        {%- set date_string = "26 Jul 2024" %}
    {%- endif %}
{%- endif %}
{%- if not tools is defined %}
    {%- set tools = none %}
{%- endif %}

{#- This block extracts the system message, so we can slot it into the right place. #}
{%- if messages[0]['role'] == 'system' %}
    {%- set system_message = messages[0]['content']|trim %}
    {%- set messages = messages[1:] %}
{%- else %}
    {%- set system_message = "" %}
{%- endif %}

{#- System message #}
{{- "<|start_header_id|>system<|end_header_id|>\n\n" }}
{%- if tools is not none %}
    {{- "Environment: ipython\n" }}
{%- endif %}
{{- "Cutting Knowledge Date: December 2023\n" }}
{{- "Today Date: " + date_string + "\n\n" }}
{%- if tools is not none and not tools_in_user_message %}
    {{- "You have access to the following functions. To call a function, please respond with JSON for a function call." }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
{%- endif %}
{{- system_message }}
{{- "<|eot_id|>" }}

{#- Custom tools are passed in a user message with some extra guidance #}
{%- if tools_in_user_message and not tools is none %}
    {#- Extract the first user message so we can plug it in here #}
    {%- if messages | length != 0 %}
        {%- set first_user_message = messages[0]['content']|trim %}
        {%- set messages = messages[1:] %}
    {%- else %}
        {{- raise_exception("Cannot put tools in the first user message when there's no first user message!") }}
{%- endif %}
    {{- '<|start_header_id|>user<|end_header_id|>\n\n' -}}
    {{- "Given the following functions, please respond with a JSON for a function call " }}
    {{- "with its proper arguments that best answers the given prompt.\n\n" }}
    {{- 'Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}.' }}
    {{- "Do not use variables.\n\n" }}
    {%- for t in tools %}
        {{- t | tojson(indent=4) }}
        {{- "\n\n" }}
    {%- endfor %}
    {{- first_user_message + "<|eot_id|>"}}
{%- endif %}

{%- for message in messages %}
    {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
        {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
    {%- elif 'tool_calls' in message %}
        {%- if not message.tool_calls|length == 1 %}
            {{- raise_exception("This model only supports single tool-calls at once!") }}
        {%- endif %}
        {%- set tool_call = message.tool_calls[0].function %}
        {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' -}}
        {{- '{"name": "' + tool_call.name + '", ' }}
        {{- '"parameters": ' }}
        {{- tool_call.arguments | tojson }}
        {{- "}" }}
        {{- "<|eot_id|>" }}
    {%- elif message.role == "tool" or message.role == "ipython" %}
        {{- "<|start_header_id|>ipython<|end_header_id|>\n\n" }}
        {%- if message.content is mapping or message.content is iterable %}
            {{- message.content | tojson }}
        {%- else %}
            {{- message.content }}
        {%- endif %}
        {{- "<|eot_id|>" }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}

@iwr-redmond

You may wish to consider allowing the kv_cache_type to be set. At Q8_0, this can save a lot of VRAM without noticeably reducing quality.
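
For illustration, that would roughly mean exposing the type_k / type_v fields of llama_context_params (note that a quantized V cache generally requires flash attention to be enabled):

    // Sketch: quantize the KV cache to Q8_0 via the context parameters.
    llama_context_params cparams = llama_context_default_params();
    cparams.flash_attn = true;             // quantized V cache needs flash attention
    cparams.type_k     = GGML_TYPE_Q8_0;
    cparams.type_v     = GGML_TYPE_Q8_0;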

Labels
enhancement New feature or request good first issue Good for newcomers
Projects
Status: In progress

2 participants