Add a param to force the [end of text] to show, even in interactive mode #967

jeffersoncgo · 2023-04-14T14:00:00Z

Is it possible to add a param that forces the [end of text] token to be shown?

Something like this (I think; I don't really understand C/C++):

if (!embd.empty() && embd.back() == llama_token_eos()) {
    if (params.forceendtoken || !params.instruct) {
        fprintf(stderr, " [end of text]\n");
    }
    if (params.instruct) {
        is_interacting = true;
    } else {
        break;
    }
}

jeffersoncgo · 2023-04-14T20:18:50Z

Update: after looking at the code, I tried to compile it here, but my build is really slow compared to the released binary.
These are the lines I suggest (beg) changing (or improving; I didn't take performance into consideration):

main.cpp

if (!embd.empty() && embd.back() == llama_token_eos()) {
    if (instruct_mode) {
        is_interacting = true;
    } else {
        fprintf(stderr, " [end of text]\n");
        break;
    }
}

to

if (!embd.empty() && embd.back() == llama_token_eos()) {
    if (params.forceendtoken || !params.instruct) {
        fprintf(stderr, " [end of text]\n");
    }
    if (params.instruct) {
        is_interacting = true;
    } else {
        break;
    }
}
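The change above mixes two separate decisions: whether to print the marker, and whether to leave the generation loop. As a minimal sketch, assuming nothing about the real main.cpp beyond the snippet quoted above, the same logic can be pulled into a small pure function that is easy to check in isolation. The names `eos_params`, `eos_action`, and `on_eos` are hypothetical and exist only for this illustration; `forceendtoken` is the flag proposed in this issue, not an existing option.

```cpp
// Hypothetical stand-in for the two gpt_params fields the snippet reads.
struct eos_params {
    bool instruct      = false; // instruct/interactive mode is active
    bool forceendtoken = false; // proposed flag from this issue (assumption)
};

// What the loop should do once the last generated token is EOS.
struct eos_action {
    bool print_marker; // print " [end of text]" to stderr
    bool stop;         // break out of the generation loop
};

// Mirrors the proposed if/else: the marker is printed when forced or when
// not in instruct mode; the loop only stops when not in instruct mode.
inline eos_action on_eos(const eos_params & p) {
    eos_action a;
    a.print_marker = p.forceendtoken || !p.instruct;
    a.stop         = !p.instruct;
    return a;
}
```

In the loop, the caller would print the marker when `print_marker` is set, then either `break` (when `stop` is set) or set `is_interacting = true`. With both fields false, this behaves like the original code: the marker is printed and the loop ends.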

and

common.cpp

} else {
    fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
    gpt_print_usage(argv[0], default_params);
    exit(1);
}

to

} else if (arg == "--forceendtoken") {
    params.forceendtoken = true;
} else {
    fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
    gpt_print_usage(argv[0], default_params);
    exit(1);
}

common.h - in "struct gpt_params", find

bool multiline_mode    = true; // enables multi-line mode, to send input press CTRL+D on Linux/Max, Ctrl+Z then Return on Windows

below it, add

bool forceendtoken     = false; // force showing "[end of text]" after generation; off by default so --forceendtoken has an effect
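To show how the parser branch and the struct default fit together, here is a minimal, self-contained sketch of an opt-in boolean flag. It is not the real `gpt_params_parse`: `cli_params` and `parse_args` are hypothetical names, and only the `--forceendtoken` branch and the unknown-argument error are taken from the snippets above. The point is that the field has to default to `false`, otherwise passing the flag changes nothing.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, trimmed-down parameter struct for illustration only.
struct cli_params {
    bool instruct      = false;
    bool forceendtoken = false; // opt-in: only true when the flag is passed
};

// Walks the arguments and sets flags; returns false on an unknown argument
// instead of calling exit(1), so the caller decides how to fail.
inline bool parse_args(const std::vector<std::string> & args, cli_params & params) {
    for (const std::string & arg : args) {
        if (arg == "--instruct") {
            params.instruct = true;
        } else if (arg == "--forceendtoken") {
            params.forceendtoken = true;
        } else {
            fprintf(stderr, "error: unknown argument: %s\n", arg.c_str());
            return false;
        }
    }
    return true;
}
```

Calling `parse_args({"--instruct", "--forceendtoken"}, params)` sets both fields, while an empty argument list leaves `forceendtoken` off, which keeps the existing behavior for anyone who does not pass the flag.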

I made these changes and it "worked", but generation became really slow.

If possible, please add these.


github-actions · 2024-04-09T01:10:17Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

Deadsg pushed a commit to Deadsg/llama.cpp that referenced this issue Dec 19, 2023

README.md multimodal params fix (ggml-org#967)

6bbeea0

multi modal params fix: add logits = True -> to make llava work

github-actions bot added the stale label Mar 25, 2024

github-actions bot closed this as completed Apr 9, 2024

Bearsaerker mentioned this issue Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open
