Changing default repeat_last_n value to current context size? #787

Closed
rmn20 opened this issue Apr 5, 2023 · 3 comments
Labels: enhancement (New feature or request), generation quality (Quality of model output), stale

Comments


rmn20 commented Apr 5, 2023

I noticed that LLaMA 7B almost always gets stuck in a loop after a certain amount of time. This problem has recurred throughout the whole time I have been using llama.cpp (since March 15). I have also tried different models such as Alpaca and GPT4All unfiltered, but the problem still remains. It also becomes obvious when you try to generate a dialog that follows some kind of plot (I use --keep to keep the plot summary in context). Every time I've tried to generate something open-ended, it just loops at some point, even in interactive mode.

I also noticed that setting repeat_last_n to the current context size helps to eliminate this issue. (I use ctx_size 2048 most of the time.)

Maybe, after some testing, the default repeat_last_n value could be changed to the currently set context size, so newcomers could avoid this issue?
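For context, the fixed-window repetition penalty that repeat_last_n controls can be sketched roughly as follows. This is an illustrative sketch only, not llama.cpp's actual sampling code, and the function and variable names are made up; it just shows why setting the window to the full context size keeps every previously used token penalized.

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch of a fixed-window repetition penalty (not llama.cpp's
// actual code). Every token id that occurs in the last `repeat_last_n`
// tokens has its logit scaled down, making it less likely to be sampled.
// With repeat_last_n == n_ctx the window covers the whole context, so any
// token used so far stays penalized; loops are suppressed, but over long
// generations common words get starved as well.
void apply_repeat_penalty(std::vector<float> &logits,
                          const std::vector<int32_t> &last_tokens,
                          size_t repeat_last_n,
                          float repeat_penalty /* e.g. 1.1 */) {
    const size_t start = last_tokens.size() > repeat_last_n
                             ? last_tokens.size() - repeat_last_n
                             : 0;
    for (size_t i = start; i < last_tokens.size(); ++i) {
        const int32_t tok = last_tokens[i];
        // Common convention: divide positive logits, multiply negative ones,
        // so the penalty always pushes the token's probability down.
        if (logits[tok] > 0.0f) {
            logits[tok] /= repeat_penalty;
        } else {
            logits[tok] *= repeat_penalty;
        }
    }
}
```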

@gjmulder added the generation quality (Quality of model output) and enhancement (New feature or request) labels on Apr 6, 2023
@dogjamboree

If I understand correctly, repeat_last_n works by discouraging the model from generating any of the tokens (words) in the last n, so if you set it as high as something like 2048, the text will be coherent at first but rapidly devolve into flowery and nonsensical speech as the model looks for new tokens that haven't been used yet.

That's how it works in theory, and my experience in practice matches. It certainly makes for an 'interesting' dialog / chat / whatever when you set it that high, but good luck making sense of the answers if they run longer than a couple of paragraphs.

Piezoid (Contributor) commented Apr 10, 2023

If I understand correctly, repeat_last_n works by discouraging the model from generating any of the tokens (words) in the last n, so if you set it as high as something like 2048, the text will be coherent at first but rapidly devolve into flowery and nonsensical speech as the model looks for new tokens that haven't been used yet.

This is the issue I wanted to address with #331.

@rmn20 Can you try this branch? With it, setting --repeat_half_life 32 detects repeats over the whole context, but recent and long repeats are penalized more strongly than old and shorter ones.
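For illustration, a recency-decayed penalty along these lines could look roughly like the sketch below. This is only a sketch of the general idea, not the actual code from #331; the function name, the per-token decay scheme, and the way the weight is folded into the penalty are all assumptions.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative sketch of a recency-weighted repetition penalty with a
// half-life (not the code from #331). A token last seen `age` positions ago
// gets a weight of 0.5^(age / half_life), so recent repeats are penalized
// more strongly than old ones while the whole context is still considered.
void apply_decayed_repeat_penalty(std::vector<float> &logits,
                                  const std::vector<int32_t> &last_tokens,
                                  float half_life /* e.g. 32 */,
                                  float repeat_penalty /* e.g. 1.1 */) {
    const size_t n = last_tokens.size();
    std::vector<float> weight(logits.size(), 0.0f);

    for (size_t i = 0; i < n; ++i) {
        const float age = float(n - 1 - i);          // 0 for the newest token
        const float w   = std::pow(0.5f, age / half_life);
        const int32_t tok = last_tokens[i];
        // Keep the strongest (most recent) occurrence per token id.
        if (w > weight[tok]) {
            weight[tok] = w;
        }
    }

    for (size_t tok = 0; tok < logits.size(); ++tok) {
        if (weight[tok] == 0.0f) {
            continue;
        }
        // Interpolate between no penalty (weight 0) and the full penalty
        // (weight 1), then apply it the usual way.
        const float p = 1.0f + (repeat_penalty - 1.0f) * weight[tok];
        if (logits[tok] > 0.0f) {
            logits[tok] /= p;
        } else {
            logits[tok] *= p;
        }
    }
}
```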

github-actions bot added the stale label on Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
