Changing default repeat_last_n value to current context size? #787

Closed
rmn20 opened this issue Apr 5, 2023 · 3 comments
Labels: enhancement (New feature or request), generation quality (Quality of model output), stale

Comments


rmn20 commented Apr 5, 2023

I noticed that LLaMA 7B almost always gets stuck in a loop after a certain amount of time. This problem has recurred throughout the whole time I have been using llama.cpp (since March 15). I have also tried different models such as Alpaca and GPT4All unfiltered, but the problem still remains. It also becomes obvious when you try to generate a dialog that follows some kind of plot (I use --keep to keep the plot summary in context). Every time I've tried to generate something open-ended, it just loops at some point, even in interactive mode.

I also noticed that setting repeat_last_n to the current context size helps to eliminate this issue. (I use ctx_size 2048 most of the time.)

Maybe, after some testing, the default repeat_last_n value could be changed to the currently set context size, so newcomers could avoid this issue?
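For context, the fixed-window repetition penalty that repeat_last_n controls can be sketched roughly as follows. This is an illustrative sketch only, not llama.cpp's actual sampling code, and the function and variable names are made up; it just shows why setting the window to the full context size keeps every previously used token penalized.

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch of a fixed-window repetition penalty (not llama.cpp's
// actual code). Every token id that occurs in the last `repeat_last_n`
// tokens has its logit scaled down, making it less likely to be sampled.
// With repeat_last_n == n_ctx the window covers the whole context, so any
// token used so far stays penalized; loops are suppressed, but over long
// generations common words get starved as well.
void apply_repeat_penalty(std::vector<float> &logits,
                          const std::vector<int32_t> &last_tokens,
                          size_t repeat_last_n,
                          float repeat_penalty /* e.g. 1.1 */) {
    const size_t start = last_tokens.size() > repeat_last_n
                             ? last_tokens.size() - repeat_last_n
                             : 0;
    for (size_t i = start; i < last_tokens.size(); ++i) {
        const int32_t tok = last_tokens[i];
        // Common convention: divide positive logits, multiply negative ones,
        // so the penalty always pushes the token's probability down.
        if (logits[tok] > 0.0f) {
            logits[tok] /= repeat_penalty;
        } else {
            logits[tok] *= repeat_penalty;
        }
    }
}
```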

@gjmulder added the generation quality (Quality of model output) and enhancement (New feature or request) labels on Apr 6, 2023
@dogjamboree

If I understand correctly, repeat_last_n works by discouraging the model from generating any of the tokens (words) in the last n, so if you set it as high as something like 2048, the text will be coherent at first but rapidly devolve into flowery and nonsensical speech as the model looks for new tokens that haven't been used yet.

That's how it works in theory, and my experience in practice matches. It certainly makes for an 'interesting' dialog / chat / whatever when you set it that high, but good luck making sense of the answers if they run longer than a couple of paragraphs.

Piezoid (Contributor) commented Apr 10, 2023

If I understand correctly, repeat_last_n works by discouraging the model from generating any of the tokens (words) in the last n, so if you set it as high as something like 2048, the text will be coherent at first but rapidly devolve into flowery and nonsensical speech as the model looks for new tokens that haven't been used yet.

This is the issue I wanted to address with #331.

@rmn20 Can you try this branch? With it, setting --repeat_half_life 32 detects repeats over the whole context, but recent and long repeats are penalized more strongly than old and shorter ones.
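For illustration, a recency-decayed penalty along these lines could look roughly like the sketch below. This is only a sketch of the general idea, not the actual code from #331; the function name, the per-token decay scheme, and the way the weight is folded into the penalty are all assumptions.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative sketch of a recency-weighted repetition penalty with a
// half-life (not the code from #331). A token last seen `age` positions ago
// gets a weight of 0.5^(age / half_life), so recent repeats are penalized
// more strongly than old ones while the whole context is still considered.
void apply_decayed_repeat_penalty(std::vector<float> &logits,
                                  const std::vector<int32_t> &last_tokens,
                                  float half_life /* e.g. 32 */,
                                  float repeat_penalty /* e.g. 1.1 */) {
    const size_t n = last_tokens.size();
    std::vector<float> weight(logits.size(), 0.0f);

    for (size_t i = 0; i < n; ++i) {
        const float age = float(n - 1 - i);          // 0 for the newest token
        const float w   = std::pow(0.5f, age / half_life);
        const int32_t tok = last_tokens[i];
        // Keep the strongest (most recent) occurrence per token id.
        if (w > weight[tok]) {
            weight[tok] = w;
        }
    }

    for (size_t tok = 0; tok < logits.size(); ++tok) {
        if (weight[tok] == 0.0f) {
            continue;
        }
        // Interpolate between no penalty (weight 0) and the full penalty
        // (weight 1), then apply it the usual way.
        const float p = 1.0f + (repeat_penalty - 1.0f) * weight[tok];
        if (logits[tok] > 0.0f) {
            logits[tok] /= p;
        } else {
            logits[tok] *= p;
        }
    }
}
```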

github-actions bot added the stale label on Mar 25, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
