chat-persistent.sh not rotating cache files correctly #1670
Nice find! Quick fix might be to be more conservative in …
Do you think it's an issue with the bash script, or a recent change to main.cpp? As mentioned in the original report below, this was working great on an older commit.
I don't see how that patch could be related to the problem you describe. It's not actually changing the context size, it's just forcing some extra calculations to ensure accuracy. You could try reversing the patch to see if it fixes things for you. If you aren't compiling it yourself, you could try official releases. I'd start with this one and jump forward a bunch at a time until you find one that doesn't work, and then work backwards. If you can find the exact point at which it stops working for you, then that will make it easier to figure out what's going wrong.
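That forward-and-backward search can also be automated with `git bisect`, using the two release tags mentioned in this thread as the endpoints. A minimal sketch, assuming a source checkout that builds with plain `make` and a user-written `repro.sh` that exits non-zero when the capacity error appears (both the build command and `repro.sh` are assumptions, not part of this thread):

```bash
git bisect start
git bisect bad  master-ffb06a3    # release tag where rotation breaks (see report below)
git bisect good master-66874d4    # release tag where it still works
# Rebuild and test each candidate commit; a non-zero exit marks it bad.
git bisect run bash -c 'make clean && make -j && ./repro.sh'
git bisect reset
```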
Yeah, it came down to how the evaluated tokens were appended to the existing session. I've got a fix in #1678.
* Fix prompt cache saving and chat-persistent rollover (fixes #1670)
* clang-tidy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Works perfectly! Thanks @ejones |
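For anyone wanting to reproduce the failure or confirm the fix, a rough invocation follows. The model path is a placeholder, and the environment variable names are taken from the script's usage conventions; treat them as assumptions if your checkout differs:

```bash
# Rough reproduction sketch; MODEL is a placeholder path.
export MODEL=./models/7B/ggml-model-q4_0.bin
export PROMPT_CACHE_FILE=./chat/rollover-test/prompt-cache.bin
export CHAT_SAVE_DIR=./chat/rollover-test   # fresh dir, so caches start empty
mkdir -p "$CHAT_SAVE_DIR"
# Chat until the transcript pushes past the 2048-token context window.
# Before the fix, the script exited with the "exceeded capacity" error
# quoted in the report below; after it, the caches should rotate.
./examples/chat-persistent.sh
```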
On Mac M2, I am running
./examples/chat-persistent.sh
on the latest release (https://github.com/ggerganov/llama.cpp/releases/tag/master-ffb06a3 at the time of writing). The cache files do not rotate as expected when the context size of 2048 is reached. Instead, the process exits and this error log appears:
llama_load_session_file : token count in session file exceeded capacity! 2089 > 2048
On this older commit, https://github.com/ggerganov/llama.cpp/releases/tag/master-66874d4, the script works as expected.
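For context, the rotation being asked for works roughly like this. This is a simplified sketch of the idea with hypothetical variable names, not the actual code in chat-persistent.sh:

```bash
# Hypothetical sketch: keep a second prompt cache primed from a
# truncated transcript, and promote it before the session outgrows the
# context, so llama_load_session_file is never handed more tokens than
# the 2048-token context can hold.
CTX_SIZE=2048
if ((n_tokens + n_predict >= CTX_SIZE)); then
    mv "$NEXT_CACHE" "$CUR_CACHE"   # promote the shorter, primed cache
    n_tokens=$next_n_tokens         # track its smaller token count
fi
```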
Looking at the diff, it seems like the issue might have been introduced in commit 2483676, but I'm not 100% sure.
cc @ejones @DannyDaemonic