chat-persistent.sh not rotating cache files correctly #1670

Closed
spencekim opened this issue Jun 1, 2023 · 5 comments · Fixed by #1678

Comments

@spencekim

On a Mac M2, I am running ./examples/chat-persistent.sh on the latest release at the time of writing: https://github.com/ggerganov/llama.cpp/releases/tag/master-ffb06a3.

The cache files do not rotate as expected when the context size of 2048 is reached. Instead, the process exits with this error log: llama_load_session_file : token count in session file exceeded capacity! 2089 > 2048.
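For context, the script is supposed to roll over to a fresh cache file as the context fills, rather than let the saved session outgrow it. A minimal sketch of that kind of rotation guard (all variable names and paths here are illustrative, not the actual script's code):

```shell
# Illustrative guard, not chat-persistent.sh itself: before loading a saved
# session, check whether its token count still fits the context window and
# rotate to a fresh cache file if it does not. All names here are assumed.
CTX_SIZE=2048
SESSION_TOKENS=2089   # the count from the error log above

PROMPT_CACHE_FILE=cache/current.bin
NEXT_PROMPT_CACHE_FILE=cache/next.bin

if [ "$SESSION_TOKENS" -gt "$CTX_SIZE" ]; then
    # Session outgrew the context: swap in the rolled-over cache instead of
    # letting llama_load_session_file fail with "exceeded capacity".
    mv -f "$NEXT_PROMPT_CACHE_FILE" "$PROMPT_CACHE_FILE" 2>/dev/null \
        || rm -f "$PROMPT_CACHE_FILE"
    action=rotated
else
    action=reused
fi
echo "$action"
```

With the 2089-token session above, the guard takes the rotation branch instead of reusing the stale cache.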

On this older commit https://github.com/ggerganov/llama.cpp/releases/tag/master-66874d4 the script works as expected.

Looking at the diff, it seems like the issue might have been introduced in this commit, but I'm not 100% sure: 2483676

cc @ejones @DannyDaemonic

@ejones
Collaborator

ejones commented Jun 2, 2023

Nice find! A quick fix might be to make chat-persistent.sh more conservative about how much it's willing to predict. It's all a bit of a hack right now anyway, since it scrapes the stderr logs to track token usage. The deeper question is how the session file is allowed to overrun the context size in the first place.
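That conservative clamp could look something like this (a sketch with made-up variable names; the real script derives the used-token count by scraping main's stderr, as noted above):

```shell
# Sketch of the suggested quick fix: never request more tokens than the
# context window has room for. N_CTX, N_USED, and N_PREDICT are assumed
# names for this example, not the script's actual variables.
N_CTX=2048
N_USED=1900      # tokens already consumed, per the scraped stderr logs
N_PREDICT=256    # tokens the script would otherwise request

REMAINING=$((N_CTX - N_USED))
if [ "$N_PREDICT" -gt "$REMAINING" ]; then
    N_PREDICT=$REMAINING   # clamp so N_USED + N_PREDICT <= N_CTX
fi
echo "$N_PREDICT"
```

With the illustrative numbers above, the request is clamped from 256 down to the 148 tokens that actually fit.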

@spencekim
Author

Do you think it's an issue with the bash script, or with a recent change to main.cpp? As mentioned above, this was working great on an older commit.

@DannyDaemonic
Contributor

I don't see how that patch could be related to the problem you describe. It's not actually changing the context size; it's just forcing some extra calculations to ensure accuracy.

You could try reverting the patch to see if it fixes things for you. If you aren't compiling it yourself, you could try the official releases. I'd start with this one and jump forward a bunch at a time until you find one that doesn't work, then work backwards. If you can find the exact point at which it stops working for you, that will make it easier to figure out what's going wrong.
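The jump-forward/work-backward search described above is exactly what git bisect automates. Here's a self-contained toy demo on a throwaway repo (in the real case the good/bad endpoints would be the two release tags linked earlier, and the test command would build and run the repro script):

```shell
# Toy git-bisect demo in a temporary repo; the "regression" is a tracked
# file named `bug` introduced in the third of four commits.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
commit() { git -c user.email=u@e -c user.name=u commit -q --allow-empty -m "$1"; }
commit good1
commit good2
touch bug && git add bug && commit bad1
commit bad2

# Mark HEAD bad and the root commit good, then let bisect drive the search;
# the test command exits 0 (good) when `bug` is absent at the checkout.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)" >/dev/null
git bisect run sh -c 'test ! -e bug' >/dev/null 2>&1
first_bad=$(git show -s --format=%s refs/bisect/bad)
echo "$first_bad"
```

Bisect halves the candidate range at each step, so even a long gap between a known-good and known-bad release takes only a handful of builds to narrow down.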

@ejones
Collaborator

ejones commented Jun 3, 2023

Yeah, it came down to how the evaluated tokens were appended to the existing session. I've got a fix in #1678.
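In other words (illustrative numbers, though they match the error log above): each turn appended the newly evaluated tokens to the saved session without truncating it to the context size, so the count crept past n_ctx and the next load failed the capacity check.

```shell
# Sketch of the failure mode: the saved token count grows past the context
# size. SAVED and EVALUATED are illustrative values, not measured ones.
N_CTX=2048
SAVED=2000        # tokens already in the session file
EVALUATED=89      # tokens evaluated and appended this turn
SAVED=$((SAVED + EVALUATED))
echo "$SAVED"     # now exceeds N_CTX, so llama_load_session_file bails out
```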

ejones added a commit that referenced this issue Jun 3, 2023
* Fix prompt cache saving and chat-persistent rollover (fixes #1670)

* clang-tidy

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

@spencekim
Author

Works perfectly! Thanks @ejones
