llama : refactor get / set state + remove redundant kv cache API #1143

ggerganov · 2023-04-23T15:57:53Z

Normalize the code style
Move the definitions to the correct place in llama.cpp
Retire llama_get_kv_cache(), llama_get_kv_cache_size() and llama_set_kv_cache()

Not sure how to test this - maybe we need to add an example, or extend main with store/load state functionality
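
One possible shape for such a test is a plain round trip: serialize the state of one context into a buffer, load it into a fresh context, and check that nothing was lost. The sketch below is self-contained and uses a toy context, so every name in it (`toy_context`, `toy_get_state_size`, `toy_copy_state_data`, `toy_set_state_data`, `toy_state_round_trip_ok`) is hypothetical and is not the llama.cpp API. It only assumes the state API has a "query size, copy out, set back" shape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a context: only the fields needed to
// illustrate a state round trip. Not the real llama_context.
struct toy_context {
    uint32_t           rng_seed = 0;
    int32_t            n_past   = 0;  // number of cached tokens
    std::vector<float> kv_cache;      // toy KV cache contents
};

// Bytes needed to serialize the context (fixed fields + length prefix + cache).
size_t toy_get_state_size(const toy_context & ctx) {
    return sizeof(ctx.rng_seed) + sizeof(ctx.n_past) + sizeof(uint64_t)
         + ctx.kv_cache.size() * sizeof(float);
}

// Copy the state into dest; returns the number of bytes written.
size_t toy_copy_state_data(const toy_context & ctx, uint8_t * dest) {
    uint8_t * out = dest;
    std::memcpy(out, &ctx.rng_seed, sizeof(ctx.rng_seed)); out += sizeof(ctx.rng_seed);
    std::memcpy(out, &ctx.n_past,   sizeof(ctx.n_past));   out += sizeof(ctx.n_past);
    const uint64_t n = ctx.kv_cache.size();
    std::memcpy(out, &n, sizeof(n)); out += sizeof(n);
    if (n > 0) {
        std::memcpy(out, ctx.kv_cache.data(), n * sizeof(float));
        out += n * sizeof(float);
    }
    return static_cast<size_t>(out - dest);
}

// Restore the state from src; returns the number of bytes read.
size_t toy_set_state_data(toy_context & ctx, const uint8_t * src) {
    const uint8_t * in = src;
    std::memcpy(&ctx.rng_seed, in, sizeof(ctx.rng_seed)); in += sizeof(ctx.rng_seed);
    std::memcpy(&ctx.n_past,   in, sizeof(ctx.n_past));   in += sizeof(ctx.n_past);
    uint64_t n = 0;
    std::memcpy(&n, in, sizeof(n)); in += sizeof(n);
    ctx.kv_cache.resize(n);
    if (n > 0) {
        std::memcpy(ctx.kv_cache.data(), in, n * sizeof(float));
        in += n * sizeof(float);
    }
    return static_cast<size_t>(in - src);
}

// Round-trip check: save from one context, load into a fresh one, compare.
bool toy_state_round_trip_ok() {
    toy_context saved;
    saved.rng_seed = 1234;
    saved.n_past   = 3;
    saved.kv_cache = {0.5f, -1.0f, 2.25f};

    std::vector<uint8_t> buf(toy_get_state_size(saved));
    const size_t written = toy_copy_state_data(saved, buf.data());

    toy_context restored;
    const size_t read = toy_set_state_data(restored, buf.data());

    return written == buf.size() && read == written
        && restored.rng_seed == saved.rng_seed
        && restored.n_past   == saved.n_past
        && restored.kv_cache == saved.kv_cache;
}
```

Calling `toy_state_round_trip_ok()` from a test or from `main` exercises the whole path. A test against the real library would presumably follow the same steps and then also compare the tokens generated after restoring with those from the original context.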

xaedes

The change looks good to me, I love the readability improvements. The save_load script from #730 (comment) works as well.

I have already converted this into an example with CMake and will open a pull request for it.

ejones · 2023-04-25T16:35:46Z

If it's helpful, I put up a take on save/load state in main in #1169 (mostly due to my impatience with 65B on the chat-13B prompt, hah).

llama : refactor get / set state + remove redundant kv cache API

a136a93

ggerganov force-pushed the refactor-state branch from 1a790a6 to a136a93 Compare April 23, 2023 15:59

ggerganov added the refactoring label Apr 23, 2023

ggerganov requested a review from xaedes April 23, 2023 16:08

xaedes approved these changes Apr 24, 2023


ggerganov merged commit c4fe84f into master Apr 24, 2023

ggerganov deleted the refactor-state branch April 24, 2023 04:40

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open
