This PR (#11446) is not merged yet. The main reason is that the ambitious large-scale KV cache refactoring PR #11213, which would have enabled custom KV cache implementations, was not merged; only a limited subset of those changes landed in #12181. Some future work is planned in #12181 (listed in its "Next" section), but the timeline for it is currently unknown. Perhaps @ggerganov will be able to give an estimate. However, there is a llama.cpp fork that already has PR #11446 merged (along with many other changes that speed up DeepSeek R1/V3 inference); you can try it if you want: https://github.com/ikawrakow/ik_llama.cpp
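If you want to try the fork, a minimal sketch of the standard llama.cpp CMake build flow should apply, since it is a fork of upstream; the binary name, flags, and model path below are assumptions and may differ per fork revision:

```sh
# Clone and build the ik_llama.cpp fork (standard llama.cpp CMake flow;
# exact options may differ in the fork, check its README).
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release

# Run inference as with upstream llama.cpp.
# "llama-cli" and the GGUF path are placeholders, not confirmed specifics.
./build/bin/llama-cli -m /path/to/deepseek-model.gguf -p "Hello"
```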
It seems the MLA-related PRs have not been merged, so MLA is not supported yet? If it is supported, starting from which release?