
Commit d1f563a

llama : fix Metal KV cache sync (close #1695)

1 parent 827f5ed

File tree

1 file changed: +8 −0 lines changed

llama.cpp (+8 −0)
@@ -1455,6 +1455,14 @@ static bool llama_eval_internal(
         // When we implement Matrix x Matrix Metal multiplication, we can avoid this branch.
         // But for now, we have focused only on Matrix x Vector Metal multiplication.
         //
+        // TODO: avoid these syncs via shared memory (ref #1696)
+        //
+        if (lctx.ctx_metal) {
+            // We need to sync the GPU KV cache with the CPU KV cache
+            ggml_metal_get_tensor(lctx.ctx_metal, kv_self.k);
+            ggml_metal_get_tensor(lctx.ctx_metal, kv_self.v);
+        }
+
         ggml_graph_compute(ctx0, &gf);

         if (lctx.ctx_metal) {
