bugfix: Fix cudagraph mode of BatchPrefillWithRaggedKVCacheWrapper (#412)

The computation of `fixed_batch_size` is not correct.
yzh119 authored Jul 30, 2024
1 parent 58d3593 commit 9907bc1
Showing 1 changed file with 2 additions and 2 deletions.
python/flashinfer/prefill.py (2 additions, 2 deletions)

@@ -1215,8 +1215,8 @@ def __init__(
                 raise ValueError(
                     "kv_indptr_buf should be a torch.Tensor in cuda graph mode"
                 )
-            self._fixed_batch_size = len(qo_indptr_buf)
-            if len(kv_indptr_buf) != self._fixed_batch_size:
+            self._fixed_batch_size = len(qo_indptr_buf) - 1
+            if len(kv_indptr_buf) != self._fixed_batch_size + 1:
                 raise ValueError(
                     "The length of kv_indptr_buf ({}) should be the same as qo_indptr_buf ({}).".format(
                         len(kv_indptr_buf), self._fixed_batch_size
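For context on the fix: `qo_indptr_buf` and `kv_indptr_buf` follow the CSR-style indptr convention, where a batch of `batch_size` requests is described by `batch_size + 1` offsets. The sketch below (standalone Python with hypothetical request lengths, not code from this commit) illustrates why the batch size must be recovered as `len(qo_indptr_buf) - 1` and why the consistency check compares against `fixed_batch_size + 1`.

# Minimal sketch of the indptr convention behind the fix (hypothetical data,
# not from the commit): an indptr buffer holds batch_size + 1 prefix-sum
# offsets, so the batch size implied by a buffer is len(buf) - 1.
import torch

batch_size = 4
qo_lens = torch.tensor([7, 3, 5, 1])        # hypothetical query lengths
kv_lens = torch.tensor([128, 64, 256, 32])  # hypothetical KV lengths

# Exclusive prefix sums => batch_size + 1 entries each.
qo_indptr = torch.cat([torch.zeros(1, dtype=torch.int32),
                       torch.cumsum(qo_lens, 0, dtype=torch.int32)])
kv_indptr = torch.cat([torch.zeros(1, dtype=torch.int32),
                       torch.cumsum(kv_lens, 0, dtype=torch.int32)])

assert len(qo_indptr) == batch_size + 1        # 5 entries, not 4
fixed_batch_size = len(qo_indptr) - 1          # the corrected computation
assert len(kv_indptr) == fixed_batch_size + 1  # the corrected consistency check

With the pre-fix code, `fixed_batch_size` would be off by one (5 instead of 4 in this example), and the length check would reject correctly sized `kv_indptr_buf` tensors in cuda graph mode.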
