Parameterize prefix mask call (needed by POD-Attention) #1059

Merged
6 commits merged into flashinfer-ai:main on May 14, 2025

Conversation

AKKamath (Contributor)

Passes the threadId to the prefix mask call explicitly. This shouldn't change existing code paths, but it is necessary because POD remaps threadIds and blockIds.
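
For context, here is a minimal CUDA sketch of the pattern this PR describes, not flashinfer's actual code: the mask helper takes the thread index as an explicit parameter instead of reading threadIdx.x itself, so a fused POD-style kernel that remaps thread IDs can pass the logical index. All names (prefix_mask, fused_pod_sketch, logical_tid) and the specific mask formula are illustrative assumptions.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the prefix mask helper: the (logical) thread
// index is a parameter rather than threadIdx.x read internally, so callers
// that remap thread IDs still get the right query-row mapping.
__device__ __forceinline__ bool prefix_mask(uint32_t tid, uint32_t q_tile_base,
                                            uint32_t kv_idx, uint32_t qo_len,
                                            uint32_t kv_len) {
  uint32_t q_idx = q_tile_base + tid;  // query row handled by this thread
  // Causal mask for a prefill whose queries sit at the end of the KV cache;
  // written as an addition on both sides to avoid unsigned underflow.
  return q_idx < qo_len && kv_idx + qo_len <= kv_len + q_idx;
}

// POD-style sketch: the first half of each block runs the "prefill op", the
// second half the "decode op", and each half gets a remapped logical thread
// ID starting at 0. If prefix_mask read threadIdx.x directly, the second
// half would compute the wrong rows; hence the explicit parameter.
__global__ void fused_pod_sketch(uint8_t* mask_out, uint32_t qo_len,
                                 uint32_t kv_len) {
  uint32_t half = blockDim.x / 2;
  bool is_prefill = threadIdx.x < half;
  uint32_t logical_tid = is_prefill ? threadIdx.x : threadIdx.x - half;

  if (is_prefill) {
    // Prefill branch: one query row per logical thread.
    for (uint32_t kv = 0; kv < kv_len; ++kv) {
      mask_out[logical_tid * kv_len + kv] =
          prefix_mask(logical_tid, /*q_tile_base=*/0, kv, qo_len, kv_len);
    }
  }
  // (Decode branch omitted; it would use logical_tid for its own indexing.)
}

int main() {
  const uint32_t qo_len = 4, kv_len = 8, threads = 8;  // 4 prefill + 4 decode
  uint8_t* mask;
  cudaMallocManaged(&mask, qo_len * kv_len);
  fused_pod_sketch<<<1, threads>>>(mask, qo_len, kv_len);
  cudaDeviceSynchronize();
  for (uint32_t q = 0; q < qo_len; ++q) {
    for (uint32_t kv = 0; kv < kv_len; ++kv) printf("%d", mask[q * kv_len + kv]);
    printf("\n");
  }
  cudaFree(mask);
  return 0;
}
```

The point is the signature: once the helper no longer reads threadIdx.x itself, the caller is free to map physical threads onto logical prefill and decode lanes however it needs to.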

@yzh119 merged commit a9935ea into flashinfer-ai:main on May 14, 2025
2 checks passed
@Edenzzzz (Contributor) commented May 17, 2025

I will try to add BatchedPrefill support for POD in the meantime. It seems to be mostly a matter of setting up the params and page indices in wrapper.plan and pod_with_kv_cache_tensor.
