Parameterize prefix mask call (needed by POD-Attention) #1059

Merged
6 commits merged into flashinfer-ai:main on May 14, 2025

Conversation

AKKamath (Contributor)

Passes the threadId to the prefix mask call explicitly. This shouldn't change existing code paths, but it is necessary because POD remaps threadIds and blockIds.
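
For context, here is a minimal CUDA sketch of the pattern this PR describes, not flashinfer's actual code: the mask helper takes the thread index as an explicit parameter instead of reading threadIdx.x itself, so a fused POD-style kernel that remaps thread IDs can pass the logical index. All names (prefix_mask, fused_pod_sketch, logical_tid) and the specific mask formula are illustrative assumptions.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the prefix mask helper: the (logical) thread
// index is a parameter rather than threadIdx.x read internally, so callers
// that remap thread IDs still get the right query-row mapping.
__device__ __forceinline__ bool prefix_mask(uint32_t tid, uint32_t q_tile_base,
                                            uint32_t kv_idx, uint32_t qo_len,
                                            uint32_t kv_len) {
  uint32_t q_idx = q_tile_base + tid;  // query row handled by this thread
  // Causal mask for a prefill whose queries sit at the end of the KV cache;
  // written as an addition on both sides to avoid unsigned underflow.
  return q_idx < qo_len && kv_idx + qo_len <= kv_len + q_idx;
}

// POD-style sketch: the first half of each block runs the "prefill op", the
// second half the "decode op", and each half gets a remapped logical thread
// ID starting at 0. If prefix_mask read threadIdx.x directly, the second
// half would compute the wrong rows; hence the explicit parameter.
__global__ void fused_pod_sketch(uint8_t* mask_out, uint32_t qo_len,
                                 uint32_t kv_len) {
  uint32_t half = blockDim.x / 2;
  bool is_prefill = threadIdx.x < half;
  uint32_t logical_tid = is_prefill ? threadIdx.x : threadIdx.x - half;

  if (is_prefill) {
    // Prefill branch: one query row per logical thread.
    for (uint32_t kv = 0; kv < kv_len; ++kv) {
      mask_out[logical_tid * kv_len + kv] =
          prefix_mask(logical_tid, /*q_tile_base=*/0, kv, qo_len, kv_len);
    }
  }
  // (Decode branch omitted; it would use logical_tid for its own indexing.)
}

int main() {
  const uint32_t qo_len = 4, kv_len = 8, threads = 8;  // 4 prefill + 4 decode
  uint8_t* mask;
  cudaMallocManaged(&mask, qo_len * kv_len);
  fused_pod_sketch<<<1, threads>>>(mask, qo_len, kv_len);
  cudaDeviceSynchronize();
  for (uint32_t q = 0; q < qo_len; ++q) {
    for (uint32_t kv = 0; kv < kv_len; ++kv) printf("%d", mask[q * kv_len + kv]);
    printf("\n");
  }
  cudaFree(mask);
  return 0;
}
```

The point is the signature: once the helper no longer reads threadIdx.x itself, the caller is free to map physical threads onto logical prefill and decode lanes however it needs to.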

@yzh119 merged commit a9935ea into flashinfer-ai:main on May 14, 2025
2 checks passed
@Edenzzzz (Contributor) commented May 17, 2025

I will try to add BatchedPrefill support for POD in the meantime. It seems to be mostly a matter of setting up the params and page indices in wrapper.plan and pod_with_kv_cache_tensor.
