
feat: Add mask to merge_state_in_place #372

Merged
1 commit merged into flashinfer-ai:main on Jul 13, 2024

Conversation

@Yard1 Yard1 (Contributor) commented Jul 13, 2024

This pushes the conditional logic down into the kernel, allowing for better CUDA graph support with variable sequence lengths. I didn't see much purpose in adding the mask parameter to the out-of-place merge state kernels. A usage sketch is shown below.
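A minimal sketch of how the masked in-place merge might be called, assuming the Python binding `flashinfer.merge_state_in_place(v, s, v_other, s_other, mask=...)` after this change; the tensor shapes, dtypes, and the `valid_len` variable below are illustrative assumptions, not taken from this PR's diff.

```python
# Illustrative sketch: merging two partial attention states in place, with a
# boolean mask selecting which rows actually get merged.
# Shapes/dtypes are assumptions for illustration.
import torch
import flashinfer

seq_len, num_heads, head_dim = 2048, 32, 128
device = torch.device("cuda")

# Attention output `v` and per-head log-sum-exp `s` from one cascade level.
v = torch.randn(seq_len, num_heads, head_dim, dtype=torch.half, device=device)
s = torch.randn(seq_len, num_heads, dtype=torch.float32, device=device)

# Partial state from another level, to be merged into (v, s).
v_other = torch.randn_like(v)
s_other = torch.randn_like(s)

# Only the first `valid_len` rows are real tokens; the rest is padding kept so
# tensor shapes stay fixed across CUDA graph replays.
valid_len = 1500
mask = torch.arange(seq_len, device=device) < valid_len

# Rows where mask is False are left untouched by the kernel, so the same
# captured graph can be replayed for different sequence lengths by only
# rewriting `mask` in place.
flashinfer.merge_state_in_place(v, s, v_other, s_other, mask=mask)
```

Because the skip decision happens inside the kernel rather than on the host, the launch parameters are identical regardless of `valid_len`, which is what makes the call safe to capture in a CUDA graph.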

@yzh119 yzh119 (Collaborator) left a comment

LGTM, the mask looks like a good feature to have, thank you @Yard1!

@yzh119 yzh119 merged commit e14fa81 into flashinfer-ai:main Jul 13, 2024
@Yard1 Yard1 deleted the cascade_with_mask branch July 13, 2024 02:53
yzh119 pushed a commit that referenced this pull request Jul 17, 2024
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.9...v0.1.0) (2024-07-17)


### Features

* Add mask to `merge_state_in_place` ([#372](#372)) ([e14fa81](e14fa81))
* expose pytorch api for block sparse attention ([#375](#375)) ([4bba6fa](4bba6fa))
* Fused GPU sampling kernel for joint top-k & top-p sampling ([#374](#374)) ([6e028eb](6e028eb))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>