feat: expose pytorch api for block sparse attention #375

yzh119 · 2024-07-13T04:05:54Z

Block sparse attention (for any block size (R, C)) is already implemented in flashinfer's codebase, but it was never exposed explicitly in Python. As requested in #367, this PR implements the PyTorch APIs for block sparse attention. According to our experiments, it can greatly accelerate attention computation at low density (10x for Tree Attention in Sequoia).
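To make the block size (R, C) and "density" concrete, here is a minimal NumPy reference sketch of what block sparse attention computes. It is not the API added by this PR: the function names (`bsr_to_dense_mask`, `block_sparse_attention_reference`) and the block-sparse-row (`indptr`, `indices`) layout are assumptions chosen for illustration, and the dense masked softmax below only defines the expected result rather than showing how the kernels achieve their speedup.

```python
import numpy as np


def bsr_to_dense_mask(indptr, indices, M, N, R, C):
    """Expand a block-sparse-row pattern into a dense (M, N) boolean mask.

    indptr/indices follow the usual BSR convention (an assumption here):
    block row i attends to block columns indices[indptr[i]:indptr[i + 1]].
    """
    mask = np.zeros((M, N), dtype=bool)
    for block_row in range(len(indptr) - 1):
        for pos in range(indptr[block_row], indptr[block_row + 1]):
            block_col = indices[pos]
            mask[block_row * R:(block_row + 1) * R,
                 block_col * C:(block_col + 1) * C] = True
    return mask


def block_sparse_attention_reference(q, k, v, mask):
    """Dense reference: softmax(q k^T / sqrt(d)) v over positions where mask is True.

    Assumes every query row has at least one True entry in mask.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)          # masked-out blocks get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Toy problem: 4 queries, 6 keys, block size (R, C) = (2, 3).
M, N, R, C, d = 4, 6, 2, 3, 8
indptr = [0, 1, 3]    # block row 0 has 1 nonzero block, block row 1 has 2
indices = [0, 0, 1]   # block row 0 -> block col 0; block row 1 -> block cols 0 and 1

rng = np.random.default_rng(0)
q = rng.standard_normal((M, d))
k = rng.standard_normal((N, d))
v = rng.standard_normal((N, d))

mask = bsr_to_dense_mask(indptr, indices, M, N, R, C)
density = mask.mean()  # fraction of (query, key) pairs that are actually computed
out = block_sparse_attention_reference(q, k, v, mask)
```

In this toy pattern three of the four (2, 3) blocks are nonzero, so the density is 0.75. The speedups mentioned above concern much lower densities, where a kernel that skips the empty blocks does proportionally less work than dense attention.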

🤖 I have created a release *beep* *boop*

---

## [0.1.0](v0.0.9...v0.1.0) (2024-07-17)

### Features

* Add mask to `merge_state_in_place` ([#372](#372)) ([e14fa81](e14fa81))
* expose pytorch api for block sparse attention ([#375](#375)) ([4bba6fa](4bba6fa))
* Fused GPU sampling kernel for joint top-k & top-p sampling ([#374](#374)) ([6e028eb](6e028eb))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

yzh119 added 6 commits July 13, 2024 03:45

i feel extremely tired

8418699

upd

28fb39f

upd

ffa89b8

finish tests

8471e46

upd

2969495

upd

844035b

yzh119 merged commit 4bba6fa into main Jul 17, 2024

github-actions bot mentioned this pull request Jul 13, 2024

chore(main): release 0.1.0 #373

Merged

yzh119 deleted the block-sparse branch July 24, 2024 10:38

github-actions bot mentioned this pull request Jul 31, 2024

chore(main): release 0.1.4 #415

Merged
