Expected release date: Feb 28th, 2024
- [ ] faster batch prefill/append attention with KV partition for small query lengths ([WIP][Feature] Support KV Partition for BatchPrefill kernel for Paged & Ragged KV-Cache. #75); see the split-KV sketch after this list
- [ ] faster fused-rope GQA (doesn't seem to work well; using the prefill kernels instead is encouraged)
- [ ] Python interface for 4/8-bit kernels (How to use low-bit KV Cache in flashinfer? #125); see the quantization sketch below
- [ ] head_dim=256 for attention kernels (#132)
- [ ] More versatile group sizes ([Feature Request] More versatile GQA group sizes #140); see the group-size sketch below
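For context on the KV-partition item (#75): the general split-KV idea is to partition a long KV sequence into chunks, compute attention over each chunk independently, and merge the partial outputs using their log-sum-exp (LSE) values. Below is a minimal PyTorch sketch of that merge rule; the function names are made up for illustration and are not flashinfer's API.

```python
import torch

def attention_with_lse(q, k, v):
    # q: [num_heads, head_dim]; k, v: [kv_len, num_heads, head_dim]
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("hd,nhd->hn", q, k) * scale  # [num_heads, kv_len]
    lse = torch.logsumexp(scores, dim=-1)              # log of the softmax denominator
    out = torch.einsum("hn,nhd->hd", torch.softmax(scores, dim=-1), v)
    return out, lse

def merge_states(outs, lses):
    # Weight each chunk's partial output by its share of the total softmax
    # mass, exp(lse_c) / sum_c exp(lse_c), i.e. a softmax over the LSEs.
    lses = torch.stack(lses, dim=0)   # [num_chunks, num_heads]
    outs = torch.stack(outs, dim=0)   # [num_chunks, num_heads, head_dim]
    weights = torch.softmax(lses, dim=0).unsqueeze(-1)
    return (weights * outs).sum(dim=0)

q = torch.randn(8, 128)
k, v = torch.randn(1024, 8, 128), torch.randn(1024, 8, 128)
parts = [attention_with_lse(q, k[s], v[s]) for s in (slice(0, 512), slice(512, 1024))]
merged = merge_states([o for o, _ in parts], [l for _, l in parts])
reference, _ = attention_with_lse(q, k, v)
assert torch.allclose(merged, reference, atol=1e-4)
```

Because the chunks are independent until the final merge, they can be processed by separate thread blocks, which recovers parallelism when the query length is small.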
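On the 4/8-bit item (#125): one common low-bit KV-cache scheme is to store K/V as int8 with a per-head scale and dequantize on the fly before the attention matmul. This is a hand-rolled sketch of that idea under an assumed symmetric per-head quantization; it does not reflect flashinfer's actual low-bit interface.

```python
import torch

def quantize_kv_int8(x):
    # x: [kv_len, num_heads, head_dim]; one symmetric scale per head
    scale = x.abs().amax(dim=(0, 2), keepdim=True) / 127.0  # [1, num_heads, 1]
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q, scale):
    return q.to(torch.float32) * scale

k = torch.randn(512, 8, 128)
k_int8, k_scale = quantize_kv_int8(k)
k_restored = dequantize_kv(k_int8, k_scale)
print((k - k_restored).abs().max())  # quantization error stays small
```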
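On group sizes (#140): in grouped-query attention, group_size = num_qo_heads / num_kv_heads, and each KV head is shared by group_size query heads. Kernels are often compiled for a fixed set of ratios, which is what makes arbitrary group sizes a feature request. A rough reference implementation, with illustrative names:

```python
import torch

def gqa_attention(q, k, v):
    # q: [num_qo_heads, q_len, head_dim]; k, v: [num_kv_heads, kv_len, head_dim]
    num_qo_heads, num_kv_heads = q.shape[0], k.shape[0]
    assert num_qo_heads % num_kv_heads == 0
    group_size = num_qo_heads // num_kv_heads  # query heads per KV head
    # Broadcast each KV head across its group of query heads.
    k = k.repeat_interleave(group_size, dim=0)
    v = v.repeat_interleave(group_size, dim=0)
    scores = torch.einsum("hqd,hkd->hqk", q, k) * q.shape[-1] ** -0.5
    return torch.einsum("hqk,hkd->hqd", torch.softmax(scores, dim=-1), v)

# 32 query heads sharing 4 KV heads gives group size 8.
out = gqa_attention(torch.randn(32, 1, 128), torch.randn(4, 512, 128), torch.randn(4, 512, 128))
```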