Expected release date: Feb 28th, 2024
- [ ] faster batch prefill/append attention with KV partition for small query lengths ([WIP][Feature] Support KV Partition for BatchPrefill kernel for Paged & Ragged KV-Cache. #75); see the split-KV sketch after this list
- [ ] faster fused-rope GQA (doesn't seem to work well; using the prefill kernels instead is encouraged)
- [ ] Python interface for 4/8-bit kernels (How to use low-bit KV Cache in flashinfer? #125); see the quantization sketch below
- [ ] head_dim=256 for attention kernels (#132)
- [ ] More versatile group sizes ([Feature Request] More versatile GQA group sizes #140); see the group-size sketch below
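For context on the KV-partition item (#75): the general split-KV idea is to partition a long KV sequence into chunks, compute attention over each chunk independently, and merge the partial outputs using their log-sum-exp (LSE) values. Below is a minimal PyTorch sketch of that merge rule; the function names are made up for illustration and are not flashinfer's API.

```python
import torch

def attention_with_lse(q, k, v):
    # q: [num_heads, head_dim]; k, v: [kv_len, num_heads, head_dim]
    scale = q.shape[-1] ** -0.5
    scores = torch.einsum("hd,nhd->hn", q, k) * scale  # [num_heads, kv_len]
    lse = torch.logsumexp(scores, dim=-1)              # log of the softmax denominator
    out = torch.einsum("hn,nhd->hd", torch.softmax(scores, dim=-1), v)
    return out, lse

def merge_states(outs, lses):
    # Weight each chunk's partial output by its share of the total softmax
    # mass, exp(lse_c) / sum_c exp(lse_c), i.e. a softmax over the LSEs.
    lses = torch.stack(lses, dim=0)   # [num_chunks, num_heads]
    outs = torch.stack(outs, dim=0)   # [num_chunks, num_heads, head_dim]
    weights = torch.softmax(lses, dim=0).unsqueeze(-1)
    return (weights * outs).sum(dim=0)

q = torch.randn(8, 128)
k, v = torch.randn(1024, 8, 128), torch.randn(1024, 8, 128)
parts = [attention_with_lse(q, k[s], v[s]) for s in (slice(0, 512), slice(512, 1024))]
merged = merge_states([o for o, _ in parts], [l for _, l in parts])
reference, _ = attention_with_lse(q, k, v)
assert torch.allclose(merged, reference, atol=1e-4)
```

Because the chunks are independent until the final merge, they can be processed by separate thread blocks, which recovers parallelism when the query length is small.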
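On the 4/8-bit item (#125): one common low-bit KV-cache scheme is to store K/V as int8 with a per-head scale and dequantize on the fly before the attention matmul. This is a hand-rolled sketch of that idea under an assumed symmetric per-head quantization; it does not reflect flashinfer's actual low-bit interface.

```python
import torch

def quantize_kv_int8(x):
    # x: [kv_len, num_heads, head_dim]; one symmetric scale per head
    scale = x.abs().amax(dim=(0, 2), keepdim=True) / 127.0  # [1, num_heads, 1]
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q, scale):
    return q.to(torch.float32) * scale

k = torch.randn(512, 8, 128)
k_int8, k_scale = quantize_kv_int8(k)
k_restored = dequantize_kv(k_int8, k_scale)
print((k - k_restored).abs().max())  # quantization error stays small
```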
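On group sizes (#140): in grouped-query attention, group_size = num_qo_heads / num_kv_heads, and each KV head is shared by group_size query heads. Kernels are often compiled for a fixed set of ratios, which is what makes arbitrary group sizes a feature request. A rough reference implementation, with illustrative names:

```python
import torch

def gqa_attention(q, k, v):
    # q: [num_qo_heads, q_len, head_dim]; k, v: [num_kv_heads, kv_len, head_dim]
    num_qo_heads, num_kv_heads = q.shape[0], k.shape[0]
    assert num_qo_heads % num_kv_heads == 0
    group_size = num_qo_heads // num_kv_heads  # query heads per KV head
    # Broadcast each KV head across its group of query heads.
    k = k.repeat_interleave(group_size, dim=0)
    v = v.repeat_interleave(group_size, dim=0)
    scores = torch.einsum("hqd,hkd->hqk", q, k) * q.shape[-1] ** -0.5
    return torch.einsum("hqk,hkd->hqd", torch.softmax(scores, dim=-1), v)

# 32 query heads sharing 4 KV heads gives group size 8.
out = gqa_attention(torch.randn(32, 1, 128), torch.randn(4, 512, 128), torch.randn(4, 512, 128))
```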