
[Feature]: Support for newer flash-attention versions (e.g. ≥2.1.0) #53

Open
JiahuaZhao opened this issue May 22, 2024 · 2 comments

@JiahuaZhao

Suggestion Description

When running long-context inference (using LongLoRA), we sometimes hit errors saying flash-attn version ≥2.1.0 is required. Wondering if a newer version of the ROCm port will follow.

Operating System

SUSE

GPU

MI250X

ROCm Component

ROCm 5.4.3
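
For reference, a minimal sketch of the kind of version guard that produces this error. LongLoRA-style flash-attention patches typically check the installed flash-attn version before importing its kernels; the exact check in any given framework may differ, so treat this as illustrative:

```python
# Minimal sketch (the exact guard varies by framework) of the kind of check
# behind the "need flash-attn version >= 2.1.0" error.
from packaging import version

import flash_attn

REQUIRED = version.parse("2.1.0")
installed = version.parse(flash_attn.__version__)

if installed < REQUIRED:
    raise ImportError(
        f"flash-attn {flash_attn.__version__} found, but >= {REQUIRED} is required "
        "for the lower-right-aligned causal masking used in long-context inference."
    )
```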

@jinsong-mao

+1. The default branch "flash_attention_for_rocm" is 272 commits behind Tri Dao's repo, and many of its APIs are not compatible with some frameworks. Is there any way to resolve this? Are there any newer branches?

@turboderp

I don't know what another +1 is worth, but catching up on lower-right causal masking and paged attention specifically would make a world of difference for ROCm users.
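
For context, a hedged sketch of why the ≥2.1 behavior matters: from flash-attn 2.1 onward, `flash_attn_func` aligns the causal mask to the bottom-right of the attention matrix when the query is shorter than the key/value sequence, which is exactly what decoding against a KV cache needs. The shapes and call pattern below are illustrative, not taken from any particular framework:

```python
# Illustrative only: the flash-attn >= 2.1 call pattern whose causal mask is
# aligned to the bottom-right when seqlen_q < seqlen_k (KV-cache decoding).
import torch
from flash_attn import flash_attn_func

batch, nheads, headdim = 1, 8, 64
seqlen_q, seqlen_k = 1, 4096  # one new token attending to a 4096-token cache

q = torch.randn(batch, seqlen_q, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn(batch, seqlen_k, nheads, headdim, device="cuda", dtype=torch.float16)
v = torch.randn(batch, seqlen_k, nheads, headdim, device="cuda", dtype=torch.float16)

# In flash-attn >= 2.1 the causal mask lets this query attend to all seqlen_k
# cached positions (bottom-right alignment); older releases aligned the mask
# top-left, which breaks this decoding pattern. Paged attention (block_table in
# flash_attn_with_kvcache) arrived in later 2.x releases.
out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (batch, seqlen_q, nheads, headdim)
```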
