
[Feature] Generalize STA kernel to work for any sequence length #225

Open
alexarmbr opened this issue Feb 28, 2025 · 0 comments
alexarmbr commented Feb 28, 2025

Hi, first of all thank you for the awesome work!!

I have been trying to get STA working with Wan2.1, which requires digging into the code, and I am wondering why the STA kernel only supports a sequence length of 115456 with text and 82994 without. I was looking at the kernel to try to figure out why, but it is not immediately obvious. If I comment out these assertions and run with different sequence lengths, the kernel still runs but seems less accurate with respect to the flex_attention baseline.
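
For reference, this is roughly how I am checking accuracy against the flex_attention baseline. The shapes, window size, and the `sliding_tile_attention` entry point below are placeholders for illustration, not the exact code from this repo:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# Illustrative shapes only; the real run uses the full video sequence length (82994 / 115456).
B, H, S, D = 1, 24, 4096, 128
q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.bfloat16) for _ in range(3))

def window_mask(b, h, q_idx, kv_idx):
    # Stand-in 1D local window for this sketch; the actual STA mask is tile-based.
    return (q_idx - kv_idx).abs() <= 512

block_mask = create_block_mask(window_mask, None, None, S, S, device="cuda")
ref = flex_attention(q, k, v, block_mask=block_mask)

# Placeholder call into the STA kernel with the sequence-length asserts commented out;
# the real import path and signature come from this repo.
out = sliding_tile_attention(q, k, v)  # noqa: F821

print("max abs diff :", (out - ref).abs().max().item())
print("mean abs diff:", (out - ref).abs().mean().item())
```

With the fixed supported sequence lengths the two outputs match closely, but once I run other lengths the difference grows, which is what makes me think the kernel has the supported lengths baked in somewhere.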
