Hi, first of all thank you for the awesome work!!
I have been trying to get STA working with Wan2.1, which requires digging into the code, and I am wondering why the STA kernel only supports a sequence length of 115456 with text and 82994 without. I looked at the kernel to try to figure out why, but it is not immediately obvious. If I comment out these assertions and run with different sequence lengths, the kernel still runs but seems less accurate with respect to the `flex_attention` baseline.