Have any plans to optimize the prefill kernel for the Hopper architecture? #521
Comments
Hi @alexngng, yes, for sure. I still have a few minor bugs to fix, and it's coming soon :)
Really looking forward to it!
@yzh119 Hello, any update?
@alexngng @taegeonum @jason-huang03
@yzh119 Do you have plans to support FP8 Q, K, and V?
I noticed that the FlashInfer prefill kernel is much slower than FlashAttention-3 (FA3) and the TRT-LLM FMHA kernels on SM90.
Are there any plans to use SM90-specific features (for example, TMA and WGMMA) to optimize it?
Here are some results I measured on SM90 (a single H20 GPU, Llama 2 7B).