Have any plans to optimize the prefill kernel for the Hopper architecture? #521

alexngng · 2024-10-10T03:37:35Z

I noticed that the FlashInfer prefill kernel is much slower than FA3 and TRT-LLM FMHA on SM90.
Do you have any plans to use SM90-specific features to optimize it?

Here is some data I measured on SM90 (single H20 GPU, Llama2 7B):

| Number of tokens | TRT-LLM FMHA | FA3 | FlashInfer |
|---|---|---|---|
| 512 x 1 | 37,638.6 | 39,334.6 | 74,966.6 |
| 512 x 2 | 54,729.9 | 61,680.4 | 114,800.0 |
| 512 x 4 | 103,388.8 | 113,056.2 | 190,688.4 |
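
To make "much slower" concrete, the short script below computes FlashInfer's numbers relative to each baseline, row by row. It is only a sketch: the thread does not state the unit of the measurements, so it assumes they are "lower is better" (for example a latency), which matches the claim that FlashInfer, with the largest values, is the slowest. `RESULTS` and `slowdown` are hypothetical names introduced here for illustration.

```python
# Measurements copied from the benchmark table in this issue.
# Assumption: units are unstated; values are treated as "lower is better".
RESULTS = {
    "512 x 1": {"TRT-LLM FMHA": 37638.6, "FA3": 39334.6, "FlashInfer": 74966.6},
    "512 x 2": {"TRT-LLM FMHA": 54729.9, "FA3": 61680.4, "FlashInfer": 114800.0},
    "512 x 4": {"TRT-LLM FMHA": 103388.8, "FA3": 113056.2, "FlashInfer": 190688.4},
}


def slowdown(results, baseline, target="FlashInfer"):
    """Return target/baseline per row; a value > 1 means target is slower."""
    return {row: vals[target] / vals[baseline] for row, vals in results.items()}


if __name__ == "__main__":
    for baseline in ("FA3", "TRT-LLM FMHA"):
        for row, ratio in slowdown(RESULTS, baseline).items():
            print(f"{row}: FlashInfer / {baseline} = {ratio:.2f}x")
```

Under that assumption, FlashInfer comes out roughly 1.69x to 1.91x slower than FA3 and 1.84x to 2.10x slower than TRT-LLM FMHA across the three rows.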

yzh119 · 2024-10-10T04:57:10Z

Hi @alexngng, yes, for sure. I still have a few minor bugs to fix, and it's coming soon :)

jason-huang03 · 2024-10-28T07:59:16Z

Really looking forward to it!

taegeonum · 2024-12-06T07:25:14Z

@yzh119 Hello, any update?

yzh119 · 2024-12-16T13:03:35Z

@alexngng @taegeonum @jason-huang03
Done in #667.

taegeonum · 2024-12-26T06:12:21Z

@yzh119 Do you have plans to support FP8 Q, K, V?

jeejeelee mentioned this issue on Oct 18, 2024

[Performance]: FLASHINFER backend is slower than FLASH_ATTN on H100 vllm-project/vllm#9471 (closed)

zhyncs mentioned this issue on Dec 16, 2024

perf: Dense and sparse customizable flashattention-3 template #667 (merged)

zhyncs closed this as completed on Dec 16, 2024
