
feat: add trtllm moe_allreduce_fusion #1108


Merged: 52 commits into flashinfer-ai:main on Jun 17, 2025

Conversation

@yyihuang (Collaborator) commented on Jun 2, 2025

📌 Description

This PR adds the moe_all_reduce_fusion kernels from TensorRT-LLM (trt-llm).
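
For context, one fusion pattern discussed in the review threads below is AR + Add_RMS (+ Quant): allreduce the inputs, add the residual, then apply RMSNorm before an optional quantization step. Below is a hedged host-side sketch of that computation, assuming a standard RMSNorm; the names `rank_inputs`, `rms_gamma`, and `kEps` are illustrative, not this PR's API.

```cpp
// Hedged host-side reference of the fused pattern (illustrative only; not the
// kernel in this PR). `rank_inputs` simulates the per-rank allreduce inputs,
// and `rms_gamma`/`kEps` are assumed RMSNorm parameters.
#include <cmath>
#include <vector>

void MoeAllReduceFusionReference(
    const std::vector<std::vector<float>>& rank_inputs,  // [world_size][m * d]
    const std::vector<float>& residual_in,               // [m * d]
    const std::vector<float>& rms_gamma,                 // [d]
    int m, int d, float kEps,
    std::vector<float>& residual_out,                    // [m * d], AR + residual
    std::vector<float>& norm_out) {                      // [m * d], RMSNorm output
  residual_out.assign(static_cast<size_t>(m) * d, 0.f);
  norm_out.assign(static_cast<size_t>(m) * d, 0.f);
  for (int i = 0; i < m; ++i) {
    // 1. allreduce (sum over ranks), then add the residual
    for (int j = 0; j < d; ++j) {
      float acc = 0.f;
      for (const auto& r : rank_inputs) acc += r[i * d + j];
      residual_out[i * d + j] = acc + residual_in[i * d + j];
    }
    // 2. RMSNorm over the hidden dimension d
    float sum_sq = 0.f;
    for (int j = 0; j < d; ++j) {
      float v = residual_out[i * d + j];
      sum_sq += v * v;
    }
    float rms = std::sqrt(sum_sq / d + kEps);
    for (int j = 0; j < d; ++j) {
      norm_out[i * d + j] = residual_out[i * d + j] / rms * rms_gamma[j];
    }
  }
}
```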

🔍 Related Issues

This work has been split into multiple PRs; see #1061. all_reduce_fusion will follow in the next PR.

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

@yzh119 (Collaborator) left a comment:

Please remove all usage of packed/unpacked data types and use vec_t instead.
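
For readers unfamiliar with the suggestion, here is a minimal CUDA sketch of the kind of vectorized wrapper this refers to. `SimpleVec` below is an illustrative stand-in, not flashinfer's actual `vec_t` interface.

```cuda
// Minimal sketch of vectorized access in the spirit of a vec_t wrapper.
// SimpleVec is a simplified stand-in, NOT flashinfer's actual vec_t API;
// it only illustrates replacing packed/unpacked types with one template.
template <typename T, int VEC_SIZE>
struct SimpleVec {
  T data[VEC_SIZE];
  __device__ void load(const T* ptr) {
#pragma unroll
    for (int i = 0; i < VEC_SIZE; ++i) data[i] = ptr[i];
  }
  __device__ void store(T* ptr) const {
#pragma unroll
    for (int i = 0; i < VEC_SIZE; ++i) ptr[i] = data[i];
  }
};

// Each thread loads/stores VEC_SIZE contiguous elements,
// e.g. adding a residual to an allreduce output.
template <typename T, int VEC_SIZE>
__global__ void AddResidualVectorized(const T* in, const T* residual, T* out, int n) {
  int idx = (blockIdx.x * blockDim.x + threadIdx.x) * VEC_SIZE;
  if (idx + VEC_SIZE <= n) {
    SimpleVec<T, VEC_SIZE> a, b;
    a.load(in + idx);
    b.load(residual + idx);
#pragma unroll
    for (int i = 0; i < VEC_SIZE; ++i) a.data[i] = a.data[i] + b.data[i];
    a.store(out + idx);
  }
}
// Example instantiation (host side): AddResidualVectorized<float, 4><<<grid, block>>>(...)
```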

yyihuang added a commit that referenced this pull request Jun 9, 2025

## 📌 Description

Update the create_ipc_buffer implementation. Add unit tests for
create_ipc_buffer.
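
For readers unfamiliar with the mechanism, the sketch below shows the plain CUDA runtime calls that an IPC buffer setup typically builds on. It is a conceptual single-process illustration, not the `create_ipc_buffer` implementation from this repository, and the buffer size is arbitrary.

```cuda
// Conceptual sketch of the CUDA runtime calls IPC buffers build on
// (single-process illustration; NOT the create_ipc_buffer code in this repo).
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define CUDA_CHECK(expr)                                          \
  do {                                                            \
    cudaError_t err__ = (expr);                                   \
    if (err__ != cudaSuccess) {                                   \
      std::printf("CUDA error: %s\n", cudaGetErrorString(err__)); \
      std::exit(1);                                                \
    }                                                             \
  } while (0)

int main() {
  size_t bytes = 7168 * sizeof(float);  // arbitrary buffer size for illustration
  void* local_buf = nullptr;
  CUDA_CHECK(cudaMalloc(&local_buf, bytes));

  // Export a handle that other processes on the same node can open.
  cudaIpcMemHandle_t handle;
  CUDA_CHECK(cudaIpcGetMemHandle(&handle, local_buf));

  // In a real multi-rank setup, `handle` is exchanged out of band (e.g. via
  // torch.distributed or MPI); each peer then maps it with
  //   cudaIpcOpenMemHandle(&peer_ptr, handle, cudaIpcMemLazyEnablePeerAccess);
  // and releases it with cudaIpcCloseMemHandle(peer_ptr) when done.

  CUDA_CHECK(cudaFree(local_buf));
  return 0;
}
```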

## 🔍 Related Issues

To help debug #1108.

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

@yyihuang yyihuang requested a review from yzh119 June 16, 2025 02:31
@yyihuang (Collaborator, Author) commented:

Next step: uncomment and complete the fused quantization. This may depend on #1142.

"hidden_dim * sizeof(T) must be a multiple of kBytesPerAccess");
if (params.residual_out && not params.norm_out && params.quant_out) {
// pattern1: AR+Add_RMS+Quant
// [m, 7168] bf16 allreduce_in, [m, 7168] bf16 residual_in
A collaborator commented on this excerpt:

Do we have a shape check somewhere?
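
If such a check is missing, a hypothetical host-side validation along these lines could cover it; `CheckMoeAllReduceShapes`, the `kBytesPerAccess` value, and the exact conditions below are illustrative assumptions, not code from this PR.

```cpp
// Hypothetical host-side validation sketch (illustrative only; not code from
// this PR). kBytesPerAccess's value and the specific checks are assumptions.
#include <stdexcept>

constexpr int kBytesPerAccess = 16;  // assumed vector access width

template <typename T>
void CheckMoeAllReduceShapes(int m, int hidden_dim,
                             bool has_residual_in, bool has_residual_out) {
  if ((static_cast<size_t>(hidden_dim) * sizeof(T)) % kBytesPerAccess != 0) {
    throw std::invalid_argument(
        "hidden_dim * sizeof(T) must be a multiple of kBytesPerAccess");
  }
  if (has_residual_out && !has_residual_in) {
    throw std::invalid_argument(
        "residual_out requires a residual_in of shape [m, hidden_dim]");
  }
  (void)m;  // the row count itself is unconstrained here
}
```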

torch.cuda.synchronize()

# 6. Check correctness
tolerance = 8e-2 if dtype == torch.float16 else 8e-1
A collaborator commented on this excerpt:

8e-1 seems too large to me; can you give an example of the distribution of all_reduce_out?

@yyihuang yyihuang requested a review from yzh119 June 16, 2025 08:20
// [m, d] bf16 allreduce_in, [m, d] bf16 residual_in
// [m, d] bf16 residual_out, [m, d] bf16 norm_out, [m, d] fp4 quant_out

if (params.allreduce_out && params.residual_out && !params.norm_out && params.quant_out) {
A collaborator commented on this excerpt:

The remaining part can still be dispatched:

DISPATCH_MOEREDUCTION_KERNEL(T, params, launch_with_pdl, ar, res, rms, quant)

@yyihuang yyihuang requested a review from yzh119 June 17, 2025 03:39
@yzh119 (Collaborator) left a comment:

I'm good with this PR, thanks so much for your contribution!

Please refer to 9c229c9 for how to simplify the macro.

A note on naming conventions: in flashinfer we usually write both the runtime variable and the constexpr in the macro definition, so it is easier for developers to track which new constexpr names the macro introduces:

#define DISPATCH_*(var, CONST_EXPR)

and we capitalize the CONST_EXPR.
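
As a concrete illustration of that convention (a hypothetical sketch, not the macro from this PR), the runtime flag stays lowercase and the capitalized name is the new constexpr the macro introduces for the dispatched block:

```cpp
// Hypothetical DISPATCH_*(var, CONST_EXPR) sketch following the convention:
// `use_residual` is the runtime variable, USE_RESIDUAL the constexpr it binds.
#include <cstdio>

#define DISPATCH_USE_RESIDUAL(use_residual, USE_RESIDUAL, ...) \
  if (use_residual) {                                          \
    constexpr bool USE_RESIDUAL = true;                        \
    __VA_ARGS__                                                \
  } else {                                                     \
    constexpr bool USE_RESIDUAL = false;                       \
    __VA_ARGS__                                                \
  }

// Dummy stand-in for a fused kernel launcher templated on the constexpr flag.
template <bool USE_RESIDUAL>
void LaunchFusedKernel() {
  std::printf("residual fusion enabled: %d\n", int(USE_RESIDUAL));
}

int main() {
  bool use_residual = true;  // runtime flag, e.g. params.residual_out != nullptr
  DISPATCH_USE_RESIDUAL(use_residual, USE_RESIDUAL, {
    LaunchFusedKernel<USE_RESIDUAL>();
  });
  return 0;
}
```

Writing the capitalized constexpr at the call site makes it obvious which identifiers become compile-time constants inside the dispatched block.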

@yzh119 yzh119 merged commit 0a754ce into flashinfer-ai:main Jun 17, 2025
2 checks passed
Anerudhan pushed a commit to Anerudhan/flashinfer that referenced this pull request Jun 28, 2025
Anerudhan pushed a commit to Anerudhan/flashinfer that referenced this pull request Jun 28, 2025
