Enabled Qwen2-MoE Tensor Parallelism (TP) inference #6551
Conversation
Hi @Yejing-Lai, do you want to provide some comments on this PR for Qwen2-MoE AutoTP support?
Could you try to modify this line if it can meet your needs? https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/auto_tp.py#L336
Yes. It can provide the same function and result if properly coded.
Thank you for your comments.
…() for uniform code management. Both have the same function and the same result.
Hi @gyou2021, can you also add
Added. Thank you for your comment.
Hi @tjruwase, this PR adds AutoTP support for Qwen2-MoE. @Yejing-Lai and I have reviewed this change. Thanks!
Modified _replace_module in auto_tp.py:
The modification keeps the layers 'shared_expert_gate' and 'gate' in Qwen2-MoE as their original type, torch.nn.Linear, instead of converting them to LinearLayer. This way their weights are not split across multiple HPU/GPU cards, and Qwen2-MoE can run correctly on multiple HPU/GPU cards.
Since the weights of 'gate' are not split across cards, no all-gather operation is needed for its output, which may improve performance.
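A minimal sketch of the idea, not the exact DeepSpeed code: the function name _replace_module_sketch, the replace_fn parameter, and the KEEP_AS_LINEAR name list below are illustrative assumptions. It shows how, during the recursive module walk, children named 'shared_expert_gate' or 'gate' can be skipped so they stay plain torch.nn.Linear and each rank keeps the full gate weight.

```python
# Illustrative sketch only; the real logic lives in
# deepspeed/module_inject/auto_tp.py. Names here are hypothetical.
import torch.nn as nn

# Qwen2-MoE gate layers that should keep full (unsharded) weights.
KEEP_AS_LINEAR = ("shared_expert_gate", "gate")

def _replace_module_sketch(module, replace_fn):
    """Recursively swap nn.Linear children for tensor-parallel layers,
    except MoE gate layers, which stay as plain nn.Linear."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name not in KEEP_AS_LINEAR:
            # Shard this layer's weight across the tensor-parallel ranks.
            setattr(module, name, replace_fn(child))
        else:
            # Gate layers (and non-Linear modules) are left as-is and
            # recursed into, so every rank holds the full gate weight and
            # no all-gather of the router output is required.
            _replace_module_sketch(child, replace_fn)
    return module
```

Because the gate output is computed identically on every rank, the routing decision needs no cross-rank communication; only the expert layers themselves are sharded.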