Add deepseek autotp #6937

Merged: 1 commit merged into microsoft:master on Jan 9, 2025
Conversation

Yejing-Lai
Contributor

This PR adds AutoTP support for DeepSeek models, which include Multi-Head Latent Attention (MLA) and MoE layers.

For MLA TP, we need to skip the two low-rank layers ("q_a_proj" and "kv_a_proj_with_mqa").
For DeepSeek MoE, tp_parse sees the MoE layer name as layer_idx.down_proj, which makes it hard to add a dedicated policy, so we add the down_proj layer to all_reduce_linears by default.
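For reference, here is a minimal sketch of how a DeepSeek checkpoint might be run with DeepSpeed AutoTP once this change is in. The checkpoint name, script name, dtype, and prompt are illustrative assumptions; the key part is `replace_with_kernel_inject=False`, which routes the model through AutoTP.

```python
# Hypothetical example: run a DeepSeek checkpoint with DeepSpeed AutoTP.
# Launch with: deepspeed --num_gpus <N> run_deepseek_autotp.py
import os

import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any Hugging Face DeepSeek checkpoint; substitute your own.
model_name = "deepseek-ai/DeepSeek-V2-Lite"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# replace_with_kernel_inject=False selects AutoTP, which shards the model's
# linear layers across ranks. With this PR, the low-rank MLA projections
# (q_a_proj, kv_a_proj_with_mqa) are skipped and the MoE down_proj layers
# are handled via all_reduce_linears.
engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": int(os.getenv("WORLD_SIZE", "1"))},
    dtype=torch.bfloat16,
    replace_with_kernel_inject=False,
)

local_rank = int(os.getenv("LOCAL_RANK", "0"))
inputs = tokenizer("Hello from DeepSeek with AutoTP:", return_tensors="pt").to(
    f"cuda:{local_rank}"
)
outputs = engine.module.generate(**inputs, max_new_tokens=32)
if local_rank == 0:
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```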

@Yejing-Lai
Contributor Author

@loadams @delock Please kindly review. Thanks~

@loadams loadams changed the title Add deepspeek autotp Add deepseek autotp Jan 9, 2025
@loadams loadams added this pull request to the merge queue Jan 9, 2025
Merged via the queue into microsoft:master with commit 45fce45 Jan 9, 2025
11 checks passed
@glowwormX

Hello @Yejing-Lai. Can you tell me how to enable TP, how to use DeepSpeed to train DeepSeek, and how to modify modeling.py and the DeepSpeed configuration?
