
add moe topk(k>2) gate support #5881

Merged · 8 commits · Aug 15, 2024
Conversation

inkcherry (Contributor) commented Aug 8, 2024

We noticed that some users need topk > 2 to train MoE models, for example: https://huggingface.co/Qwen/Qwen2-57B-A14B/blob/main/config.json. This PR adds support for topk (k > 2) gates.

  • add topk (k > 2) support
  • add a token-drop policy based on token position and gate probabilities
  • add unit tests
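The changes above (top-k routing where k may exceed 2, plus capacity-based token dropping by position or by probability) can be sketched roughly as follows. This is an illustrative NumPy sketch, not DeepSpeed's actual API: the function name, parameters, and drop-policy labels are assumptions, and the real implementation (deepspeed/moe/sharded_moe.py) operates on torch tensors with distributed dispatch.

```python
import numpy as np

def topk_gate(logits, k, capacity, drop_policy="position"):
    """Sketch of a top-k MoE gate with per-expert capacity.

    logits: (num_tokens, num_experts) router scores.
    Returns a (num_tokens, num_experts) combine-weight matrix in which
    assignments beyond an expert's capacity are dropped.
    """
    num_tokens, num_experts = logits.shape
    # Softmax over the expert dimension.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    # Indices of each token's top-k experts (k can be > 2).
    topk_idx = np.argsort(-probs, axis=1)[:, :k]

    combine = np.zeros_like(probs)
    counts = np.zeros(num_experts, dtype=int)

    # Order in which (token, expert) assignments claim capacity slots.
    pairs = [(t, e) for t in range(num_tokens) for e in topk_idx[t]]
    if drop_policy == "probs":
        # Higher-probability assignments claim slots first.
        pairs.sort(key=lambda te: -probs[te[0], te[1]])
    # Otherwise ("position"): earlier tokens claim slots first.

    for t, e in pairs:
        if counts[e] < capacity:
            counts[e] += 1
            combine[t, e] = probs[t, e]

    # Renormalize kept weights per token; fully dropped tokens stay zero.
    row_sums = combine.sum(axis=1, keepdims=True)
    return np.divide(combine, row_sums,
                     out=np.zeros_like(combine), where=row_sums > 0)
```

With a generous capacity nothing is dropped and every token keeps exactly k experts; with a tight capacity, the chosen policy decides which assignments survive, and each expert column never exceeds `capacity` nonzero entries.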

inkcherry and others added 3 commits August 8, 2024 08:09
* [MoE] enable topk > 2 gate

* print_version

* refine code

* deepspeed/moe/sharded_moe.py

* func verify

* refine code

* refine code

* refine code

* refine code

* refine code

* remove duplicate topk

* update

* refine code

* fix format

* update

* fix ==

* update

* add ut

* rm tt

* update

* add top3 ut

* revert note

* remove -

---------

Co-authored-by: Kurt Chen <kurt.chen@intel.com>
Co-authored-by: Jin, Youzhi <youzhi.jin@intel.com>
@tjruwase tjruwase requested review from tohtana and removed request for awan-10 and loadams August 9, 2024 10:41
@tohtana tohtana enabled auto-merge August 15, 2024 16:08
tohtana (Contributor) commented Aug 15, 2024

Thank you @inkcherry for the great contribution! I approved and scheduled merging.

@tohtana tohtana added this pull request to the merge queue Aug 15, 2024
Merged via the queue into deepspeedai:master with commit 9a3ede7 Aug 15, 2024
11 checks passed

4 participants