reduce cpu host overhead when using moe #5578

ranzhejiang · 2024-05-29T04:01:30Z

The .to('cpu') call on exp_counts is unnecessary, and it forces a device-to-host synchronization that hurts performance.
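To illustrate the point, here is a minimal sketch (not the actual diff of this PR) of computing per-expert token counts while keeping the result on the device the routing indices live on. The function name `expert_counts` and the sample indices are hypothetical and only serve the example; the snippet falls back to CPU when no CUDA device is available.

```python
import torch
import torch.nn.functional as F


def expert_counts(indices: torch.Tensor, num_experts: int) -> torch.Tensor:
    """Count tokens routed to each expert (hypothetical helper for illustration)."""
    # One-hot mask of shape [num_tokens, num_experts], created on indices.device.
    mask = F.one_hot(indices, num_classes=num_experts)
    # The sum stays on the same device. Appending .to('cpu') here would copy the
    # result to host memory and make the host wait for pending device work.
    return mask.sum(dim=0).detach()


device = "cuda" if torch.cuda.is_available() else "cpu"
indices = torch.tensor([0, 2, 2, 1, 2], device=device)  # made-up routing result
counts = expert_counts(indices, num_experts=4)
```

With this shape of code, the counts are only transferred to the host if and when a caller actually needs them there, for example for logging.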

deepspeed/moe/sharded_moe.py

tohtana

@ranzhejiang Thank you for your contribution! I have a few questions about your changes. Can you clarify them?

deepspeed/moe/sharded_moe.py

ranzhejiang · 2024-06-11T15:00:19Z

Hi @tohtana, I have clarified the modifications you mentioned and retested this PR with Megatron-DeepSpeed on a GPU platform (8xA800). It runs well and the loss remains consistent with the original method. Could you please review it again? Thanks!

ranzhejiang · 2024-08-16T04:02:41Z

#5881 also adopts this approach to reduce CPU time.

ranzhejiang requested a review from awan-10 as a code owner May 29, 2024 04:01

loadams requested a review from tohtana May 31, 2024 22:15

tohtana reviewed May 31, 2024

deepspeed/moe/sharded_moe.py (resolved)

tohtana reviewed May 31, 2024

deepspeed/moe/sharded_moe.py (resolved)

ranzhejiang force-pushed the zhejiang/reduce_host_overhead_moe branch from e9e32f4 to d860d2c Compare June 11, 2024 03:32

ranzhejiang force-pushed the zhejiang/reduce_host_overhead_moe branch from 686f511 to 23ec4a1 Compare August 16, 2024 03:58

reduce cpu host overhead when using moe

1cb0efd

ranzhejiang force-pushed the zhejiang/reduce_host_overhead_moe branch from 23ec4a1 to 1cb0efd Compare August 16, 2024 03:59

tjruwase added 2 commits August 20, 2024 22:20

Merge branch 'master' into zhejiang/reduce_host_overhead_moe

60d9de2

Merge branch 'master' into zhejiang/reduce_host_overhead_moe

a0669d9

tohtana approved these changes Aug 21, 2024

tohtana added this pull request to the merge queue Aug 21, 2024

Merged via the queue into deepspeedai:master with commit 7260890 Aug 21, 2024
11 checks passed

delock mentioned this pull request Sep 20, 2024

[TRACKER] Customer support related PR tracker for Intel devices #6556

Open

23 tasks
