
support split qkv linear and sp overlap comm #415

Open

inkcherry wants to merge 14 commits into main

Conversation


inkcherry commented Jul 5, 2024

work with deepspeedai/DeepSpeed#5691
When using ds_sequence_parallel, enable the following two flags to turn on overlap comm:
--split-qkv-linear
--ds-sequence-parallel-overlap-comm
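
As a usage sketch (not taken verbatim from this PR; every argument other than the two new flags is a placeholder, and the sequence-parallel flag name is an assumption based on the example scripts), the flags might be appended to an existing launch like this:

```python
# Illustrative only: building a pretrain_gpt.py launch command with the two flags
# from this PR appended. All other arguments are placeholders that should come
# from your existing training script and DeepSpeed config.
launch_cmd = [
    "deepspeed", "pretrain_gpt.py",
    # ... existing model, data, and --deepspeed_config arguments ...
    "--ds-sequence-parallel-size", "8",      # sequence-parallel degree (assumed flag name)
    "--split-qkv-linear",                    # this PR: use separate Q/K/V linears
    "--ds-sequence-parallel-overlap-comm",   # this PR: overlap all-to-all with compute
]
print(" ".join(launch_cmd))
```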

loadams pushed a commit to deepspeedai/DeepSpeed that referenced this pull request Aug 1, 2024
SP is a fantastic piece of work; it is very elegant and concise. At the current stage, a transformer layer's forward and backward passes involve 8 all-to-all operations, with 5 opportunities for overlapping communication:

Forward pass: the QKV matrix operations can be pipelined alongside some of the all-to-all communications.
Backward pass: the DQ, DK, DV all-to-all communications can be pipelined alongside matrix operations.
Backward pass: DO_w can run in parallel with DO_input, involving both matrix operations and all-to-all communications. Similar overlap-comm strategies are used in Megatron for TP/TP-sp parallelism.
I tested under the following conditions: 1N8C, ZeRO-1, activation checkpointing disabled, ds-sp=8, and gbs=16:
1B 64K
7B 16K
Both showed over 10% improvement (I also found that for mega-ds, using split QKV by itself can improve performance by reducing the slice + cat operations in fwd/bwd), even though some TFLOPs numbers were already at a relatively good level.
Works together with deepspeedai/Megatron-DeepSpeed#415

---------

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
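
The forward-pass overlap described in the commit message above can be pictured with a minimal, hypothetical sketch: once the fused QKV projection is split into separate Q, K, and V linears, each all-to-all can be launched asynchronously while the next projection's matmul runs. This is not the DeepSpeed implementation; it only assumes a standard torch.distributed setup, and all function and variable names are illustrative.

```python
# Minimal sketch of overlapping sequence-parallel all-to-all with split
# Q/K/V projections. Not the DeepSpeed code; names are illustrative.
import torch
import torch.distributed as dist

def _async_all_to_all(x, group):
    # Launch the all-to-all without blocking so later compute kernels can
    # overlap with the communication (NCCL runs it on its own stream).
    out = torch.empty_like(x)
    work = dist.all_to_all_single(out, x.contiguous(), group=group, async_op=True)
    return out, work

def qkv_with_overlap(hidden, wq, wk, wv, sp_group):
    q = hidden @ wq
    q_out, q_work = _async_all_to_all(q, sp_group)   # Q's comm starts here
    k = hidden @ wk                                  # overlaps with Q's all-to-all
    k_out, k_work = _async_all_to_all(k, sp_group)
    v = hidden @ wv                                  # overlaps with K's all-to-all
    v_out, v_work = _async_all_to_all(v, sp_group)
    for work in (q_work, k_work, v_work):
        work.wait()                                  # wait for all communication
    return q_out, k_out, v_out
```

The same idea applies in the backward pass, where the DQ/DK/DV all-to-alls can be pipelined against the corresponding gradient matmuls.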

delock commented Aug 29, 2024

deepspeedai/DeepSpeed#5691 is merged. @inkcherry, do you still need this PR to be reviewed? Can you resolve the conflicts on this branch?

@inkcherry (Author)

@tohtana, @loadams: now that deepspeedai/DeepSpeed#5691 is merged, could you merge this one? Thanks!

@yingtongxiong

Hello, when I run pretrain_gpt.py, I hit the following bug (see attached screenshot). @inkcherry


inkcherry commented Nov 5, 2024

@yingtongxiong If you are using this branch, could you try updating the DeepSpeed version (to 8.30 or later), enabling flash-v2, and disabling activation_checkpoint to test it out (see https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples_deepspeed/sequence_parallel/ds_pretrain_gpt_1.3B_seq_parallel_32k.sh)?
Does this issue occur if the two overlap options are not enabled?


loadams commented Nov 7, 2024

Hi @inkcherry - could you take a look at resolving the merge conflicts on this?


inkcherry commented Nov 14, 2024

Hi @inkcherry - could you take a look at resolving the merge conflicts on this?

Hi @loadams,
I resolved the conflicts. I also noticed that in the latest version of DeepSpeed, a view operation present in the original version (https://github.com/microsoft/DeepSpeed/blob/17ed7c77c58611a923a6c8d2a3d21d359cd046e8/deepspeed/sequence/layer.py#L56) was dropped in some later updates, which caused the issue. I added it back in deepspeedai/DeepSpeed#6750 and validated it with a loss check.

Current master mds + master ds (steps 197~200):

lm loss: 8.855590E+00
lm loss: 8.892502E+00
lm loss: 8.766361E+00
lm loss: 8.618977E+00 

This branch + ds fix patch + overlap enabled (steps 197~200):

lm loss: 8.855516E+00
lm loss: 8.890095E+00 
lm loss: 8.765872E+00
lm loss: 8.620874E+00
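
To illustrate why the missing view mattered, here is a rough, hypothetical sketch of a head-scatter / sequence-gather all-to-all of the kind used in Ulysses-style sequence parallelism. It is not the actual deepspeed/sequence/layer.py code; it only shows where a reshape of the communication output is needed and why dropping it leaves downstream attention with the wrong tensor layout.

```python
# Hypothetical sketch (not the actual deepspeed/sequence/layer.py code) of a
# head-scatter / sequence-gather all-to-all. The final reshape plays the role of
# the "view" discussed in this thread: without it, the gathered chunk dimension
# stays folded into the tensor and the attention kernel sees the wrong layout.
import torch
import torch.distributed as dist

def scatter_heads_gather_seq(x, sp_group):
    # x: (seq_local, batch, heads, head_dim), sharded along the sequence dimension
    p = dist.get_world_size(group=sp_group)
    s, b, h, d = x.shape
    # move the scatter dimension (head chunks) to the front for all_to_all_single
    x_t = x.reshape(s, b, p, h // p, d).permute(2, 0, 1, 3, 4).contiguous()
    out = torch.empty_like(x_t)
    dist.all_to_all_single(out, x_t, group=sp_group)
    # the reshape in question: fold the gathered sequence chunks back into
    # (seq_full, batch, heads_local, head_dim)
    return out.reshape(p * s, b, h // p, d)
```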


yingtongxiong commented Nov 25, 2024

Hello, when I run pretrain_gpt.py, I hit the following bug (see attached screenshot). @inkcherry

[screenshot of the new error]

Hello, now I am hitting this problem; the script being run is pretrain_gpt.py.

@yingtongxiong

@yingtongxiong If you are using this branch, could you try updating the DeepSpeed version (to 8.30 or later), enabling flash-v2, and disabling activation_checkpoint to test it out (see https://github.com/microsoft/Megatron-DeepSpeed/blob/main/examples_deepspeed/sequence_parallel/ds_pretrain_gpt_1.3B_seq_parallel_32k.sh)? Does this issue occur if the two overlap options are not enabled?

I can run this shell script (with flash-v2 enabled and activation-checkpoint disabled) as long as I don't enable the two overlap options.

@inkcherry (Author)

@yingtongxiong Yes, please run it together with this fix: deepspeedai/DeepSpeed#6750.
