
[Auto Parallel] Add zero h1 pipeline scheduling for paddle #62865

Merged
merged 37 commits into PaddlePaddle:develop on Apr 18, 2024

Conversation

AndSonder
Contributor

@AndSonder AndSonder commented Mar 19, 2024

PR Category

Auto Parallel

PR Types

Others

Description

Add Zero-H1 (ZB-H1) pipeline parallel scheduling support for Paddle.

The actual scheduling result for Llama2 on 4 GPUs is shown below:

[image: ZB-H1 pipeline scheduling timeline on 4 GPUs]
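
For context, a minimal, hypothetical NumPy sketch of the core idea behind ZB-H1 (this is not the pass implementation): the backward of each matmul is split into an input-gradient job ("backward_b"), which is on the critical path to the previous stage, and a weight-gradient job ("backward_w"), which is only needed before the optimizer step and can therefore be deferred to fill pipeline bubbles.

import numpy as np

# Minimal sketch of the ZB-H1 splitting idea, not the PR's implementation.
# For Y = X @ W, the backward is split into two independent jobs:
#   backward_b: dX = dY @ W^T -> needed immediately by the previous stage
#   backward_w: dW = X^T @ dY -> only needed before the optimizer step,
#                                so it can be delayed to fill bubbles
def backward_b(dY, W):
    return dY @ W.T

def backward_w(X, dY):
    return X.T @ dY

X = np.random.randn(4, 8)
W = np.random.randn(8, 16)
dY = np.random.randn(4, 16)
dX = backward_b(dY, W)   # sent upstream right away
dW = backward_w(X, dY)   # scheduled later by the ZB-H1 schedule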

Test results on the PaddleNLP Llama2 model are as follows (pp4, batch size 1, hidden_layer=4):

Accuracy

The accuracy matches 1F1B; occasionally there are small discrepancies beyond the third decimal place, which is consistent with the description in the paper.

Loss comparison after 10000 steps on Llama2:

  • ZBH1: 2.6
  • 1F1B: 2.6

The loss curves for the first 10000 steps are shown below:

[image: loss curves of ZBH1 vs. 1F1B over the first 10000 steps]

Speed test

Test machine: 4x RTX 3090

Scheduling mode    interval_runtime    interval_samples_per_second    interval_steps_per_second
1F1B               3.17                5.1                            0.3
ZBH1               2.75                5.8                            0.4

GPU memory usage

Scheduling mode    GPU    max_memory_allocated       max_memory_reserved
1F1B               0      12605.69 MB                13405.76 MB
1F1B               1      8809.68 MB                 9611.76 MB
1F1B               2      7013.66 MB                 7785.76 MB
1F1B               3      7806.72 MB                 8561.76 MB
ZBH1               0      12921.69 MB (↑ 316)        13831.76 MB (↑ 426)
ZBH1               1      9639.7 MB (↑ 830)          10463.76 MB (↑ 852)
ZBH1               2      8357.72 MB (↑ 1344)        9149.76 MB (↑ 1364)
ZBH1               3      10597.38 MB (↑ 1790)       11219.76 MB (↑ 1658)

  • 1F1B total max_memory_allocated: 36035.75 MB
  • ZBH1 total max_memory_allocated: 41516.49 MB
  • 1F1B total max_memory_reserved: 35064.04 MB
  • ZBH1 total max_memory_reserved: 44650.04 MB

The test script is as follows:

set -x
unset CUDA_VISIBLE_DEVICES

task_name="llama_auto_static_dp2sharding2mp2pp2_vpp2"
# rm -rf output/$task_name/  # ckpt is saved in 'output/''
rm -rf "output/$task_name""_log"

# export PARALLEL_CROSS_ENTROPY=true
export FLAGS_call_stack_level=4
export PYTHONPATH=../../../:$PYTHONPATH
export GLOG_v=0

python -u -m paddle.distributed.launch \
    --gpus "0,1,2,3" \
    --log_dir "output/$task_name""_log" \
    run_pretrain_auto_static.py \
    --model_type "llama" \
    --model_name_or_path "facebook/llama-7b" \
    --tokenizer_name_or_path "facebook/llama-7b" \
    --input_dir "../data" \
    --output_dir "output/$task_name" \
    --split 949,50,1 \
    --max_seq_length 2048 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --use_flash_attention 0 \
    --use_fused_rms_norm 0 \
    --fp16 0 \
    --fp16_opt_level "O2"  \
    --scale_loss 1024 \
    --pipeline_parallel_degree  4 \
    --tensor_parallel_degree 1 \
    --pipeline_schedule_mode "ZBH1" \
    --learning_rate 0.0001 \
    --min_learning_rate 0.00001 \
    --max_steps 20 \
    --save_steps 5000 \
    --weight_decay 0.01 \
    --warmup_ratio 0.01 \
    --max_grad_norm 1.0 \
    --logging_steps 1 \
    --dataloader_num_workers 1 \
    --eval_steps 1000 \
    --report_to "visualdl" \
    --disable_tqdm true \
    --continue_training 0 \
    --recompute 0 \
    --recompute_granularity full \
    --do_train \
    --do_eval \
    --device "gpu" \
    --data_impl "mmap" \
    --enable_auto_parallel 1 \
    --sharding_parallel_degree 1 \
    --sharding "stage1" \

Related issue:


paddle-bot bot commented Mar 19, 2024

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Mar 19, 2024
@AndSonder AndSonder marked this pull request as ready for review March 21, 2024 13:23
@AndSonder AndSonder changed the title [AutoParallel] Add zero h1 for paddle [AutoParallel] Add zero h1 pipeline scheduling for paddle Mar 26, 2024
@AndSonder AndSonder changed the title [AutoParallel] Add zero h1 pipeline scheduling for paddle [Auto Parallel] Add zero h1 pipeline scheduling for paddle Apr 2, 2024

paddle-ci-bot bot commented Apr 9, 2024

Sorry to inform you that 4911d03's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

name = name.split("@")[0]
if not block._find_var_recursive(name):
    return "backward_b"
var = block._find_var_recursive(name)
Contributor

@heavyrain-lzy heavyrain-lzy Apr 15, 2024


Handle operators that have no output, such as send_v2 and c_sync_calc_stream.
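
For context, a hedged sketch of what handling such ops could look like when classifying ops into chunks (the helper name _op_chunk_type and the fallback chunk type are hypothetical, not the PR's exact code):

def _op_chunk_type(op, block):
    # Hypothetical sketch: guard against ops that produce no output vars
    # (e.g. send_v2, c_sync_calc_stream) before looking up an output var.
    out_names = op.output_arg_names
    if len(out_names) == 0:
        return "backward_b"
    name = out_names[0].split("@")[0]
    var = block._find_var_recursive(name)
    if var is None:
        return "backward_b"
    # ... further classification based on `var` is omitted in this sketch
    return "forward"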

Contributor Author


Done. When you have time, could you please test again to check whether it still hangs ~

def _partial_programs(self, program):
    dist_context = self.get_attr("dist_context")
    self._split_matmul_grad_ops_to_matmul(program, dist_context)
    types, sub_program_list = _program_for_zero_bubble(program)
Contributor


Following the _partial_programs of the 1F1B/FThenB passes, add the enable_send_recv_overlap setting here as well, for example the 1F1B _partial_programs:

def _partial_programs(self, program):
        # NOTE: The flag "enable_send_recv_overlap" may increase the reserved memory of GPUs.
        enable_send_recv_overlap = self.get_attr("enable_send_recv_overlap")
        types = [FORWARD, BACKWARD, OPT]
        sub_program_list = _program_for_fthenb_and_1f1b(
            program, enable_send_recv_overlap
        )
        return types, sub_program_list

Contributor Author


Following the _partial_programs of the 1F1B/FThenB passes, add the enable_send_recv_overlap setting here as well, for example the 1F1B _partial_programs:

def _partial_programs(self, program):
        # NOTE: The flag "enable_send_recv_overlap" may increase the reserved memory of GPUs.
        enable_send_recv_overlap = self.get_attr("enable_send_recv_overlap")
        types = [FORWARD, BACKWARD, OPT]
        sub_program_list = _program_for_fthenb_and_1f1b(
            program, enable_send_recv_overlap
        )
        return types, sub_program_list

How about adapting this in a separate PR? The same switch for VPP was also added in a follow-up PR ~

Contributor


Following the _partial_programs of the 1F1B/FThenB passes, add the enable_send_recv_overlap setting here as well, for example the 1F1B _partial_programs:

def _partial_programs(self, program):
        # NOTE: The flag "enable_send_recv_overlap" may increase the reserved memory of GPUs.
        enable_send_recv_overlap = self.get_attr("enable_send_recv_overlap")
        types = [FORWARD, BACKWARD, OPT]
        sub_program_list = _program_for_fthenb_and_1f1b(
            program, enable_send_recv_overlap
        )
        return types, sub_program_list

How about adapting this in a separate PR? The same switch for VPP was also added in a follow-up PR ~

For VPP it was most likely just forgotten at first, which is why it was added separately later. Here it can be added together in this PR.

Contributor Author


Following the _partial_programs of the 1F1B/FThenB passes, add the enable_send_recv_overlap setting here as well, for example the 1F1B _partial_programs:

def _partial_programs(self, program):
        # NOTE: The flag "enable_send_recv_overlap" may increase the reserved memory of GPUs.
        enable_send_recv_overlap = self.get_attr("enable_send_recv_overlap")
        types = [FORWARD, BACKWARD, OPT]
        sub_program_list = _program_for_fthenb_and_1f1b(
            program, enable_send_recv_overlap
        )
        return types, sub_program_list

How about adapting this in a separate PR? The same switch for VPP was also added in a follow-up PR ~

For VPP it was most likely just forgotten at first, which is why it was added separately later. Here it can be added together in this PR.

OK ~
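
For reference, a hedged sketch of how the zero-bubble pass's _partial_programs might look once the flag is wired through; passing the extra argument to _program_for_zero_bubble is an assumption, mirroring _program_for_fthenb_and_1f1b above:

def _partial_programs(self, program):
    # NOTE: The flag "enable_send_recv_overlap" may increase the reserved memory of GPUs.
    enable_send_recv_overlap = self.get_attr("enable_send_recv_overlap")
    dist_context = self.get_attr("dist_context")
    self._split_matmul_grad_ops_to_matmul(program, dist_context)
    # Assumes _program_for_zero_bubble accepts the overlap flag, like
    # _program_for_fthenb_and_1f1b does.
    types, sub_program_list = _program_for_zero_bubble(
        program, enable_send_recv_overlap
    )
    return types, sub_program_list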

Contributor

@heavyrain-lzy heavyrain-lzy left a comment


LGTM

@heavyrain-lzy heavyrain-lzy merged commit adf8689 into PaddlePaddle:develop Apr 18, 2024
29 checks passed