[Auto Parallel] Add zero h1 pipeline scheduling for paddle #62865
Conversation
Your PR has been submitted successfully. Thank you for your contribution to the open-source project!
Sorry to inform you that 4911d03's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
name = name.split("@")[0]
if not block._find_var_recursive(name):
    return "backward_b"
var = block._find_var_recursive(name)
Handle operators without outputs, such as send_v2 and c_sync_calc_stream.
Done. Could you please test again when you have time, to check whether it still hangs? ~
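For context, a minimal sketch of how such a guard might look in the op-classification helper. The function name, the default-to-backward_b fallback, and the is_parameter rule are illustrative assumptions, not the merged code:

def _get_backward_op_type(block, op):
    # Assumption: ops such as send_v2 or c_sync_calc_stream may have no
    # outputs at all; without this early return the loop below never runs
    # and the op stays unclassified, which can hang the schedule.
    if len(op.output_arg_names) == 0:
        return "backward_b"
    for name in op.output_arg_names:
        name = name.split("@")[0]  # strip "@GRAD"-style suffixes
        if not block._find_var_recursive(name):
            return "backward_b"
        var = block._find_var_recursive(name)
        # Hypothetical rule: only ops whose outputs are all parameters
        # (i.e. weight gradients) are deferred to the W pass.
        if not var.is_parameter:
            return "backward_b"
    return "backward_w"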
def _partial_programs(self, program):
    dist_context = self.get_attr("dist_context")
    self._split_matmul_grad_ops_to_matmul(program, dist_context)
    types, sub_program_list = _program_for_zero_bubble(program)
Referring to the _partial_programs of 1F1B or FThenB, add the enable_send_recv_overlap setting here as well, e.g. 1F1B's _partial_programs:
def _partial_programs(self, program):
    # NOTE: The flag "enable_send_recv_overlap" may increase the reserved memory of GPUs.
    enable_send_recv_overlap = self.get_attr("enable_send_recv_overlap")
    types = [FORWARD, BACKWARD, OPT]
    sub_program_list = _program_for_fthenb_and_1f1b(
        program, enable_send_recv_overlap
    )
    return types, sub_program_list
How about adapting this in a separate PR? The vpp switch was also adapted in a follow-up ~
For VPP the switch was probably just forgotten at first, which is why it was added in a follow-up. Here it can be added together in this PR.
OK ~
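For reference, with the flag wired through, the zero-bubble pass's _partial_programs might look like the sketch below. Extending _program_for_zero_bubble to accept enable_send_recv_overlap is an assumption here, mirroring _program_for_fthenb_and_1f1b:

def _partial_programs(self, program):
    # NOTE: The flag "enable_send_recv_overlap" may increase the reserved memory of GPUs.
    enable_send_recv_overlap = self.get_attr("enable_send_recv_overlap")
    dist_context = self.get_attr("dist_context")
    # Split each matmul_grad into separate dX / dW matmuls so the
    # weight-gradient part can be scheduled independently.
    self._split_matmul_grad_ops_to_matmul(program, dist_context)
    # Assumed extended signature, mirroring _program_for_fthenb_and_1f1b.
    types, sub_program_list = _program_for_zero_bubble(
        program, enable_send_recv_overlap
    )
    return types, sub_program_list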
LGTM
PR Category
Auto Parallel
PR Types
Others
Description
Support the Zero-H1 (ZB-H1) pipeline schedule for Paddle.
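The key idea of ZB-H1 is to split each backward step into an activation-gradient part (B), which must run immediately so the upstream stage can continue, and a weight-gradient part (W), which has no downstream dependency and can be deferred to fill pipeline bubbles. For a matmul layer Y = X @ W the split looks like this (a minimal NumPy sketch, not Paddle code):

import numpy as np

# Forward: Y = X @ W
X = np.random.randn(4, 8)    # layer input (activations)
W = np.random.randn(8, 16)   # layer weights
dY = np.random.randn(4, 16)  # gradient arriving from the next layer

# B step: activation gradient, needed immediately because the previous
# pipeline stage is blocked waiting for it.
dX = dY @ W.T

# W step: weight gradient; nothing downstream depends on it, so the
# scheduler is free to defer it into an otherwise idle bubble.
dW = X.T @ dY

At the program level this is what _split_matmul_grad_ops_to_matmul enables: each matmul_grad op is rewritten into two independent matmul ops so that the W part can be re-scheduled.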
The actual schedule produced for Llama2 on 4 GPUs is as follows:
Test results on the PaddleNLP Llama2 model (pp4, batch 1, hidden_layer=4):
Accuracy
The loss matches the baseline; occasionally there are discrepancies beyond the third decimal place, which is consistent with the description in the paper.
Loss comparison over 10000 steps on Llama2:
Below is the loss curve for the first 10000 steps.
Speed test
Test machine: 4x RTX 3090
GPU memory usage
The test script is as follows:
Related Issue: