[NewComm] No.10 compatible upgrade for distributed_fused_lamb op #57424
Conversation
The unit test currently has an issue.
Use a different initialization method: if the new communication library is used, use …
@@ -270,7 +270,10 @@ def setUpClass(cls):
        paddle.enable_static()
        paddle.set_flags({'FLAGS_cudnn_deterministic': True})
        _clip_by_global_norm_using_mp_type(True)
        fleet.init(role_maker=get_role_maker())
        if os.environ.get("FLAGS_dynamic_static_unified_comm") == "1":
            fleet.init(role_maker=get_role_maker())
The condition looks inverted: when FLAGS_dynamic_static_unified_comm = 1 is set,
initialization should go through paddle.distributed.collective._init_parallel_env("nccl")
instead.
done
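The branching agreed on above can be sketched as follows. This is a standalone illustration, not the PR's actual code: the real calls (fleet.init(role_maker=get_role_maker()) and paddle.distributed.collective._init_parallel_env("nccl")) are replaced with stubs so the control flow runs without Paddle installed.

```python
import os

def init_with_fleet():
    # Stands in for: fleet.init(role_maker=get_role_maker())
    return "fleet"

def init_with_new_comm():
    # Stands in for: paddle.distributed.collective._init_parallel_env("nccl")
    return "new_comm"

def choose_init():
    # Per the review: when FLAGS_dynamic_static_unified_comm == "1",
    # the new communication library path should be used, not fleet.init.
    if os.environ.get("FLAGS_dynamic_static_unified_comm") == "1":
        return init_with_new_comm()
    return init_with_fleet()
```

The key point from the review is the direction of the check: the flag being set to "1" selects the new library, and the legacy fleet path is the fallback.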
@@ -228,5 +231,19 @@ void NCCLCommContext::GroupStart() {
}
void NCCLCommContext::GroupEnd() { NCCL_CHECK(phi::dynload::ncclGroupEnd()); }

#if NCCL_VERSION_CODE >= 21100
Please add a comment here explaining what this function does; it is hard to understand from the name alone. You could include the link https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/api/ops.html and fold the op description from there into the comment.
done
Force-pushed from 4f2badd to 8a29c88.
LGTM
LGTM
…ddlePaddle#57424) * [NewComm] No.10 compatiable upgrade for distributed_fused_lamb op * fix
PR types
Others
PR changes
APIs
Description
Compatible upgrade for the distributed_fused_lamb op. Related: #57102