fused linear and selective recompute #620

Merged

Conversation

@FeixLiu (Contributor) commented on Aug 11, 2022:

Part of PR #613; this PR contains only the fused linear part and the selective recompute part.
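For context on the first optimization: fused linear replaces the separate matmul and bias-add of nn.Linear with a single fused kernel. The snippet below is a minimal sketch, not code from this PR; it assumes paddle.incubate.nn.FusedLinear is available in the installed Paddle build and that a supported GPU is present.

```python
# Minimal sketch (not this PR's diff): swapping nn.Linear for Paddle's fused
# linear layer, which runs the matmul and bias add as one fused kernel.
# Assumes paddle.incubate.nn.FusedLinear exists in the installed Paddle build
# and a supported GPU is available.
import paddle
import paddle.nn as nn
from paddle.incubate.nn import FusedLinear

hidden = 1024
x = paddle.randn([8, 512, hidden], dtype="float32")

plain = nn.Linear(hidden, 4 * hidden)    # matmul followed by a separate bias add
fused = FusedLinear(hidden, 4 * hidden)  # single fused matmul + bias kernel

print(plain(x).shape, fused(x).shape)    # both [8, 512, 4096]
```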

All tests below are carried out on the 345M GPT model.

Loss comparison for pure data parallel training (dp = 8)

tensor_fusion + fused_linear + selective_recompute
[figure: loss curves, baseline vs. tensor_fusion + fused_linear + selective_recompute]

This performance measurement includes fused_linear.

| baseline speed | optimized speed | gain |
| --- | --- | --- |
| 181012 | 240203 | +32.7% |

Loss comparison for hybrid parallel training (dp = mp = pp = 2)

fused_linear + selective_recompute

[figure: loss curves, baseline vs. fused_linear + selective_recompute]

| baseline speed | optimized speed | gain |
| --- | --- | --- |
| 19072 | 21241 | +11.4% |
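As background on the second optimization, selective recompute means recomputing only part of a transformer layer during backward rather than the whole layer, trading a smaller memory saving for less recomputation cost. The sketch below is illustrative only, not this repo's implementation: the toy layer structure and the granularity names ("full", "core_attn") are assumptions, while paddle.distributed.fleet.utils.recompute and paddle.nn.MultiHeadAttention are standard Paddle APIs.

```python
# Illustrative sketch of selective recompute (not this repo's code).
# granularity names "full" / "core_attn" are assumed for the example.
import paddle
import paddle.nn as nn
from paddle.distributed.fleet.utils import recompute


class ToyTransformerLayer(nn.Layer):
    def __init__(self, hidden, nhead, recompute_granularity=None):
        super().__init__()
        self.attn = nn.MultiHeadAttention(hidden, nhead)
        self.ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden),
                                 nn.GELU(),
                                 nn.Linear(4 * hidden, hidden))
        self.granularity = recompute_granularity  # None / "full" / "core_attn"

    def forward(self, x):
        if self.granularity == "full":
            # recompute the entire layer in backward (largest memory saving)
            return recompute(self._forward_impl, x)
        return self._forward_impl(x)

    def _forward_impl(self, x):
        if self.granularity == "core_attn":
            # selective: only the activation-heavy attention is recomputed
            attn_out = recompute(self.attn, x)
        else:
            attn_out = self.attn(x)
        return x + self.ffn(attn_out)
```

With the granularity left at None, nothing is recomputed, which mirrors the use_recompute: False case discussed in the review thread below.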

@@ -95,6 +95,7 @@ GPT training defaults to the AdamW optimizer and cosine learning rate decay; here, via
num_train_epochs: 1
seed: 1024
use_recompute: False
recompute_granularity:
Collaborator commented:
Does this field need to be set to full?

@FeixLiu (Contributor, Author) replied on Aug 11, 2022:

No. Since use_recompute is False here, the field can be left empty; the backend receives None for it.
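A small sketch of why the empty field arrives as None (hypothetical parsing code, not the repo's actual config loader): a YAML key with no value parses to None, and the granularity is only consulted once use_recompute is True.

```python
# Hypothetical illustration (not the repo's loader): an empty YAML value parses
# to None, so leaving recompute_granularity blank is safe while use_recompute
# is False.
import yaml

cfg = yaml.safe_load("""
use_recompute: False
recompute_granularity:
""")

print(cfg["recompute_granularity"])  # None

if cfg["use_recompute"]:
    # only consulted when recompute is actually enabled
    granularity = cfg["recompute_granularity"] or "full"
```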

Review thread on examples/gpt/tools.py (outdated, resolved).
@ForFishes merged commit 6c12050 into PaddlePaddle:develop on Aug 11, 2022.
@FeixLiu deleted the fused_linear_and_selective_recompute branch on August 11, 2022 at 10:58.