Computing training memory for different ZeRO stages #912

Open
Arcmoon-Hu opened this issue Jul 17, 2024 · 0 comments

@Arcmoon-Hu

If I use ZeRO stage 2 and enable gradient checkpointing, how do I compute the activation memory?
The reference paper is https://arxiv.org/pdf/2205.05198:
image
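
For context, a rough sketch of how the estimate could be computed from the formulas in the referenced paper, not a definitive answer. ZeRO stage 2 partitions optimizer states and gradients across data-parallel ranks but does not partition activations, so the per-GPU activation estimate here does not depend on the ZeRO stage. The paper gives `s*b*h*(34 + 5*a*s/h)` bytes per transformer layer without recomputation and `2*s*b*h` bytes per layer when only the layer input is kept (full recomputation). The function names are hypothetical, and the sketch assumes 16-bit activations, no tensor or sequence parallelism, and a checkpoint at every layer boundary. Adding one layer's full activations as the recomputation working set is my own approximation of the peak, and embeddings and the output layer are ignored.

```python
def activation_bytes_per_layer(s: int, b: int, h: int, a: int) -> int:
    """Per-layer activation bytes with no recomputation.

    Paper formula: s*b*h*(34 + 5*a*s/h), expanded here to
    34*s*b*h + 5*a*s*s*b so the arithmetic stays in integers.
    s = sequence length, b = micro-batch size per GPU,
    h = hidden size, a = number of attention heads.
    """
    return 34 * s * b * h + 5 * a * s * s * b


def checkpointed_activation_bytes(s: int, b: int, h: int, a: int, num_layers: int) -> int:
    """Estimated peak activation bytes with gradient checkpointing.

    Assumption: every transformer layer is checkpointed, so only each
    layer's input (2*s*b*h bytes) is stored. During backward, one layer
    at a time is recomputed, so one layer's full activations are added
    as an approximate working set.
    """
    stored_checkpoints = 2 * s * b * h * num_layers
    recompute_working_set = activation_bytes_per_layer(s, b, h, a)
    return stored_checkpoints + recompute_working_set


if __name__ == "__main__":
    # Illustrative values only, not taken from the issue.
    s, b, h, a, num_layers = 2048, 1, 4096, 32, 32
    no_ckpt = activation_bytes_per_layer(s, b, h, a) * num_layers
    with_ckpt = checkpointed_activation_bytes(s, b, h, a, num_layers)
    print(f"without checkpointing: {no_ckpt / 2**30:.2f} GiB")
    print(f"with checkpointing:    {with_ckpt / 2**30:.2f} GiB")
```

Here `b` is the micro-batch size on a single GPU, since each data-parallel rank holds activations only for its own micro-batch.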
