
Potential improvement to the gradient accumulation code #13

Open
wants to merge 1 commit into base: master

Conversation

@igor-sikachyna commented Aug 6, 2020

I recently became heavily involved in experimenting with StyleGAN2 and stumbled upon what I believe is a problem: the way gradient accumulation is implemented for small GPU batches appears to be incorrect. The piece of code I am concerned about is in training/training_loop.py:

# Slow path with gradient accumulation.
else:
    for _round in rounds:
        tflib.run(G_train_op, feed_dict)
    if run_G_reg:
        for _round in rounds:
            tflib.run(G_reg_op, feed_dict)
    tflib.run(Gs_update_op, feed_dict)
    for _round in rounds:
        tflib.run(data_fetch_op, feed_dict)
        tflib.run(D_train_op, feed_dict)
    if run_D_reg:
        for _round in rounds:
            tflib.run(D_reg_op, feed_dict)

For reference, here is the code without gradient accumulation:

tflib.run([G_train_op, data_fetch_op], feed_dict)
if run_G_reg:
    tflib.run(G_reg_op, feed_dict)
tflib.run([D_train_op, Gs_update_op], feed_dict)
if run_D_reg:
    tflib.run(D_reg_op, feed_dict)

So, for the gradient accumulation case:

  1. G_train_op is repeated multiple times on the same data instead of taking new samples with data_fetch_op.
  2. G_reg_op uses the same data as G_train_op (whereas in the code without gradient accumulation, data_fetch_op is called between them).
  3. D_train_op gets a new data_fetch_op for each round, which suggests the same should apply to G_train_op.
  4. PR #9 (Potential bug in gradient accumulation?) suggests that D_reg_op is also misused, since it too should receive new data via data_fetch_op.

So I propose a simple update to the gradient accumulation code and would like to ask for opinions on whether there is a real issue with it in the first place. A rough sketch of the reordering is shown below.
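For concreteness, here is roughly what the reordered slow path could look like (a sketch only, reusing the same ops, rounds, and feed_dict as in training/training_loop.py; the actual commit may differ in details):

# Slow path with gradient accumulation (proposed reordering, sketch only).
else:
    for _round in rounds:
        tflib.run(data_fetch_op, feed_dict)      # fetch a fresh minibatch for every G round (point 1)
        tflib.run(G_train_op, feed_dict)
    if run_G_reg:
        for _round in rounds:
            tflib.run(data_fetch_op, feed_dict)  # new data before G_reg_op (point 2)
            tflib.run(G_reg_op, feed_dict)
    tflib.run(Gs_update_op, feed_dict)
    for _round in rounds:
        tflib.run(data_fetch_op, feed_dict)      # unchanged: new data for every D round (point 3)
        tflib.run(D_train_op, feed_dict)
    if run_D_reg:
        for _round in rounds:
            tflib.run(data_fetch_op, feed_dict)  # new data before D_reg_op (point 4)
            tflib.run(D_reg_op, feed_dict)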

@Thunder003 commented Feb 25, 2021

Hi @igor-sikachyna, did you find a solution to the question you raised? Also, did you take a look at the optimizer file here, at line 228? They haven't added a tensor for this in the graph, so how will the gradients be updated when tflib.run(G_train_op, feed_dict) executes? Any idea on this?

@johndpope

Also check stylegan2-ada, which was a more recent cut (2020):
https://github.com/NVlabs/stylegan2-ada
It is now superseded by
https://github.com/NVlabs/stylegan2-ada-pytorch
