Oops, fix gradient accumulation, thanks @vfdev-5
lopuhin committed Oct 17, 2019
1 parent 917e8f7 commit a4c19ea
Showing 2 changed files with 2 additions and 4 deletions.
2 changes: 1 addition & 1 deletion README.rst
@@ -140,7 +140,7 @@ Some details:
 * another trick for reducing memory usage and making it train faster with
   cudnn.benchmark was limiting and bucketing number of targets in one batch.
 * model was very sensitive to hyperparameters such as crop size and shape
-  and batch size (and gradient accumulation wasn't enough to fix this).
+  and batch size (and I had a bug in gradient accumulation).
 * SGD with momentum performed significantly better than Adam, cosine schedule
   was used, weight decay was also quite important.
 * quite large scale and color augmentations were used: hue/saturation/value,
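As a side note on the training setup the README describes (SGD with momentum, a cosine learning-rate schedule, and weight decay), here is a minimal PyTorch sketch; the concrete hyperparameter values and the Linear stand-in model are assumptions, not the project's actual settings.

    # Sketch of the optimizer setup the README describes: SGD with momentum,
    # weight decay, and a cosine learning-rate schedule. All values and the
    # Linear stand-in model are placeholders, not the project's settings.
    import torch

    model = torch.nn.Linear(16, 10)
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

    for epoch in range(50):
        # ... run one epoch of training here ...
        scheduler.step()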
4 changes: 1 addition & 3 deletions kuzushiji/classify/main.py
@@ -542,9 +542,6 @@ def create_supervised_trainer(
     def update_fn(engine, batch):
         model.train()
 
-        if engine.state.iteration % accumulation_steps == 0:
-            optimizer.zero_grad()
-
         x, y = prepare_batch(batch, device=device, non_blocking=non_blocking)
         y_pred = model(x)
         loss = loss_fn(y_pred, y)
@@ -557,6 +554,7 @@ def update_fn(engine, batch):

         if engine.state.iteration % accumulation_steps == 0:
             optimizer.step()
+            optimizer.zero_grad()
 
         return output_transform(x, y, y_pred, loss)
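
For context, the corrected pattern this commit arrives at is: accumulate gradients across accumulation_steps batches and call zero_grad() only right after optimizer.step(), so accumulated gradients are never wiped out before the step. The previous code zeroed the gradients at the start of every accumulation boundary, which discarded everything accumulated since the last step. Below is a minimal standalone sketch; the model, data, and hyperparameters are toy placeholders, and dividing the loss by accumulation_steps is a common convention rather than something shown in this diff.

    # Minimal sketch of the corrected gradient-accumulation pattern:
    # gradients are cleared only right after optimizer.step(), never between
    # the backward passes being accumulated. Model, data, and hyperparameters
    # here are toy placeholders, not the repository's objects.
    import torch

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()
    data_loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))
                   for _ in range(6)]
    accumulation_steps = 2

    for iteration, (x, y) in enumerate(data_loader, start=1):
        loss = loss_fn(model(x), y)
        # scaling by accumulation_steps keeps the effective gradient an
        # average over the accumulated batches (a common convention)
        (loss / accumulation_steps).backward()
        if iteration % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()  # clear only after the step, as in this commit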

