Oops, fix gradient accumulation, thanks @vfdev-5
lopuhin committed Oct 17, 2019
1 parent 917e8f7 commit a4c19ea
Showing 2 changed files with 2 additions and 4 deletions.
2 changes: 1 addition & 1 deletion README.rst
@@ -140,7 +140,7 @@ Some details:
 * another trick for reducing memory usage and making it train faster with
   cudnn.benchmark was limiting and bucketing number of targets in one batch.
 * model was very sensitive to hyperparameters such as crop size and shape
-  and batch size (and gradient accumulation wasn't enough to fix this).
+  and batch size (and I had a bug in gradient accumulation).
 * SGD with momentum performed significantly better than Adam, cosine schedule
   was used, weight decay was also quite important.
 * quite large scale and color augmentations were used: hue/saturation/value,
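As a side note on the training setup the README describes (SGD with momentum, a cosine learning-rate schedule, and weight decay), here is a minimal PyTorch sketch; the concrete hyperparameter values and the Linear stand-in model are assumptions, not the project's actual settings.

    # Sketch of the optimizer setup the README describes: SGD with momentum,
    # weight decay, and a cosine learning-rate schedule. All values and the
    # Linear stand-in model are placeholders, not the project's settings.
    import torch

    model = torch.nn.Linear(16, 10)
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)

    for epoch in range(50):
        # ... run one epoch of training here ...
        scheduler.step()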
4 changes: 1 addition & 3 deletions kuzushiji/classify/main.py
@@ -542,9 +542,6 @@ def create_supervised_trainer(
     def update_fn(engine, batch):
         model.train()
 
-        if engine.state.iteration % accumulation_steps == 0:
-            optimizer.zero_grad()
-
         x, y = prepare_batch(batch, device=device, non_blocking=non_blocking)
         y_pred = model(x)
         loss = loss_fn(y_pred, y)
@@ -557,6 +554,7 @@ def update_fn(engine, batch):

         if engine.state.iteration % accumulation_steps == 0:
             optimizer.step()
+            optimizer.zero_grad()
 
         return output_transform(x, y, y_pred, loss)
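
For context, the corrected pattern this commit arrives at is: accumulate gradients across accumulation_steps batches and call zero_grad() only right after optimizer.step(), so accumulated gradients are never wiped out before the step. The previous code zeroed the gradients at the start of every accumulation boundary, which discarded everything accumulated since the last step. Below is a minimal standalone sketch; the model, data, and hyperparameters are toy placeholders, and dividing the loss by accumulation_steps is a common convention rather than something shown in this diff.

    # Minimal sketch of the corrected gradient-accumulation pattern:
    # gradients are cleared only right after optimizer.step(), never between
    # the backward passes being accumulated. Model, data, and hyperparameters
    # here are toy placeholders, not the repository's objects.
    import torch

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()
    data_loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))
                   for _ in range(6)]
    accumulation_steps = 2

    for iteration, (x, y) in enumerate(data_loader, start=1):
        loss = loss_fn(model(x), y)
        # scaling by accumulation_steps keeps the effective gradient an
        # average over the accumulated batches (a common convention)
        (loss / accumulation_steps).backward()
        if iteration % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()  # clear only after the step, as in this commit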

