Minor fix to make adafactor work for >2d conv kernels (#1122)
Summary:
An .unsqueeze(-1) was missing on line 124. Without it, the division in _approx_sq_grad raises a runtime error for convolutional kernels with more than two dimensions. With this fix, Adafactor's 2D factoring logic is applied to the final two dimensions.
Pull Request resolved: #1122

Differential Revision: D17431662

Pulled By: myleott

fbshipit-source-id: e7435e77270a9252f75f01b2457ef0048f5bcf36
akhileshgotmare authored and facebook-github-bot committed Sep 18, 2019
1 parent 718677e commit 8dbee4a
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion fairseq/optim/adafactor.py
@@ -121,7 +121,7 @@ def _rms(self, tensor):
         return tensor.norm(2) / (tensor.numel() ** 0.5)

     def _approx_sq_grad(self, exp_avg_sq_row, exp_avg_sq_col, output):
-        r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1)).rsqrt_().unsqueeze(-1)
+        r_factor = (exp_avg_sq_row / exp_avg_sq_row.mean(dim=-1).unsqueeze(-1)).rsqrt_().unsqueeze(-1)
         c_factor = exp_avg_sq_col.unsqueeze(-2).rsqrt()
         torch.mul(r_factor, c_factor, out=output)
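
The shape reasoning behind this one-line change can be traced without running the optimizer. The following is a minimal pure-Python sketch, not the fairseq implementation: it tracks only tensor shapes, and it assumes exp_avg_sq_row has shape grad_shape[:-1] and exp_avg_sq_col has shape grad_shape[:-2] + grad_shape[-1:]. The helper names (broadcast_shape, r_factor_shape, approx_sq_grad_shape) and the example kernel shapes are hypothetical, and the sketch raises ValueError where PyTorch would raise a RuntimeError.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Broadcast two shapes from the trailing dimension, or raise ValueError."""
    out = []
    # Missing leading dimensions are treated as size 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"cannot broadcast {a} with {b}")
        out.append(x if y == 1 else y)
    return tuple(reversed(out))


def r_factor_shape(grad_shape, fixed):
    """Shape of r_factor before (fixed=False) and after (fixed=True) the change."""
    row = grad_shape[:-1]           # assumed shape of exp_avg_sq_row
    row_mean = row[:-1]             # shape after .mean(dim=-1)
    if fixed:
        row_mean = row_mean + (1,)  # the added .unsqueeze(-1)
    ratio = broadcast_shape(row, row_mean)
    return ratio + (1,)             # the trailing .unsqueeze(-1)


def approx_sq_grad_shape(grad_shape, fixed):
    """Shape of r_factor * c_factor, which should equal grad_shape."""
    col = grad_shape[:-2] + grad_shape[-1:]  # assumed shape of exp_avg_sq_col
    c_factor = col[:-1] + (1,) + col[-1:]    # .unsqueeze(-2)
    return broadcast_shape(r_factor_shape(grad_shape, fixed), c_factor)


# 2D weight: the row mean is a scalar, so both versions broadcast.
linear = (8, 4)
# 4D conv kernel (out_channels, in_channels, height, width), sizes chosen arbitrarily.
conv = (16, 8, 3, 5)

if __name__ == "__main__":
    print(approx_sq_grad_shape(linear, fixed=False))
    print(approx_sq_grad_shape(conv, fixed=True))
    try:
        approx_sq_grad_shape(conv, fixed=False)
    except ValueError as err:
        print("old code:", err)
```

In the 2D case the row mean has an empty shape and broadcasts against anything, which is why the bug only surfaced for kernels with more than two dimensions. Under the same assumptions, a kernel whose dimensions happen to coincide (for example all equal) would pass the old shape check while aligning the mean against the wrong axis, so the extra unsqueeze matters even when no error is raised.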
