Unclear wording in "Composing Optimizers" section of docs #1627
Short term fix: perhaps the default value for the LR in `ExpDecay` should be `1`.

Long term rant: Thank you for taking the effort to type this up, because what you are highlighting here is something that I've brought up before. Schedules and optimizers are not the same thing. It's a cute trick that certain schedules can be "composed" with our optimizers, but I really strongly feel we should fix this by making schedules their own distinct thing. Schedulers (in general) wrap an optimizer; they do not compose with them.
`1` should be fine for now.
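To make that distinction concrete, here is a minimal sketch. Only `Optimiser`, `ExpDecay`, and `Descent` are Flux names; the `Scheduled` wrapper below is purely hypothetical and not part of Flux.

```julia
using Flux

# "Composing" (what Flux currently offers): the rules are chained, so
# ExpDecay's eta is multiplied into Descent's eta at every step.
composed = Optimiser(ExpDecay(0.001, 0.1, 1000, 1e-4), Descent(0.1))

# "Wrapping" (hypothetical sketch): a scheduler owns an optimiser and
# overwrites its learning rate from a schedule before each step, instead
# of multiplying a second learning rate into it.
mutable struct Scheduled{O}
    opt::O
    schedule    # step::Int -> learning rate
    t::Int
end

function next!(s::Scheduled)
    s.t += 1
    s.opt.eta = s.schedule(s.t)   # e.g. exponentially decay Descent's own eta
    return s.opt
end

wrapped = Scheduled(Descent(0.1), t -> 0.1 * 0.5^(t ÷ 1000), 0)
```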
#1628: Update "Composing Optimisers" docs (r=darsnack, a=StevenWhitaker)

Addresses #1627 (perhaps only partially). Use `1` instead of `0.001` for the first argument of `ExpDecay` in the example, so that the sentence following the example, i.e.,

> Here we apply exponential decay to the `Descent` optimiser.

makes more sense.

It was also [suggested](#1627 (comment)) in the linked issue that it might be worth changing the default learning rate of `ExpDecay` to `1`. Since this PR doesn't address that, I'm not sure merging this PR should necessarily close the issue.

Co-authored-by: StevenWhitaker <steventwhitaker@gmail.com>
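Concretely, the docs change amounts to something like the following (assuming the example leaves `ExpDecay`'s remaining arguments at their defaults):

```julia
using Flux

# Before: initial effective learning rate ≈ 0.001 * 0.1 = 0.0001
opt = Optimiser(ExpDecay(0.001), Descent())

# After #1628: ExpDecay contributes only the decay schedule, so the initial
# effective learning rate is Descent's default of 0.1
opt = Optimiser(ExpDecay(1), Descent())
```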
In the Composing Optimizers section of the docs it states:
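The example and the sentence in question were roughly as follows (the `0.001` first argument of `ExpDecay` is the one #1628 later changed; the remaining arguments are assumed here to be the defaults):

```julia
opt = Optimiser(ExpDecay(0.001), Descent())
```

> Here we apply exponential decay to the `Descent` optimiser.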
I think that last sentence is a bit misleading to someone who is not very familiar with Flux. When I read "apply exponential decay to the `Descent` optimiser", I take that to mean that we use the learning rate defined by `Descent`, but update it according to some schedule defined by `ExpDecay`. But in reality, if I understand correctly, the above example from the docs uses an initial learning rate that is actually `0.0001` (because the default learning rate of `Descent` is `0.1`), not `0.1` like I would expect.

(Somewhat tangential note: One possible reason for the confusion might be the fact that `ExpDecay` by itself can be used to do gradient descent, but that isn't conveyed by the name `ExpDecay`. I would think that something called `ExpDecay` would have to be paired with an optimizer, and would not have its own learning rate.)
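To put numbers on this, here is a small sketch using the internal `apply!` from Flux's `Optimise` module (shown only to illustrate the scaling; the values assume the current defaults of `ExpDecay` and `Descent`):

```julia
using Flux
using Flux.Optimise: apply!, Optimiser, ExpDecay, Descent

w    = zeros(3)   # parameter array (ExpDecay only uses it as a key internally)
grad = ones(3)    # pretend gradient

# ExpDecay on its own already scales the gradient by its own eta (default 0.001),
# so it behaves like a (decaying) gradient-descent rule by itself.
step_alone = apply!(ExpDecay(), w, copy(grad))                           # ≈ 0.001 .* grad

# Composed with Descent, the two learning rates multiply: the first step uses
# roughly 0.001 * 0.1 = 0.0001, not Descent's 0.1.
step_composed = apply!(Optimiser(ExpDecay(), Descent()), w, copy(grad))  # ≈ 0.0001 .* grad
```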
Two possible ways to improve the clarity of the docs:

1. Update the example to pass `1` instead of `0.001` as the first argument of `ExpDecay`, and leave the rest of the wording as is.
2. Use a different example, especially since composing `ExpDecay` with `Descent` is (if I understand correctly) equivalent to using `ExpDecay` on its own with a rescaled learning rate, so the current example doesn't seem as useful as another might be. (A sketch of one alternative follows this list.)
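A sketch of one alternative, given purely as an illustration (not wording from any PR), where the composition adds something neither rule does on its own:

```julia
using Flux.Optimise: Optimiser, ClipValue, Descent

# Clip each gradient entry to [-1e-3, 1e-3], then take a plain descent step.
opt = Optimiser(ClipValue(1e-3), Descent(0.1))
```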
I am happy to open a PR to make the simple change suggested in 1., but I don't know Flux well enough to do 2.