
Commit 1ec32c2

Authored Oct 10, 2022
Add @autosize (#2078)
* autosize, take 1
* fix outputsize on LayerNorm
* tidy & improve
* add tests, release note
* rrule errors, improvements, tests
* documentation
* tweaks
* add jldoctest; output = false
* tweak
* using Flux
1 parent dfc5a7e commit 1ec32c2

File tree

5 files changed: +318 −32 lines changed


NEWS.md (+3)

@@ -1,5 +1,8 @@
 # Flux Release Notes
 
+## v0.13.7
+* Added [`@autosize` macro](https://github.com/FluxML/Flux.jl/pull/2078)
+
 ## v0.13.4
 * Added [`PairwiseFusion` layer](https://github.com/FluxML/Flux.jl/pull/1983)

docs/src/outputsize.md (+62 −30)

@@ -1,47 +1,79 @@
 # Shape Inference
 
-To help you generate models in an automated fashion, [`Flux.outputsize`](@ref) lets you
-calculate the size returned produced by layers for a given size input.
-This is especially useful for layers like [`Conv`](@ref).
+Flux has some tools to help generate models in an automated fashion, by inferring the size
+of arrays that layers will receive, without doing any computation.
+This is especially useful for convolutional models, where the same [`Conv`](@ref) layer
+accepts any size of image, but the next layer may not.
 
-It works by passing a "dummy" array into the model that preserves size information without running any computation.
-`outputsize(f, inputsize)` works for all layers (including custom layers) out of the box.
-By default, `inputsize` expects the batch dimension,
-but you can exclude the batch size with `outputsize(f, inputsize; padbatch=true)` (assuming it to be one).
+The higher-level tool is a macro [`@autosize`](@ref) which acts on the code defining the layers,
+and replaces each appearance of `_` with the relevant size. This simple example returns a model
+with `Dense(845 => 10)` as the last layer:
 
-Using this utility function lets you automate model building for various inputs like so:
 ```julia
-"""
-    make_model(width, height, inchannels, nclasses;
-               layer_config = [16, 16, 32, 32, 64, 64])
+@autosize (28, 28, 1, 32) Chain(Conv((3, 3), _ => 5, relu, stride=2), Flux.flatten, Dense(_ => 10))
+```
+
+The input size may be provided at runtime, like `@autosize (sz..., 1, 32) Chain(Conv(`..., but all the
+layer constructors containing `_` must be explicitly written out -- the macro sees the code as written.
+
+This macro relies on a lower-level function [`outputsize`](@ref Flux.outputsize), which you can also use directly:
+
+```julia
+c = Conv((3, 3), 1 => 5, relu, stride=2)
+Flux.outputsize(c, (28, 28, 1, 32))  # returns (13, 13, 5, 32)
+```
 
-Create a CNN for a given set of configuration parameters.
+The function `outputsize` works by passing a "dummy" array into the model, which propagates through very cheaply.
+It should work for all layers, including custom layers, out of the box.
 
-# Arguments
-- `width`: the input image width
-- `height`: the input image height
-- `inchannels`: the number of channels in the input image
-- `nclasses`: the number of output classes
-- `layer_config`: a vector of the number of filters per each conv layer
+An example of how to automate model building is this:
+```jldoctest; output = false, setup = :(using Flux)
 """
-function make_model(width, height, inchannels, nclasses;
-                    layer_config = [16, 16, 32, 32, 64, 64])
-  # construct a vector of conv layers programmatically
-  conv_layers = [Conv((3, 3), inchannels => layer_config[1])]
-  for (infilters, outfilters) in zip(layer_config, layer_config[2:end])
-    push!(conv_layers, Conv((3, 3), infilters => outfilters))
+    make_model(width, height, [inchannels, nclasses; layer_config])
+
+Create a CNN for a given set of configuration parameters. Arguments:
+- `width`, `height`: the input image size in pixels
+- `inchannels`: the number of channels in the input image, default `1`
+- `nclasses`: the number of output classes, default `10`
+- Keyword `layer_config`: a vector of the number of channels per layer, default `[16, 16, 32, 64]`
+"""
+function make_model(width, height, inchannels = 1, nclasses = 10;
+                    layer_config = [16, 16, 32, 64])
+  # construct a vector of layers:
+  conv_layers = []
+  push!(conv_layers, Conv((5, 5), inchannels => layer_config[1], relu, pad=SamePad()))
+  for (inch, outch) in zip(layer_config, layer_config[2:end])
+    push!(conv_layers, Conv((3, 3), inch => outch, sigmoid, stride=2))
   end
 
-  # compute the output dimensions for the conv layers
-  # use padbatch=true to set the batch dimension to 1
-  conv_outsize = Flux.outputsize(conv_layers, (width, height, nchannels); padbatch=true)
+  # compute the output dimensions after these conv layers:
+  conv_outsize = Flux.outputsize(conv_layers, (width, height, inchannels); padbatch=true)
 
-  # the input dimension to Dense is programatically calculated from
-  # width, height, and nchannels
-  return Chain(conv_layers..., Dense(prod(conv_outsize) => nclasses))
+  # use this to define appropriate Dense layer:
+  last_layer = Dense(prod(conv_outsize) => nclasses)
+  return Chain(conv_layers..., Flux.flatten, last_layer)
 end
+
+m = make_model(28, 28, 3, layer_config = [9, 17, 33, 65])
+
+Flux.outputsize(m, (28, 28, 3, 42)) == (10, 42) == size(m(randn(Float32, 28, 28, 3, 42)))
+
+# output
+
+true
 ```
 
+Alternatively, using the macro, the definition of `make_model` could end with:
+
+```
+  # compute the output dimensions & construct appropriate Dense layer:
+  return @autosize (width, height, inchannels, 1) Chain(conv_layers..., Flux.flatten, Dense(_ => nclasses))
+end
+```
+
+### Listing
+
 ```@docs
+Flux.@autosize
 Flux.outputsize
 ```
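For instance, a minimal sketch of the runtime-size pattern mentioned above, where the image size `sz` is only known once the data is loaded (the layer sizes and names here are illustrative, not part of the documented example):

```julia
using Flux

sz = (28, 28)   # e.g. read from the first batch of data at runtime

model = @autosize (sz..., 1, 32) Chain(
          Conv((3, 3), _ => 8, relu, pad=SamePad()),
          MaxPool((2, 2)),
          Flux.flatten,
          Dense(_ => 10))

Flux.outputsize(model, (sz..., 1, 32))   # (10, 32)
```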

src/Flux.jl (+1)

@@ -55,6 +55,7 @@ include("layers/show.jl")
 include("loading.jl")
 
 include("outputsize.jl")
+export @autosize
 
 include("data/Data.jl")
 using .Data

src/outputsize.jl (+165 −2)

@@ -147,8 +147,12 @@ outputsize(m::AbstractVector, input::Tuple...; padbatch=false) = outputsize(Chai
 
 ## bypass statistics in normalization layers
 
-for layer in (:LayerNorm, :BatchNorm, :InstanceNorm, :GroupNorm)
-  @eval (l::$layer)(x::AbstractArray{Nil}) = x
+for layer in (:BatchNorm, :InstanceNorm, :GroupNorm) # LayerNorm works fine
+  @eval function (l::$layer)(x::AbstractArray{Nil})
+    l.chs == size(x, ndims(x)-1) || throw(DimensionMismatch(
+      string($layer, " expected ", l.chs, " channels, but got size(x) == ", size(x))))
+    x
+  end
 end
 
 ## fixes for layers that don't work out of the box
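The practical effect of this hunk is that a channel mismatch now surfaces during shape inference instead of being silently passed through; a small sketch of the expected behaviour:

```julia
using Flux

good = Chain(Conv((3, 3), 3 => 8), BatchNorm(8))
Flux.outputsize(good, (28, 28, 3, 1))   # (26, 26, 8, 1)

bad = Chain(Conv((3, 3), 3 => 8), BatchNorm(5))
Flux.outputsize(bad, (28, 28, 3, 1))    # throws DimensionMismatch: BatchNorm expected 5 channels, but got size(x) == (26, 26, 8, 1)
```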
@@ -168,3 +172,162 @@ for (fn, Dims) in ((:conv, DenseConvDims),)
     end
   end
 end
+
+
+"""
+    @autosize (size...,) Chain(Layer(_ => 2), Layer(_), ...)
+
+Returns the specified model, with each `_` replaced by an inferred number,
+for input of the given `size`.
+
+The unknown sizes are usually the second-last dimension of that layer's input,
+which Flux regards as the channel dimension.
+(A few layers, `Dense` & [`LayerNorm`](@ref), instead always use the first dimension.)
+The underscore may appear as an argument of a layer, or inside a `=>`.
+It may be used in further calculations, such as `Dense(_ => _÷4)`.
+
+# Examples
+```
+julia> @autosize (3, 1) Chain(Dense(_ => 2, sigmoid), BatchNorm(_, affine=false))
+Chain(
+  Dense(3 => 2, σ),                     # 8 parameters
+  BatchNorm(2, affine=false),
+)
+
+julia> img = [28, 28];
+
+julia> @autosize (img..., 1, 32) Chain(              # size is only needed at runtime
+          Chain(c = Conv((3,3), _ => 5; stride=2, pad=SamePad()),
+                p = MeanPool((3,3)),
+                b = BatchNorm(_),
+                f = Flux.flatten),
+          Dense(_ => _÷4, relu, init=Flux.rand32),   # can calculate output size _÷4
+          SkipConnection(Dense(_ => _, relu), +),
+          Dense(_ => 10),
+       ) |> gpu                                      # moves to GPU after initialisation
+Chain(
+  Chain(
+    c = Conv((3, 3), 1 => 5, pad=1, stride=2),  # 50 parameters
+    p = MeanPool((3, 3)),
+    b = BatchNorm(5),                   # 10 parameters, plus 10
+    f = Flux.flatten,
+  ),
+  Dense(80 => 20, relu),                # 1_620 parameters
+  SkipConnection(
+    Dense(20 => 20, relu),              # 420 parameters
+    +,
+  ),
+  Dense(20 => 10),                      # 210 parameters
+)                   # Total: 10 trainable arrays, 2_310 parameters,
+                    # plus 2 non-trainable, 10 parameters, summarysize 10.469 KiB.
+
+julia> outputsize(ans, (28, 28, 1, 32))
+(10, 32)
+```
+
+Limitations:
+* While `@autosize (5, 32) Flux.Bilinear(_ => 7)` is OK, something like `Bilinear((_, _) => 7)` will fail.
+* While `Scale(_)` and `LayerNorm(_)` are fine (and use the first dimension), `Scale(_,_)` and `LayerNorm(_,_)`
+  will fail if `size(x,1) != size(x,2)`.
+* RNNs won't work: `@autosize (7, 11) LSTM(_ => 5)` fails, because `outputsize(RNN(3=>7), (3,))` also fails, a known issue.
+"""
+macro autosize(size, model)
+  Meta.isexpr(size, :tuple) || error("@autosize's first argument must be a tuple, the size of the input")
+  Meta.isexpr(model, :call) || error("@autosize's second argument must be something like Chain(layers...)")
+  ex = _makelazy(model)
+  @gensym m
+  quote
+    $m = $ex
+    $outputsize($m, $size)
+    $striplazy($m)
+  end |> esc
+end
+
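To make the mechanism concrete, here is roughly what a small invocation behaves like after expansion, written out by hand as a simplified sketch (the real expansion uses gensym'd names and the helpers defined below):

```julia
# @autosize (3, 1) Chain(Dense(_ => 2)) is roughly equivalent to:
m = Chain(Flux.LazyLayer("Dense(_ => 2)", x -> Dense(Flux.autosizefor(Dense, x) => 2), nothing))
Flux.outputsize(m, (3, 1))   # the Nil-array pass calls each LazyLayer, which builds and stores the real layer
Flux.striplazy(m)            # returns Chain(Dense(3 => 2)) with the wrappers removed
```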
+function _makelazy(ex::Expr)
+  n = _underscoredepth(ex)
+  n == 0 && return ex
+  n == 1 && error("@autosize doesn't expect an underscore here: $ex")
+  n == 2 && return :($LazyLayer($(string(ex)), $(_makefun(ex)), nothing))
+  n > 2 && return Expr(ex.head, ex.args[1], map(_makelazy, ex.args[2:end])...)
+end
+_makelazy(x) = x
+
+function _underscoredepth(ex::Expr)
+  # Meta.isexpr(ex, :tuple) && :_ in ex.args && return 10
+  ex.head in (:call, :kw, :(->), :block) || return 0
+  ex.args[1] === :(=>) && ex.args[2] === :_ && return 1
+  m = maximum(_underscoredepth, ex.args)
+  m == 0 ? 0 : m+1
+end
+_underscoredepth(ex) = Int(ex === :_)
+
+function _makefun(ex)
+  T = Meta.isexpr(ex, :call) ? ex.args[1] : Type
+  @gensym x s
+  Expr(:(->), x, Expr(:block, :($s = $autosizefor($T, $x)), _replaceunderscore(ex, s)))
+end
+
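For intuition, the depth values these helpers assign (checked by hand against the rules above) look like this; `_makelazy` wraps exactly the depth-2 expressions, i.e. the layer constructor that directly contains the underscore:

```julia
Flux._underscoredepth(:_)                             # 1
Flux._underscoredepth(:(_ => 2))                      # 1
Flux._underscoredepth(:(Dense(_ => 2)))               # 2 -> wrapped in a LazyLayer
Flux._underscoredepth(:(Chain(Dense(_ => 2), relu)))  # 3 -> recurse into the arguments
```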
+"""
+    autosizefor(::Type, x)
+
+If an `_` in your layer's constructor, used within `@autosize`, should
+*not* mean the 2nd-last dimension, then you can overload this.
+
+For instance `autosizefor(::Type{<:Dense}, x::AbstractArray) = size(x, 1)`
+is needed to make `@autosize (2,3,4) Dense(_ => 5)` return
+`Dense(2 => 5)` rather than `Dense(3 => 5)`.
+"""
+autosizefor(::Type, x::AbstractArray) = size(x, max(1, ndims(x)-1))
+autosizefor(::Type{<:Dense}, x::AbstractArray) = size(x, 1)
+autosizefor(::Type{<:LayerNorm}, x::AbstractArray) = size(x, 1)
+
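As a sketch of how a user-defined layer could opt in to the same first-dimension rule as `Dense`, here is a hypothetical `RowBias` layer (everything below is illustrative, not part of Flux):

```julia
using Flux

# Hypothetical layer whose constructor wants the length of dimension 1:
struct RowBias{V}; b::V; end
RowBias(n::Integer) = RowBias(zeros(Float32, n))
(l::RowBias)(x) = x .+ l.b
Flux.@functor RowBias

# Tell @autosize that `_` inside RowBias(...) means size(x, 1), not the channel dimension:
Flux.autosizefor(::Type{<:RowBias}, x::AbstractArray) = size(x, 1)

@autosize (7, 3, 16) Chain(RowBias(_), Dense(_ => 2))   # RowBias(_) becomes RowBias(7)
```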
+_replaceunderscore(e, s) = e === :_ ? s : e
+_replaceunderscore(ex::Expr, s) = Expr(ex.head, map(a -> _replaceunderscore(a, s), ex.args)...)
+
+mutable struct LazyLayer
+  str::String
+  make::Function
+  layer
+end
+
+@functor LazyLayer
+
+function (l::LazyLayer)(x::AbstractArray, ys::AbstractArray...)
+  l.layer === nothing || return l.layer(x, ys...)
+  made = l.make(x)  # for something like `Bilinear((_,__) => 7)`, perhaps need `make(xy...)`, later.
+  y = made(x, ys...)
+  l.layer = made  # mutate after we know that call worked
+  return y
+end
+
+function striplazy(m)
+  fs, re = functor(m)
+  re(map(striplazy, fs))
+end
+function striplazy(l::LazyLayer)
+  l.layer === nothing || return l.layer
+  error("LazyLayer should be initialised, e.g. by outputsize(model, size), before using striplazy")
+end
+
+# Could make LazyLayer usable outside of @autosize, for instance allow Chain(@lazy Dense(_ => 2))?
+# But then it will survive to produce weird structural gradients etc.
+
+function ChainRulesCore.rrule(l::LazyLayer, x)
+  l(x), _ -> error("LazyLayer should never be used within a gradient. Call striplazy(model) first to remove all.")
+end
+function ChainRulesCore.rrule(::typeof(striplazy), m)
+  striplazy(m), _ -> error("striplazy should never be used within a gradient")
+end
+
+params!(p::Params, x::LazyLayer, seen = IdSet()) = error("LazyLayer should never be used within params(m). Call striplazy(m) first.")
+function Base.show(io::IO, l::LazyLayer)
+  printstyled(io, "LazyLayer(", color=:light_black)
+  if l.layer == nothing
+    printstyled(io, l.str, color=:magenta)
+  else
+    printstyled(io, l.layer, color=:cyan)
+  end
+  printstyled(io, ")", color=:light_black)
+end
+
+_big_show(io::IO, l::LazyLayer, indent::Int=0, name=nothing) = _layer_show(io, l, indent, name)
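A short sketch of the lifecycle these guards enforce, using the internal names defined above (`Flux.LazyLayer` and `Flux.striplazy` are not exported, so they are written with the module prefix):

```julia
using Flux

lazy = Flux.LazyLayer("Dense(_ => 2)", x -> Dense(size(x, 1) => 2), nothing)
lazy(rand(Float32, 3, 5))      # first call builds Dense(3 => 2) and stores it in lazy.layer
dense = Flux.striplazy(lazy)   # unwraps to the plain Dense(3 => 2)
Flux.params(dense)             # fine; Flux.params(lazy) or a gradient through `lazy` would error
```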

0 commit comments
