Use faster activation functions #1837
Conversation
It's a pity to pollute the code a bit, but I guess the performance increase is worth it. I wonder what a more aesthetically pleasing alternative could be. Just for reference, the reason why this PR switches the activation in the forward pass instead of at construction time is here.
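For context, a minimal sketch of what "switching in the forward" looks like, assuming the `NNlib.fast_act` / `tanh_fast` API this PR builds on; `MyDense` is an illustrative stand-in, not Flux's actual layer definition:

```julia
using NNlib  # assumed to provide fast_act, tanh_fast, sigmoid_fast

# Illustrative stand-in for a dense layer; not Flux's actual definition.
struct MyDense{M,B,F}
    weight::M
    bias::B
    σ::F  # stored exactly as the user wrote it, e.g. tanh
end

function (a::MyDense)(x::AbstractVecOrMat)
    # Swap at call time: fast_act(tanh, x) returns tanh_fast, and leaves
    # unrecognised activations untouched. Because the swap happens here
    # rather than in the constructor, a.σ still prints and compares as tanh.
    σ = NNlib.fast_act(a.σ, x)
    return σ.(a.weight * x .+ a.bias)
end
```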
I wouldn't claim this as an aesthetic improvement. But it is a performance one. I'm against less explicit names.
OK. Should we wait for a final tag of v0.12 before merging (#1838)? I wouldn't say this PR is breaking, though.
Not so breaking as to demand a release, but if there's one nearby, perhaps that's the safe time. So after the last 0.12.x tag, before v0.13, as you say.
Buildkite is failing with weird errors again 😬
bors try
try: Build succeeded.
Codecov Report
@@            Coverage Diff             @@
##           master    #1837      +/-   ##
==========================================
+ Coverage   73.85%   73.94%    +0.09%
==========================================
  Files          28       28
  Lines        1683     1689        +6
==========================================
+ Hits         1243     1249        +6
  Misses        440      440
Continue to review full report at Codecov.
c′ = @. sigmoid_fast(forget) * c + sigmoid_fast(input) * tanh_fast(cell)
h′ = @. sigmoid_fast(output) * tanh_fast(c′)
Ought these to get the `fast_act` treatment too? I'm OK with revisiting later; RNN cell inflexibility is a bit of a long-standing issue. See the sketch below.
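A hypothetical sketch of what that treatment could look like for the cell above; `lstm_gates` and its arguments are illustrative names, not Flux API, and the dispatch-on-input behaviour of `fast_act` is assumed:

```julia
using NNlib: fast_act, sigmoid

# Hypothetical helper, not Flux API: pick the activations once per call,
# based on the input array type, instead of hard-coding the _fast variants.
function lstm_gates(forget, input, cell, output, c, x)
    σ = fast_act(sigmoid, x)  # sigmoid_fast on plain CPU arrays (assumed)
    t = fast_act(tanh, x)     # tanh_fast on plain CPU arrays (assumed)
    c′ = @. σ(forget) * c + σ(input) * t(cell)
    h′ = @. σ(output) * t(c′)
    return h′, c′
end
```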
Oh right. If we decide to disable `tanh_fast` on CuArrays, that will be ignored here. But perhaps revisit if & when; it's a bit more clutter to squeeze that in.
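To make the concern concrete: a hedged sketch of how such a GPU opt-out might look, assuming `fast_act` dispatches on the array argument; this is not actual NNlibCUDA code. A method like this would be bypassed by the hard-coded `tanh_fast` in the cell above:

```julia
using CUDA, NNlib

# Hypothetical opt-out, not real NNlibCUDA code: keep plain tanh on the GPU,
# so that fast_act(tanh, ::CuArray) no longer substitutes tanh_fast.
NNlib.fast_act(::typeof(tanh), ::CUDA.CuArray) = tanh
```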
It may be a blessing in disguise, as currently plain `tanh.(...)` will hit the likely obsolete https://github.com/FluxML/NNlibCUDA.jl/blob/master/src/cudnn/activations.jl.
This substitutes `tanh_fast` where possible, since `tanh` often dominates the forward pass of small networks. Builds on #1761, but in the months that sat waiting, I have mislaid the benchmarks I ran. Best case 2x better forwards, worst case no improvement, modulo noise.

Intersects with #1832, which would also do this to conv layers, but not to RNNs.

Closes #1272; this version is significantly faster, and this PR applies it to more cases.
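For anyone wanting to re-check the claim, a rough benchmark along these lines should show the gap (assumed setup; the speedup varies with element type, array size, and CPU):

```julia
using NNlib, BenchmarkTools

x = randn(Float32, 1000, 1000)

@btime tanh.($x)             # baseline
@btime NNlib.tanh_fast.($x)  # the substituted version; up to ~2x faster per the PR text
```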