
use vmap for all activations #221


Status: Open. AStupidBear wants to merge 5 commits into master.
Conversation

AStupidBear (Contributor) commented Jul 11, 2020

fix #220

using NNlib, BenchmarkTools

ACTIVATION_FUNCTIONS = [σ, hardσ, logσ, hardtanh, relu, leakyrelu, relu6, rrelu, elu, gelu, celu, swish, lisht, selu, trelu, softplus, softsign, logcosh, mish, tanhshrink, softshrink];

for T in (Float32, Float64)
    x = rand(T, 500)
    for a in ACTIVATION_FUNCTIONS
        @show a
        @btime $a.($x)      # broadcast over the array
        @btime map($a, $x)  # element-wise map, for comparison
    end
end
Float32 results (broadcast timing first, then map):

a = NNlib.σ
  753.493 ns (1 allocation: 2.13 KiB)
  4.052 μs (2 allocations: 2.14 KiB)
a = NNlib.hardσ
  644.024 ns (1 allocation: 2.13 KiB)
  1.653 μs (2 allocations: 2.14 KiB)
a = NNlib.logσ
  2.659 μs (1 allocation: 2.13 KiB)
  27.581 μs (2 allocations: 2.14 KiB)
a = NNlib.hardtanh
  542.984 ns (1 allocation: 2.13 KiB)
  2.261 μs (2 allocations: 2.14 KiB)
a = NNlib.relu
  576.228 ns (1 allocation: 2.13 KiB)
  716.754 ns (2 allocations: 2.14 KiB)
a = NNlib.leakyrelu
  531.237 ns (1 allocation: 2.13 KiB)
  1.965 μs (2 allocations: 2.14 KiB)
a = NNlib.relu6
  524.547 ns (1 allocation: 2.13 KiB)
  1.252 μs (2 allocations: 2.14 KiB)
a = NNlib.rrelu
  919.600 ns (1 allocation: 2.13 KiB)
  5.401 μs (2 allocations: 2.14 KiB)
a = NNlib.elu
  794.806 ns (1 allocation: 2.13 KiB)
  3.864 μs (2 allocations: 2.14 KiB)
a = NNlib.gelu
  1.525 μs (1 allocation: 2.13 KiB)
  11.793 μs (2 allocations: 2.14 KiB)
a = NNlib.celu
  824.069 ns (1 allocation: 2.13 KiB)
  3.813 μs (2 allocations: 2.14 KiB)
a = NNlib.swish
  799.932 ns (1 allocation: 2.13 KiB)
  4.438 μs (2 allocations: 2.14 KiB)
a = NNlib.lisht
  1.367 μs (1 allocation: 2.13 KiB)
  9.907 μs (2 allocations: 2.14 KiB)
a = NNlib.selu
  779.846 ns (1 allocation: 2.13 KiB)
  4.023 μs (2 allocations: 2.14 KiB)
a = NNlib.trelu
  524.082 ns (1 allocation: 2.13 KiB)
  654.231 ns (2 allocations: 2.14 KiB)
a = NNlib.softplus
  2.565 μs (1 allocation: 2.13 KiB)
  28.825 μs (2 allocations: 2.14 KiB)
a = NNlib.softsign
  565.306 ns (1 allocation: 2.13 KiB)
  807.525 ns (2 allocations: 2.14 KiB)
a = NNlib.logcosh
  2.840 μs (1 allocation: 2.13 KiB)
  31.271 μs (2 allocations: 2.14 KiB)
a = NNlib.mish
  4.533 μs (1 allocation: 2.13 KiB)
  46.708 μs (2 allocations: 2.14 KiB)
a = NNlib.tanhshrink
  1.415 μs (1 allocation: 2.13 KiB)
  9.858 μs (2 allocations: 2.14 KiB)
a = NNlib.softshrink
  528.125 ns (1 allocation: 2.13 KiB)
  2.321 μs (2 allocations: 2.14 KiB)

Float64 results (broadcast timing first, then map):

a = NNlib.σ
  1.169 μs (1 allocation: 4.06 KiB)
  5.805 μs (2 allocations: 4.08 KiB)
a = NNlib.hardσ
  838.711 ns (1 allocation: 4.06 KiB)
  1.612 μs (2 allocations: 4.08 KiB)
a = NNlib.logσ
  5.995 μs (1 allocation: 4.06 KiB)
  34.751 μs (2 allocations: 4.08 KiB)
a = NNlib.hardtanh
  677.167 ns (1 allocation: 4.06 KiB)
  2.299 μs (2 allocations: 4.08 KiB)
a = NNlib.relu
  745.859 ns (1 allocation: 4.06 KiB)
  722.837 ns (2 allocations: 4.08 KiB)
a = NNlib.leakyrelu
  742.216 ns (1 allocation: 4.06 KiB)
  1.854 μs (2 allocations: 4.08 KiB)
a = NNlib.relu6
  723.690 ns (1 allocation: 4.06 KiB)
  1.297 μs (2 allocations: 4.08 KiB)
a = NNlib.rrelu
  1.327 μs (1 allocation: 4.06 KiB)
  5.316 μs (2 allocations: 4.08 KiB)
a = NNlib.elu
  1.120 μs (1 allocation: 4.06 KiB)
  5.244 μs (2 allocations: 4.08 KiB)
a = NNlib.gelu
  3.206 μs (1 allocation: 4.06 KiB)
  15.002 μs (2 allocations: 4.08 KiB)
a = NNlib.celu
  1.176 μs (1 allocation: 4.06 KiB)
  5.046 μs (2 allocations: 4.08 KiB)
a = NNlib.swish
  1.181 μs (1 allocation: 4.06 KiB)
  5.662 μs (2 allocations: 4.08 KiB)
a = NNlib.lisht
  2.822 μs (1 allocation: 4.06 KiB)
  13.317 μs (2 allocations: 4.08 KiB)
a = NNlib.selu
  1.079 μs (1 allocation: 4.06 KiB)
  5.663 μs (2 allocations: 4.08 KiB)
a = NNlib.trelu
  774.219 ns (1 allocation: 4.06 KiB)
  785.364 ns (2 allocations: 4.08 KiB)
a = NNlib.softplus
  5.846 μs (1 allocation: 4.06 KiB)
  37.976 μs (2 allocations: 4.08 KiB)
a = NNlib.softsign
  718.581 ns (1 allocation: 4.06 KiB)
  993.700 ns (2 allocations: 4.08 KiB)
a = NNlib.logcosh
  6.133 μs (1 allocation: 4.06 KiB)
  45.347 μs (2 allocations: 4.08 KiB)
a = NNlib.mish
  11.400 μs (1 allocation: 4.06 KiB)
  57.121 μs (2 allocations: 4.08 KiB)
a = NNlib.tanhshrink
  2.799 μs (1 allocation: 4.06 KiB)
  13.361 μs (2 allocations: 4.08 KiB)
a = NNlib.softshrink
  729.505 ns (1 allocation: 4.06 KiB)
  2.459 μs (2 allocations: 4.08 KiB)
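
For context, the speedups in the broadcast column presumably come from routing the activations through LoopVectorization's vmap. A minimal standalone sketch of that pattern (the helper name vmap_act is hypothetical and not part of this PR):

using NNlib, LoopVectorization

# Hypothetical helper illustrating the idea: apply an activation element-wise
# through LoopVectorization's SIMD `vmap` instead of a plain broadcast/map.
vmap_act(f, x::AbstractArray{<:Real}) = vmap(f, x)

x = rand(Float32, 500)
y = vmap_act(relu, x)
@assert y ≈ relu.(x)  # matches the ordinary broadcast result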

CarloLucibello (Member) commented:
could you update the OP with the benchmark results for your system?

CarloLucibello (Member) commented:
we are missing tanh here, unless we decide to exclude it in order to avoid type piracy
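
(For context, tanh is owned by Base rather than NNlib, so covering it would presumably need a definition along the lines of the hypothetical sketch below, which is type piracy: it redefines broadcasting for a function and an array type that NNlib owns neither of.)

using LoopVectorization

# Hypothetical, and exactly the kind of definition the piracy concern is about:
# neither `tanh` nor `Array` belongs to NNlib.
Base.Broadcast.broadcasted(::typeof(tanh), x::Array{<:Real}) = vmap(tanh, x)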

@@ -118,7 +119,7 @@ elu(x::RealOrFloatType, α = one(x)) = vifelse(x ≥ 0, x / one(x), α * (exp(x)
 activation function.
 """
 function gelu(x::RealOrFloatType)
-    p = oftype(x / 1, π)
+    p = oftype(x / 1, Float64(π))
Member (review comment on the gelu change):
Why is this hardcoding the type here?

AStupidBear (Contributor, Author) replied:
Zygote will fail for oftype(..., ::Irrational)
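
(A small sketch of the two expressions for anyone following along; both produce the same value, and the difference is only that Zygote reportedly fails when differentiating through the oftype(..., ::Irrational) method:)

x = 1.0f0
oftype(x / 1, π)           # converts the Irrational π directly, giving 3.1415927f0
oftype(x / 1, Float64(π))  # same result, but π is materialized as a Float64 first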

DhairyaLGandhi (Member) commented:
Needs testing against Zygote and CUDA to make sure we don't break any dispatch that we are relying on.

AStupidBear (Contributor, Author) commented Jul 14, 2020:
Zygote is already tested against here, but testing against CUDA may be out of the scope of NNlib?

johnnychen94 commented Jul 18, 2020:
Do we need to update the lower bound on the LoopVectorization version in Project.toml? I don't know what happens under the hood, just asking to be sure.

CarloLucibello (Member) commented:
given #224 and the discussion in FluxML/Flux.jl#1272, I no longer think this is the correct way forward. The whole vectorization logic should live in Flux's layer definitions, and we should revert NNlib to its pre-LoopVectorization state.
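
(Purely as a hypothetical illustration of that alternative, not code from #224 or the Flux discussion: the vectorization choice would live in the layer's forward pass, leaving NNlib's activations as plain scalar functions.)

using LoopVectorization

# Hypothetical layer-level forward pass: the layer picks the SIMD map itself,
# so NNlib's activation definitions stay untouched.
dense_forward(W, b, σ, x) = vmap(σ, W * x .+ b)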

DhairyaLGandhi (Member) commented:
We should revert the vectorisation stuff and release a patch that drops the packages from the dependencies.

DhairyaLGandhi (Member) commented:
Can you also add the same benchmarks using a simple Dense(28*28, 50) to see the relative speedup?

@avx f.(w * x .+ b), basically.
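
(A sketch of what that benchmark might look like; relu stands in for f, and the batch size of 128 and applying @avx to the whole fused broadcast are assumptions here, not part of the request.)

using NNlib, LoopVectorization, BenchmarkTools

w, b = randn(Float32, 50, 28*28), randn(Float32, 50)
x = randn(Float32, 28*28, 128)        # hypothetical batch of 128 inputs

@btime relu.($w * $x .+ $b)           # plain broadcast over the affine output
@btime @avx relu.($w * $x .+ $b)      # same computation through LoopVectorization's @avx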
