
Adds full gelu without approximation #629

Merged · 8 commits · Feb 28, 2025

Conversation

se-schmitt
Contributor

Adds the full gelu without approximation as gelu(x) and moves the previously used tanh approximation to gelu_fast. See #628 for details.
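For reference, a minimal sketch of the two variants under discussion, using the standard definitions (the PR's actual code and constants may differ):

using SpecialFunctions: erf

# Exact ("full") gelu: x * Φ(x), with Φ the standard-normal CDF.
gelu_exact(x) = x/2 * (1 + erf(x / sqrt(oftype(float(x), 2))))

# Tanh approximation (what NNlib's gelu currently computes):
gelu_tanh_approx(x) = x/2 * (1 + tanh(sqrt(oftype(float(x), 2)/π) * (x + oftype(float(x), 0.044715) * x^3)))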

PR Checklist

  • Tests are added
  • Documentation, if applicable

@ToucheSir linked an issue on Feb 6, 2025 that may be closed by this pull request
@ToucheSir
Member

ToucheSir commented Feb 6, 2025

https://github.com/FluxML/NNlib.jl/actions/runs/13177183802/job/36779142351?pr=629#step:7:842 is a real test failure. I think Flux's Nil + outputsize machinery needs to be adjusted to understand SpecialFunctions.erf. The question is how, so I've opened FluxML/Flux.jl#2588 to track this.
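For context, a hedged sketch of the failure mode: Flux.outputsize evaluates the model on placeholder values of an internal Nil number type, so every scalar function hit in the forward pass needs a Nil method; Base math functions have one, SpecialFunctions.erf does not. The method below only illustrates the kind of fix being discussed (Flux.Nil and Flux.nil are Flux internals, not public API, and this is not the PR's solution):

using Flux, SpecialFunctions

# Illustration only: a pass-through method for Flux's placeholder type,
# so outputsize could trace through an erf-based activation.
SpecialFunctions.erf(::Flux.Nil) = Flux.nil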

@mcabbott
Member

mcabbott commented Feb 7, 2025

Quick look at how different these functions are:

julia> using SpecialFunctions, NNlib

julia> oftf(x, y) = oftype(float(x), y);

julia> new_gelu(x) = x/2*(1 + erf(x/sqrt(oftf(x,2))));

julia> rel(x) = (new_gelu(x) - gelu(x)) / new_gelu(x);

julia> rel.(-3:0.2f0:1)
21-element Vector{Float32}:
   0.101809666
   0.06506126
   0.038386323
   0.02019246
   0.008700018
   0.0021531228
  -0.0010194147
  -0.0021005166
  -0.0020575877
  -0.001547056
  -0.0009626968
  -0.00049560843
  -0.0002001288
  -5.4488235f-5
  -6.02081f-6
 NaN
   4.4374747f-6
   2.876006f-5
   7.55584f-5
   0.00013319723
   0.00018150361
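(The NaN at x = 0 is just 0/0 in the relative error; both implementations return exactly 0 there.)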

julia> rel_eps(x) = (new_gelu(x) - gelu(x)) / eps(new_gelu(x));

julia> Int.(rel_eps.(-3:0.2f0:1))
21-element Vector{Int64}:
 -885402
 -999594
 -499514
 -213282
 -142868
  -26298
    8849
   24719
   31223
   14336
   10250
    5637
    2210
     504
      68
       0
      69
     253
    1104
    1409
    2562

@se-schmitt
Contributor Author

I modified it so that gelu remains the same and added the full gelu as gelu_full, as discussed in #628. This avoids breaking changes and the test failure above; however, gelu_full is still not compatible with Flux's outputsize function.

@se-schmitt
Contributor Author

@ToucheSir Can this be merged? If not, what is required to make it mergeable?

@mcabbott
Member

It's a little sad that NNlib must depend on SpecialFunctions... but maybe it's not so expensive?

julia> @time_imports using SpecialFunctions
      8.7 ms  IrrationalConstants
               ┌ 0.0 ms DocStringExtensions.__init__() 
     46.6 ms  DocStringExtensions 97.36% compilation time
      0.6 ms  LogExpFunctions
               ┌ 2.5 ms OpenLibm_jll.__init__() 
      4.2 ms  OpenLibm_jll
      0.4 ms  JLLWrappers
               ┌ 9.2 ms CompilerSupportLibraries_jll.__init__() 
     11.1 ms  CompilerSupportLibraries_jll
               ┌ 6.0 ms OpenSpecFun_jll.__init__() 93.49% compilation time
      6.5 ms  OpenSpecFun_jll 86.17% compilation time
      3.2 ms  SpecialFunctions

Re name bike-shedding: there's some chance we should use neutral names like gelu_erf and gelu_tanh, with both names available immediately but const gelu = gelu_tanh for now, to be non-breaking. (I do not think either should be called gelu_fast, as the point of tanh_fast is that we sometimes automatically replace tanh with it, but there is no plan to automatically replace one of these with the other.)
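A minimal sketch of that arrangement (illustrative, not the PR's code):

using SpecialFunctions: erf

gelu_tanh(x) = x/2 * (1 + tanh(sqrt(oftype(float(x), 2)/π) * (x + oftype(float(x), 0.044715) * x^3)))
gelu_erf(x) = x/2 * (1 + erf(x / sqrt(oftype(float(x), 2))))

# Non-breaking: the old exported name keeps its current (tanh) behavior.
const gelu = gelu_tanh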

@se-schmitt
Contributor Author

@mcabbott I modified the code as suggested:

  • Instead of the SpecialFunctions.jl package, only OpenLibm_jll.jl is used now, and the erf function is defined via ccall (as in SpecialFunctions.jl); see the sketch after this list.
  • I also renamed the functions to gelu_tanh and gelu_erf, with const gelu = gelu_tanh, and made this explicit in the documentation.
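A hedged sketch of such a ccall-based definition, assuming libopenlibm exports the C functions erf and erff (the PR's actual code may differ):

using OpenLibm_jll: libopenlibm

# Call OpenLibm's C implementations directly, avoiding SpecialFunctions.
_erf(x::Float64) = ccall((:erf, libopenlibm), Float64, (Float64,), x)
_erf(x::Float32) = ccall((:erff, libopenlibm), Float32, (Float32,), x)
_erf(x::Real) = _erf(Float64(x))  # fall back to double precision

gelu_erf(x) = x/2 * (1 + _erf(x / sqrt(oftype(float(x), 2))))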

Member

@mcabbott left a comment

This looks basically fine to me, thanks.

Maybe avoiding SpecialFunctions is a rabbit-hole, sorry.

One question is: How well do these variants work on the GPU? Presumably ccall((:erff, libopenlibm), Float32, (Float32,), x) won't work... does SpecialFunctions have code to make erf.(cu(rand(10))) work by another path?

@ToucheSir
Member

ToucheSir commented Feb 14, 2025

does SpecialFunctions have code to make erf.(cu(rand(10))) work by another path?

Yes, CUDA.jl defines its own overloads at https://github.com/JuliaGPU/CUDA.jl/blob/master/ext/SpecialFunctionsExt.jl.

If we want to talk load times, Flux has a direct dep on SpecialFunctions already. If import latency is a pressing concern, we could define a stub function gelu_erf end in NNlib and the method for that function in a SpecialFunctionsExt.
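A rough sketch of that stub-plus-extension layout (file names and the exact definition are illustrative):

# In NNlib itself: declare the function with no methods, so the name
# exists even when SpecialFunctions is not loaded.
function gelu_erf end

# In ext/NNlibSpecialFunctionsExt.jl, enabled via [weakdeps] and
# [extensions] entries in NNlib's Project.toml:
module NNlibSpecialFunctionsExt

using NNlib, SpecialFunctions

NNlib.gelu_erf(x) = x/2 * (1 + SpecialFunctions.erf(x / sqrt(oftype(float(x), 2))))

end

This way NNlib pays no load-time cost for SpecialFunctions, but the method appears automatically for users who have it loaded.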

@se-schmitt
Contributor Author

The implementation via OpenLibm_jll was naive, sorry... I added the missing AD rules locally (the kind sketched below), which made it compatible with ForwardDiff, Zygote, and Enzyme. However, compatibility with other AD packages and with GPU packages would need further modifications.
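For illustration, the kind of rule meant here, written with ChainRulesCore against the hypothetical _erf from the earlier sketch, using d/dx erf(x) = 2/√π * exp(-x^2). Zygote picks this up via ChainRules; ForwardDiff and Enzyme need their own rules, which is the extra work mentioned above:

using ChainRulesCore

function ChainRulesCore.rrule(::typeof(_erf), x::Real)
    y = _erf(x)
    function erf_pullback(ȳ)
        # derivative of erf at x, scaled by the incoming cotangent
        return (NoTangent(), ȳ * 2 / sqrt(oftype(float(x), π)) * exp(-x^2))
    end
    return y, erf_pullback
end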

I would prefer an option that uses SpecialFunctions.jl, as this seems much cleaner to me, either as a direct dependency or as an extension (is an extension a problem for Lux.jl, where SpecialFunctions is not a direct dependency?).

What would you prefer? @mcabbott @ToucheSir

@ToucheSir
Member

Can you try the stub function + extension approach I suggested above? If that turns out to be a dead end, I'm fine with going back to the original plan and having SpecialFunctions as a direct dep. There's already a good chance that any user of NNlib has it in their environment, so we wouldn't lose much by including it.

@se-schmitt
Contributor Author

@ToucheSir The extension approach worked out well. I tested gelu_erf with Flux and Lux and it worked seamlessly.
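For completeness, a minimal usage sketch of the extension path (the layer shown is illustrative):

using NNlib            # provides the gelu_erf stub
using SpecialFunctions # loading this activates the extension

gelu_erf(1.0f0)                 # scalar call now works
# e.g. in a Flux layer: Dense(4 => 2, gelu_erf)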

Member

@ToucheSir left a comment

Looking good, just a couple of touch-ups and let's get this merged.

Member

@ToucheSir left a comment

Thanks, this was great work :)

Successfully merging this pull request may close these issues:

"Full" gelu without approximation