Layernorm #203

Merged 18 commits into modern-fortran:main on Feb 25, 2025

Conversation

@OneAdder (Collaborator) commented on Feb 17, 2025

Layer Normalization

Layer Normalization normalizes a sequence of inputs over the last dimension using the following formula (a short Fortran sketch of the computation follows the list of symbols below):

$\frac{x - \bar{x}}{\sqrt{var(x) + \epsilon}} \cdot \gamma + \beta$

Where:

  • $\bar{x}$ is the arithmetic mean over the last dimension
  • $var(x)$ is the variance over the last dimension
  • $\epsilon$ is a small constant added for numerical stability
  • $\gamma$ and $\beta$ are trainable parameters
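
Below is a minimal, standalone Fortran sketch of this forward computation. The (sequence, features) array shape and all names are illustrative assumptions, not the layer's actual implementation in this PR.

```fortran
! Sketch of the layernorm forward pass described above; normalization is over
! the last (feature) dimension of a (sequence, features) input.
pure function layernorm_forward(x, gamma, beta, eps) result(y)
  real, intent(in) :: x(:,:)            ! inputs: (sequence, features)
  real, intent(in) :: gamma(:), beta(:) ! trainable parameters, size = features
  real, intent(in) :: eps               ! small constant for numerical stability
  real :: y(size(x, 1), size(x, 2))
  real :: mu, var
  integer :: i, n

  n = size(x, 2)
  do i = 1, size(x, 1)
    mu = sum(x(i, :)) / n
    var = sum((x(i, :) - mu)**2) / n
    y(i, :) = (x(i, :) - mu) / sqrt(var + eps) * gamma + beta
  end do
end function layernorm_forward
```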

Reference

Lei Ba, J., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.

Status

Ready

@OneAdder marked this pull request as ready for review on February 23, 2025, 13:35
@OneAdder (Collaborator, Author) commented:

@milancurcic @jvdp1 Ready for review

@jvdp1 (Collaborator) left a review comment:

LGTM @OneAdder. Nice work!

@OneAdder mentioned this pull request on Feb 24, 2025
@milancurcic (Member) commented:

Great, thank you! I left several comments.

flatten()&
])

! Kaiming weights to achieve semblance of convergence
@OneAdder (Collaborator, Author) commented:

If we want a network that uses LayerNorm as an activation to converge, we need at least a correct weights initialization. Otherwise, the net will not converge consistently and we'll hit #145 once more.
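
For context, a minimal sketch of Kaiming (He) normal initialization, where weights are drawn from a normal distribution with variance 2/fan_in. The Box-Muller transform and all names here are illustrative assumptions, not the library's actual initializer.

```fortran
! Sketch of Kaiming (He) normal initialization: w ~ N(0, sqrt(2 / fan_in)).
subroutine kaiming_normal(w, fan_in)
  real, intent(out) :: w(:,:)
  integer, intent(in) :: fan_in
  real :: u1(size(w, 1), size(w, 2)), u2(size(w, 1), size(w, 2))
  real, parameter :: pi = acos(-1.0)

  call random_number(u1)
  call random_number(u2)
  u1 = max(u1, tiny(1.0))  ! avoid log(0)
  ! Box-Muller: standard normal samples, scaled by the Kaiming standard deviation
  w = sqrt(-2 * log(u1)) * cos(2 * pi * u2) * sqrt(2. / fan_in)
end subroutine kaiming_normal
```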

Comment on lines +37 to +39
| Layer type | Constructor name | Supported input layers | Rank of output array | Forward pass | Backward pass |
|------------|------------------|------------------------|----------------------|--------------|---------------|
| Linear (2-d) | `linear2d` | `input2d`, `layernorm`, `linear2d`, `self_attention` | 2 | ✅ | ✅ |
| Self-attention | `self_attention` | `input2d`, `layernorm`, `linear2d`, `self_attention` | 2 | ✅ | ✅ |
| Layer Normalization | `layernorm` | `linear2d`, `self_attention` | 2 | ✅ | ✅ |
@milancurcic (Member) commented:

@OneAdder can you please check that I did this correctly?

@OneAdder (Collaborator, Author) commented:

Yes, looks good!
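
As a usage illustration of the input-layer constraints in the table above, a network stacking these 2-d layers might look like the sketch below. The `nf` module name matches the library, but the constructor arguments are assumptions and may differ from the actual API.

```fortran
program layernorm_usage
  ! Sketch only: constructor arguments are assumed for illustration.
  use nf, only: network, input, linear2d, self_attention, layernorm, flatten, dense
  implicit none
  type(network) :: net

  net = network([ &
    input(3, 8), &         ! 2-d input (sequence x model dimension), assumed shape
    linear2d(8), &         ! accepts input2d, layernorm, linear2d, self_attention
    self_attention(4), &   ! hypothetical number of attention heads
    layernorm(), &         ! accepts linear2d and self_attention outputs
    flatten(), &
    dense(2) &
  ])
end program layernorm_usage
```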

@milancurcic (Member) left a review comment:

Thank you!!

@milancurcic merged commit e68e6c2 into modern-fortran:main on Feb 25, 2025
4 checks passed
OneAdder added a commit to OneAdder/neural-fortran that referenced this pull request Mar 3, 2025
* layernorm: initial implementation

* layernorm: rename source file

* layernorm: remove redundant arguments

* layernorm: remove stack allocated arrays

* layernorm: rearrange into submodule

* layernorm: add error to stop in test

* layernorm: add gradient updates

* layernorm: public api

* layernorm: update tests

* layernorm: update cmake

* layernorm: use mold for temp allocation

* layernorm: rename to layernorm

* layernorm: allow usage of layernorm at the end

* layernorm: integration test for layernorm

* layernorm: memory allocation optimization

* Tidy up

* Bump version

* Add layernorm to the table of layers

---------

Co-authored-by: milancurcic <caomaco@gmail.com>
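
One of the commits above switches temporary arrays to `mold=` allocation; a minimal illustration of that Fortran feature follows (variable names are hypothetical).

```fortran
! Hypothetical names, shown only to illustrate mold= allocation
real :: gradient(16, 8)
real, allocatable :: tmp(:,:)

! mold= allocates tmp with the same type, kind, and shape as gradient,
! without copying its values (unlike source=)
allocate(tmp, mold=gradient)
```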