Layernorm #203

Merged 18 commits into modern-fortran:main on Feb 25, 2025

Conversation

@OneAdder (Collaborator) commented on Feb 17, 2025

Layer Normalization

Layer Normalization normalizes a sequence of inputs over the last dimension using the following formula (a short Fortran sketch of the computation follows the list of symbols below):

$\frac{x - \bar{x}}{\sqrt{var(x) + \epsilon}} \cdot \gamma + \beta$

Where:

  • $\bar{x}$ is the arithmetic mean over the last dimension
  • $var(x)$ is the variance over the last dimension
  • $\epsilon$ is a small constant added for numerical stability
  • $\gamma$ and $\beta$ are trainable parameters
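
Below is a minimal, standalone Fortran sketch of this forward computation. The (sequence, features) array shape and all names are illustrative assumptions, not the layer's actual implementation in this PR.

```fortran
! Sketch of the layernorm forward pass described above; normalization is over
! the last (feature) dimension of a (sequence, features) input.
pure function layernorm_forward(x, gamma, beta, eps) result(y)
  real, intent(in) :: x(:,:)            ! inputs: (sequence, features)
  real, intent(in) :: gamma(:), beta(:) ! trainable parameters, size = features
  real, intent(in) :: eps               ! small constant for numerical stability
  real :: y(size(x, 1), size(x, 2))
  real :: mu, var
  integer :: i, n

  n = size(x, 2)
  do i = 1, size(x, 1)
    mu = sum(x(i, :)) / n
    var = sum((x(i, :) - mu)**2) / n
    y(i, :) = (x(i, :) - mu) / sqrt(var + eps) * gamma + beta
  end do
end function layernorm_forward
```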

Reference

Lei Ba, J., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.

Status

Ready

@OneAdder marked this pull request as ready for review on February 23, 2025, 13:35
@OneAdder (Collaborator, Author) commented:

@milancurcic @jvdp1 Ready for review

@jvdp1 (Collaborator) left a review comment:

LGTM @OneAdder. Nice work!

@OneAdder mentioned this pull request on Feb 24, 2025
@milancurcic (Member) commented:

Great, thank you! I left several comments.

flatten()&
])

! Kaiming weights to achieve semblance of convergence
@OneAdder (Collaborator, Author) commented:

If we want a network that uses LayerNorm as an activation to converge, we need at least a correct weights initialization. Otherwise, the net will not converge consistently and we'll hit #145 once more.
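
For context, a minimal sketch of Kaiming (He) normal initialization, where weights are drawn from a normal distribution with variance 2/fan_in. The Box-Muller transform and all names here are illustrative assumptions, not the library's actual initializer.

```fortran
! Sketch of Kaiming (He) normal initialization: w ~ N(0, sqrt(2 / fan_in)).
subroutine kaiming_normal(w, fan_in)
  real, intent(out) :: w(:,:)
  integer, intent(in) :: fan_in
  real :: u1(size(w, 1), size(w, 2)), u2(size(w, 1), size(w, 2))
  real, parameter :: pi = acos(-1.0)

  call random_number(u1)
  call random_number(u2)
  u1 = max(u1, tiny(1.0))  ! avoid log(0)
  ! Box-Muller: standard normal samples, scaled by the Kaiming standard deviation
  w = sqrt(-2 * log(u1)) * cos(2 * pi * u2) * sqrt(2. / fan_in)
end subroutine kaiming_normal
```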

Comment on lines +37 to +39
| Layer type | Constructor name | Supported input layers | Rank of output array | Forward pass | Backward pass |
|------------|------------------|------------------------|----------------------|--------------|---------------|
| Linear (2-d) | `linear2d` | `input2d`, `layernorm`, `linear2d`, `self_attention` | 2 | ✅ | ✅ |
| Self-attention | `self_attention` | `input2d`, `layernorm`, `linear2d`, `self_attention` | 2 | ✅ | ✅ |
| Layer Normalization | `layernorm` | `linear2d`, `self_attention` | 2 | ✅ | ✅ |
@milancurcic (Member) commented:

@OneAdder can you please check that I did this correctly?

@OneAdder (Collaborator, Author) commented:

Yes, looks good!
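
As a usage illustration of the input-layer constraints in the table above, a network stacking these 2-d layers might look like the sketch below. The `nf` module name matches the library, but the constructor arguments are assumptions and may differ from the actual API.

```fortran
program layernorm_usage
  ! Sketch only: constructor arguments are assumed for illustration.
  use nf, only: network, input, linear2d, self_attention, layernorm, flatten, dense
  implicit none
  type(network) :: net

  net = network([ &
    input(3, 8), &         ! 2-d input (sequence x model dimension), assumed shape
    linear2d(8), &         ! accepts input2d, layernorm, linear2d, self_attention
    self_attention(4), &   ! hypothetical number of attention heads
    layernorm(), &         ! accepts linear2d and self_attention outputs
    flatten(), &
    dense(2) &
  ])
end program layernorm_usage
```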

@milancurcic (Member) left a review comment:

Thank you!!

@milancurcic merged commit e68e6c2 into modern-fortran:main on Feb 25, 2025
4 checks passed
OneAdder added a commit to OneAdder/neural-fortran that referenced this pull request Mar 3, 2025
* layernorm: initial implementation

* layernorm: rename source file

* layernorm: remove redundant arguments

* layernorm: remove stack allocated arrays

* layernorm: rearrange into submodule

* layernorm: add error to stop in test

* layernorm: add gradient updates

* layernorm: public api

* layernorm: update tests

* layernorm: update cmake

* layernorm: use mold for temp allocation

* layernorm: rename to layernorm

* layernorm: allow usage of layernorm at the end

* layernorm: integration test for layernorm

* layernorm: memory allocation optimization

* Tidy up

* Bump version

* Add layernorm to the table of layers

---------

Co-authored-by: milancurcic <caomaco@gmail.com>
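
One of the commits above switches temporary arrays to `mold=` allocation; a minimal illustration of that Fortran feature follows (variable names are hypothetical).

```fortran
! Hypothetical names, shown only to illustrate mold= allocation
real :: gradient(16, 8)
real, allocatable :: tmp(:,:)

! mold= allocates tmp with the same type, kind, and shape as gradient,
! without copying its values (unlike source=)
allocate(tmp, mold=gradient)
```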