Layernorm #203
Conversation
@milancurcic @jvdp1 Ready for review
LGTM @OneAdder. Nice work!
Great, thank you! I left several comments.
  flatten() &
])

! Kaiming weights to achieve semblance of convergence
If we want a network that uses LayerNorm as an activation to converge, we need at least correct weight initialization. Otherwise, the net will not converge consistently and we'll get #145 once more.
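For reference, Kaiming (He) initialization draws weights from a zero-mean normal distribution with variance 2 / fan_in. Below is a minimal, self-contained Fortran sketch of that idea; it is illustrative only and is not the code added in this PR, and the dimensions are made up.

```fortran
program kaiming_init_demo
  implicit none
  integer, parameter :: fan_in = 64, fan_out = 32   ! hypothetical layer shape
  real :: w(fan_in, fan_out)
  real :: u1(fan_in, fan_out), u2(fan_in, fan_out)

  ! Box-Muller transform: turn two uniform samples into standard normal ones.
  call random_number(u1)
  call random_number(u2)
  w = sqrt(-2. * log(max(u1, tiny(1.)))) * cos(8. * atan(1.) * u2)

  ! Kaiming (He) scaling: variance 2 / fan_in keeps activations from
  ! vanishing or exploding through ReLU-like layers.
  w = w * sqrt(2. / real(fan_in))

  print '(a, f8.5)', 'sample std dev: ', sqrt(sum(w**2) / size(w))
end program kaiming_init_demo
```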
| Layer type | Constructor | Supported input layers | Output rank | Forward | Backward |
|------------|-------------|------------------------|-------------|---------|----------|
| Linear (2-d) | `linear2d` | `input2d`, `layernorm`, `linear2d`, `self_attention` | 2 | ✅ | ✅ |
| Self-attention | `self_attention` | `input2d`, `layernorm`, `linear2d`, `self_attention` | 2 | ✅ | ✅ |
| Layer Normalization | `layernorm` | `linear2d`, `self_attention` | 2 | ✅ | ✅ |
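As a usage illustration, the table above suggests a network assembled roughly like the sketch below. The constructor names and argument lists (`input`, `linear2d`, `layernorm`, `flatten`, `dense`) are assumptions inferred from the table and the diff hunk above, not verified signatures.

```fortran
program layernorm_network_sketch
  ! Hypothetical sketch; constructor signatures are assumed, not verified.
  use nf, only: network, input, linear2d, layernorm, flatten, dense

  type(network) :: net

  net = network([ &
    input(3, 8), & ! 2-d input: sequence length 3, 8 features (assumed signature)
    linear2d(4), & ! projects features to 4 (assumed signature)
    layernorm(), & ! normalizes over the feature dimension
    flatten(),   & ! as in the diff hunk above
    dense(2)     & ! final dense head (assumed signature)
  ])

  print '(a)', 'network with layernorm constructed'
end program layernorm_network_sketch
```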
@OneAdder can you please check that I did this correctly?
Yes, looks good!
Thank you!!
* layernorm: initial implementation
* layernorm: rename source file
* layernorm: remove redundant arguments
* layernorm: remove stack allocated arrays
* layernorm: rearrange into submodule
* layernorm: add error to stop in test
* layernorm: add gradient updates
* layernorm: public api
* layernorm: update tests
* layernorm: update cmake
* layernorm: use mold for temp allocation
* layernorm: rename to layernorm
* layernorm: allow usage of layernorm at the end
* layernorm: integration test for layernorm
* layernorm: memory allocation optimization
* Tidy up
* Bump version
* Add layernorm to the table of layers

---------

Co-authored-by: milancurcic <caomaco@gmail.com>
Layer Normalization

Layer Normalization applies normalization over a sequence of inputs using the following formula:

$$
y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta
$$

Where:

* $\mathrm{E}[x]$ and $\mathrm{Var}[x]$ are the mean and variance computed over the normalized dimension,
* $\gamma$ and $\beta$ are learnable scale and shift parameters,
* $\epsilon$ is a small constant added for numerical stability.
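To make the formula concrete, here is a minimal, dependency-free Fortran sketch of the forward pass for a 2-d input, normalizing over the feature dimension. It illustrates the math above and is not the library's implementation; all names and sizes are illustrative.

```fortran
program layernorm_forward_sketch
  implicit none
  integer, parameter :: seq_len = 3, n_features = 4
  real, parameter :: eps = 1e-5
  real :: x(seq_len, n_features), y(seq_len, n_features)
  real :: gamma(n_features), beta(n_features)
  real :: mu, var
  integer :: i

  call random_number(x)
  gamma = 1.   ! learnable scale, initialized to ones
  beta = 0.    ! learnable shift, initialized to zeros

  do i = 1, seq_len
    ! Mean and variance over the feature dimension at each sequence position
    mu = sum(x(i, :)) / n_features
    var = sum((x(i, :) - mu)**2) / n_features
    ! Normalize, then apply the learnable affine transform
    y(i, :) = (x(i, :) - mu) / sqrt(var + eps) * gamma + beta
  end do

  print '(a, *(f8.4))', 'normalized row 1: ', y(1, :)
end program layernorm_forward_sketch
```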
Reference
Lei Ba, J., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. arXiv preprint arXiv:1607.06450.
Status
Ready