Fc2d layer #208
Conversation
@OneAdder Please forgive my ignorance here. Could you please clarify the distinction between the […]
@jvdp1 The terms are not particularly well defined here in practice. This is also sometimes called dense. The mathematical distinction is that […]
Thanks @OneAdder for starting this. From your explanation I understand what this does. Rather than introducing a composition of multiple operations as a single layer, I suggest that we build a basic building block first, and then, if needed, we can add a "shallow-wrapper" layer around those elementary layers. Specifically, rather than introducing here a new layer that does "first linear transformation => activation => second linear transformation", I suggest we simply introduce a […]. Then, the operation proposed here would be: […] And thanks for pointing out the incorrect softmax derivative. I don't even recall how and why I did that.
@milancurcic It makes sense. I can do it. Should we merge this and then refactor it or the other way around? |
Thanks, @OneAdder. If you agree, I suggest that here we simply provide a 2-d version of an existing […]. Good ideas for […]
Actually, since we already have […]
Fully-Connected Layer for 2D Shapes
Also known as MLP, FeedForward, etc. It is a common component of neural networks, including transformers. The idea is very simple: first linear transformation => activation => second linear transformation.
This is the last piece of the transformer architecture.
Once #203, #205, and this one are merged, we can start adding transformer encoders and decoders.
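For illustration, here is a minimal standalone NumPy sketch of that forward pass (this is not the neural-fortran API; the GELU activation, shapes, and parameter names are assumptions for the example):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU; any supported activation could be used here
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def fc2d_forward(x, w1, b1, w2, b2):
    # x:  (sequence_length, model_dim)
    # w1: (model_dim, hidden_dim), b1: (hidden_dim,)
    # w2: (hidden_dim, model_dim), b2: (model_dim,)
    h = x @ w1 + b1       # first linear transformation
    h = gelu(h)           # activation
    return h @ w2 + b2    # second linear transformation

# Example usage with arbitrary shapes
x = np.random.randn(10, 16)
w1, b1 = np.random.randn(16, 64), np.zeros(64)
w2, b2 = np.random.randn(64, 16), np.zeros(16)
y = fc2d_forward(x, w1, b1, w2, b2)  # y.shape == (10, 16)
```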
Python reference: https://github.com/OneAdder/neural-fortran-references/blob/main/fc2d_layer.py
Problem
The softmax derivative here is incorrect. What is implemented is actually the derivative of the logistic function, which is not equivalent to the derivative of softmax.
The derivative of softmax w.r.t. each element of the input requires computing the Jacobian matrix:

$$\frac{\partial\,\mathrm{softmax}(x)_i}{\partial x_j} = s_i\,(\delta_{ij} - s_j)$$

where $s = \mathrm{softmax}(x)$ and $\delta_{ij}$ is the Kronecker delta ($1$ if $i = j$, $0$ otherwise).
This is similar to my implementation for MultiHead Attention here.
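For reference, a standalone NumPy sketch contrasting the correct Jacobian-based backward pass with the elementwise logistic-style rule (the names and shapes here are illustrative, not taken from the codebase):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_jacobian(x):
    # J[i, j] = s_i * (delta_ij - s_j)
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

def softmax_backward(x, upstream):
    # Correct gradient w.r.t. the input: Jacobian-vector product
    return softmax_jacobian(x) @ upstream

def elementwise_backward(x, upstream):
    # Incorrect for softmax: the logistic-style s * (1 - s) rule
    s = softmax(x)
    return s * (1.0 - s) * upstream

x = np.random.randn(5)
g = np.random.randn(5)
# The two results generally differ, which is the bug described above
print(np.allclose(softmax_backward(x, g), elementwise_backward(x, g)))  # usually False
```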
Possible Solutions
It is not easy to resolve, as `activation_function` doesn't accept the input, so:

- `softmax` is passed as activation
- `activation_layer` that extends `base_layer` and accepts an activation function
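To illustrate the second option, here is a minimal standalone sketch (NumPy, not Fortran; the class and function names are hypothetical) of an activation layer that stores its input so that activations such as softmax can apply the full Jacobian in the backward pass:

```python
import numpy as np

class ActivationLayer:
    # Hypothetical standalone layer wrapping an activation function.
    # Because it keeps the input around, it can support activations whose
    # derivative needs the whole input vector (softmax), not only
    # elementwise ones.

    def __init__(self, forward_fn, backward_fn):
        self.forward_fn = forward_fn
        self.backward_fn = backward_fn
        self.input = None

    def forward(self, x):
        self.input = x
        return self.forward_fn(x)

    def backward(self, upstream):
        return self.backward_fn(self.input, upstream)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_backward(x, upstream):
    s = softmax(x)
    return (np.diag(s) - np.outer(s, s)) @ upstream  # full Jacobian-vector product

layer = ActivationLayer(softmax, softmax_backward)
y = layer.forward(np.random.randn(4))
grad = layer.backward(np.ones(4))
```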