I think the NFNet implementation does not update beta. In the paper (https://arxiv.org/pdf/2101.08692.pdf), the authors update beta after each residual block.
In the implementation, it seems to me that beta stays equal to its initial value everywhere. Is this correct?
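For context, here is a minimal sketch of the update in question, as I understand it from the paper: each block computes `h_next = h + alpha * f(h / beta)`, beta is the square root of the expected variance of the block input, and that expected variance grows by `alpha**2` after every residual block. The function name `expected_betas`, the starting variance of 1.0, and the example value `alpha=0.2` are assumptions for illustration, not code from this repository. The variance reset that the paper applies at transition blocks is left out to keep the sketch short.

```python
import math


def expected_betas(num_blocks, alpha=0.2):
    """Return the per-block beta values for a stack of residual blocks.

    Hypothetical helper: assumes the input to the first block has unit
    variance and ignores the reset at transition blocks.
    """
    betas = []
    expected_var = 1.0  # assumed variance of the input to the first block
    for _ in range(num_blocks):
        # beta for this block is the expected std of its input
        betas.append(math.sqrt(expected_var))
        # the residual branch adds alpha**2 to the expected variance
        expected_var += alpha ** 2
    return betas


betas = expected_betas(4, alpha=0.2)
```

With this update, beta increases from block to block instead of staying at its initial value, which is the behaviour the question is about.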
Answered by vballoli, Apr 21, 2021
Yeah, it'll be fixed soon, along with the training scripts. Thanks a lot for noticing this and bringing it up. Appreciate the effort!
Answer selected by simomagi