add Kaiming He initialization, fixed Xavier initialization #311

Open · wants to merge 4 commits into master

Changes from 1 commit
src/distributions.jl (1 addition, 1 deletion)

```diff
@@ -37,7 +37,7 @@ function xavier(a...)
         fanout = size(w, ndims(w))
         fanin = div(length(w), fanout)
     end
-    s = convert(eltype(w), sqrt(2 / (fanin + fanout)))
+    s = convert(eltype(w), sqrt(6 / (fanin + fanout)))
```
Collaborator:

I think our version is specialized for conv layers with ReLU activation. The part you changed is called the gain. You may want to update your PR so that the xavier function accepts the gain as a parameter; its default value can be 6.

Author:

To be honest, I barely know the theoretical background. I guess you are referring to the "Delving Deep into Rectifiers" paper when you say it is specialized for conv layers with ReLU activation. The paper says that n_l * Var(w_l) = 2 should hold, where n_l is the average number of units per layer. You can check that for x = xavier(200, 300):

(200 + 300) / 2 * var(x) ≈ 0.33

whereas this value should be 1.0 for Xavier and 2.0 for ReLU activations. I also compared xavier with TensorFlow's equivalent initializers: the variance of TF's xavier is consistently ~3x ours, and TF's kaiming (the ReLU-specialized xavier) is consistently ~6x ours.
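For readers without Knet at hand, a minimal standalone sketch that reproduces the ~0.33 figure, assuming only the uniform scheme visible in the diff above (s = sqrt(2 / (fanin + fanout)), entries uniform on [-s, s]); the variable names here are illustrative:

```julia
using Statistics

# Mimic the current xavier for a 200x300 matrix, as in distributions.jl:
# s = sqrt(2 / (fanin + fanout)), entries uniform on [-s, s].
fanin, fanout = 200, 300
s = sqrt(2 / (fanin + fanout))
x = 2s .* rand(fanin, fanout) .- s

# Var(Uniform[-s, s]) = s^2 / 3, so this evaluates to
# 250 * (2 / 500) / 3 = 1/3 ≈ 0.33, rather than the 1.0 the
# Glorot condition asks for (or 2.0 for ReLU).
println((fanin + fanout) / 2 * var(vec(x)))
```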

As for your suggestion: I am very new to Julia and could not find a way to change the arguments while staying compatible with pre-existing models. However, there could be a separate distribution that takes both gain and n as arguments (as in TF).

Collaborator:

You can use keyword arguments for options.

xavier(a...; gain = 6)
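Spelling that suggestion out, here is a minimal sketch of xavier with a gain keyword. The loop body follows the context lines in this diff; the initial rand call and the 1-D branch are reconstructed from Knet master rather than visible in the hunk, and the default gain of 6 is taken from the comment above, not from merged code:

```julia
# Hedged sketch of the suggested signature; not the PR's actual code.
function xavier(a...; gain = 6)
    w = rand(a...)
    if ndims(w) == 1
        fanout = 1
        fanin = length(w)
    else
        fanout = size(w, ndims(w))       # context lines from the diff
        fanin = div(length(w), fanout)
    end
    s = convert(eltype(w), sqrt(gain / (fanin + fanout)))
    return 2s .* w .- s                  # uniform on [-s, s]
end
```

Since gain is a keyword argument with a default, existing calls like xavier(200, 300) keep working unchanged, which addresses the backward-compatibility concern above.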

```diff
     w = 2s*w-s
 end
```
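The hunk in this commit covers only the Xavier scaling fix; the Kaiming He initializer named in the PR title is not shown here. For reference, a hypothetical sketch in the same style. The name kaiming, the signature, and the ReLU gain of 2 are assumptions following "Delving Deep into Rectifiers", not the PR's code (the 1-D bias case is omitted for brevity):

```julia
# Hypothetical Kaiming (He) uniform initializer; gain = 2 targets ReLU.
function kaiming(a...; gain = 2)
    w = rand(a...)
    fanout = size(w, ndims(w))
    fanin = div(length(w), fanout)
    # He et al. condition: fanin * Var(w) = gain.
    # For Uniform[-s, s], Var = s^2 / 3, hence s = sqrt(3 * gain / fanin).
    s = convert(eltype(w), sqrt(3 * gain / fanin))
    return 2s .* w .- s
end
```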
