[feature request] L1, L2 regularization of weights #160

Closed
SimonEnsemble opened this issue Jan 30, 2018 · 7 comments

Comments

@SimonEnsemble

The ability to L1- or L2-regularize the weights of a neural network would be very helpful for my research.

L2 regularization is easy, but L1 regularization, I think, requires the soft-thresholding operator. I started a discussion here suggesting that simply adding the absolute values of the coefficients to the loss is not appropriate, but I'm not 100% sure.

[Hopefully it's okay for users to request features; I am too eager to start using Flux.jl so I can do all of my research in Julia!]

@MikeInnes
Member

Both of these are pretty easy to do:

```julia
using Flux

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

l1(x) = sum(x.^2)
l2(x) = sum(abs.(x))

sum(l1, params(m)) # Add this to your loss
```

We should probably add a note to the docs so it's more obvious that we have it, though :)
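
For concreteness, a minimal sketch of wiring such a penalty into a training loss; the cross-entropy loss, the data `x`, `y`, and the penalty weight `1f-3` are placeholders here, and the exact loss-function names vary across Flux versions:

```julia
using Flux

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

penalty() = sum(p -> sum(abs.(p)), params(m))               # L1 (sum of absolute values) over all weights
loss(x, y) = Flux.crossentropy(m(x), y) + 1f-3 * penalty()  # data loss + weighted penalty
```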

@SimonEnsemble
Author

Oh, awesome! [l1, l2 are switched though, I think]. That would be great to add to the docs as an example. Also, if possible, an example of regularizing the outputs of the neurons.

But as my Stack Overflow question suggests, I think L1 regularization needs the soft-thresholding operator to bring coefficients to exactly zero once they are small enough.
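
For reference, the soft-thresholding (proximal) operator being referred to is usually written as

S_λ(w) = sign(w) · max(|w| − λ, 0),

i.e. each weight is shrunk toward zero by λ, and any weight whose magnitude is below λ is set to exactly zero.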

@MikeInnes
Member

Just skimming over this right now, but it looks like soft-thresholding is a scalar function that you'd broadcast? In which case it should be trivial to define, and broadcasting it will just work with our AD. Happy to help if you can't get that working.
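
A minimal sketch of that scalar definition and its broadcast; the name `softthresh` and the threshold value are placeholders:

```julia
# Scalar soft-thresholding: shrink w toward zero by λ and clamp anything smaller than λ to exactly zero.
softthresh(w, λ) = sign(w) * max(abs(w) - λ, zero(w))

W = randn(Float32, 5, 10)
softthresh.(W, 0.1f0)   # broadcasting applies it elementwise to an array of weights
```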

@iblislin
Contributor

By the way, would it be possible to integrate Distances.jl and gain lots of functions like L2, L1, and KL divergence?
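
For illustration only, a sketch of what reusing Distances.jl for penalties might look like; `sqeuclidean` and `cityblock` are the Distances.jl names for squared-Euclidean and L1 distances, and whether these definitions play well with Flux's AD is a separate question:

```julia
using Distances, Flux

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

# Distance from the flattened weights to the zero vector recovers the usual penalties.
l2_penalty(p) = sqeuclidean(vec(p), zero(vec(p)))   # sum of squares
l1_penalty(p) = cityblock(vec(p), zero(vec(p)))     # sum of absolute values

sum(l1_penalty, params(m))
```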

@MikeInnes
Member

Possibly, although it looks like an unnecessarily heavy API around call and broadcast from where I'm standing. I'd be happy to just add the definitions to Flux though.

@MikeInnes
Member

I've addressed this for now by documenting how to do it in the manual. I'm happy to help with the soft thresholding thing as well, if I can get some more detail on it.

@rakeshvar

@SimonEnsemble
Yes, neural networks do a bad job with L1 regularization when it is just added to the loss.
You could implement soft-thresholding via a callback: after each update step, soft-threshold the weights yourself.
Do you know any other way?
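
A sketch of that callback approach, assuming the `softthresh` definition above, the model/loss/data/optimiser from the earlier sketch, and the older `Flux.train!(loss, ps, data, opt; cb)` signature (the training API differs across Flux versions):

```julia
softthresh(w, λ) = sign(w) * max(abs(w) - λ, zero(w))

λ  = 1f-3
ps = params(m)

# Proximal step run by the callback: soft-threshold every weight in place,
# which is what drives small weights to exactly zero.
prox!() = foreach(p -> p .= softthresh.(p, λ), ps)

Flux.train!(loss, ps, data, opt, cb = prox!)
```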
