[feature request] L1, L2 regularization of weights #160

Closed
SimonEnsemble opened this issue Jan 30, 2018 · 7 comments

Comments

@SimonEnsemble

The ability to L1- or L2-regularize the weights of a neural network would be very helpful for my research.

L2 regularization is easy, but L1 regularization, I think, requires the soft-thresholding operator. I started a discussion here suggesting that simply adding the absolute values of the coefficients to the loss is not appropriate, but I'm not 100% sure.

[Hopefully it's okay for users to request features; I am too eager to start using Flux.jl so I can do all of my research in Julia!]

@MikeInnes
Member

Both of these are pretty easy to do:

```julia
using Flux

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

l1(x) = sum(x.^2)
l2(x) = sum(abs.(x))

sum(l1, params(m)) # Add this to your loss
```

We should probably add a note to the docs so it's more obvious that we have it, though :)
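
For concreteness, a minimal sketch of wiring such a penalty into a training loss; the cross-entropy loss, the data `x`, `y`, and the penalty weight `1f-3` are placeholders here, and the exact loss-function names vary across Flux versions:

```julia
using Flux

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

penalty() = sum(p -> sum(abs.(p)), params(m))               # L1 (sum of absolute values) over all weights
loss(x, y) = Flux.crossentropy(m(x), y) + 1f-3 * penalty()  # data loss + weighted penalty
```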

@SimonEnsemble
Author

Oh, awesome! [l1, l2 are switched though, I think]. That would be great to add to the docs as an example. Also, if possible, an example of regularizing the outputs of the neurons.

But as my Stack Overflow question suggests, I think L1 regularization needs the soft-thresholding operator to bring coefficients to exactly zero once they are small enough.
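
For reference, the soft-thresholding (proximal) operator being referred to is usually written as

S_λ(w) = sign(w) · max(|w| − λ, 0),

i.e. each weight is shrunk toward zero by λ, and any weight whose magnitude is below λ is set to exactly zero.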

@MikeInnes
Member

Just skimming over this right now, but it looks like soft-thresholding is a scalar function that you'd broadcast? In which case it should be trivial to define, and broadcasting it will just work with our AD. Happy to help if you can't get that working.
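
A minimal sketch of that scalar definition and its broadcast; the name `softthresh` and the threshold value are placeholders:

```julia
# Scalar soft-thresholding: shrink w toward zero by λ and clamp anything smaller than λ to exactly zero.
softthresh(w, λ) = sign(w) * max(abs(w) - λ, zero(w))

W = randn(Float32, 5, 10)
softthresh.(W, 0.1f0)   # broadcasting applies it elementwise to an array of weights
```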

@iblislin
Contributor

By the way, would it be possible to integrate Distances.jl and gain lots of functions like L2, L1, and KL divergence?
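
For illustration only, a sketch of what reusing Distances.jl for penalties might look like; `sqeuclidean` and `cityblock` are the Distances.jl names for squared-Euclidean and L1 distances, and whether these definitions play well with Flux's AD is a separate question:

```julia
using Distances, Flux

m = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

# Distance from the flattened weights to the zero vector recovers the usual penalties.
l2_penalty(p) = sqeuclidean(vec(p), zero(vec(p)))   # sum of squares
l1_penalty(p) = cityblock(vec(p), zero(vec(p)))     # sum of absolute values

sum(l1_penalty, params(m))
```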

@MikeInnes
Member

Possibly, although it looks like an unnecessarily heavy API around call and broadcast from where I'm standing. I'd be happy to just add the definitions to Flux though.

@MikeInnes
Member

I've addressed this for now by documenting how to do it in the manual. I'm happy to help with the soft thresholding thing as well, if I can get some more detail on it.

@rakeshvar

@SimonEnsemble
Yes, neural networks do a bad job with L1 regularization when it is just added to the loss.
You could implement soft-thresholding via a callback: after each update step, soft-threshold the weights yourself.
Do you know any other way?
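
A sketch of that callback approach, assuming the `softthresh` definition above, the model/loss/data/optimiser from the earlier sketch, and the older `Flux.train!(loss, ps, data, opt; cb)` signature (the training API differs across Flux versions):

```julia
softthresh(w, λ) = sign(w) * max(abs(w) - λ, zero(w))

λ  = 1f-3
ps = params(m)

# Proximal step run by the callback: soft-threshold every weight in place,
# which is what drives small weights to exactly zero.
prox!() = foreach(p -> p .= softthresh.(p, λ), ps)

Flux.train!(loss, ps, data, opt, cb = prox!)
```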
