
Tie Estimation Error to Variance #30

Open
davidrosenberg opened this issue Jan 15, 2017 · 6 comments

@davidrosenberg
Owner

Given a sample, we get an estimator in the hypothesis space. The performance gap between the estimator and the best function in the space is the estimation error. The estimator is a random function, so if we repeat the procedure on a new training set, we end up with a new estimator. We could show a different point for each new batch of data, clustering around the optimum. If we take a larger training set, the variance of those points should decrease. I don't know of a precise measure of this "variance". But if I draw it this way, I need to point out that this is just a cartoon, in which points in the space correspond to prediction functions, and closer points correspond to prediction functions with more similar predictions (say in L2 norm for score functions, or probability of disagreement for classifiers).
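One minimal sketch of a proxy for this "variance" (everything here is my own illustration, not course code; the regression-tree model, the sine generating process, and the grid-based L2 approximation are all placeholder choices): fit the same estimator on many independent training sets, evaluate each fitted prediction function on a common grid, and report the average squared L2 distance from the mean prediction function.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def sample_training_set(n):
    # Hypothetical data-generating process, for illustration only.
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.3, size=n)
    return x, y

def prediction_spread(n, n_replications=200):
    """Average squared L2 distance (approximated on a grid) between each
    fitted prediction function and the mean prediction function."""
    grid = np.linspace(-1, 1, 200).reshape(-1, 1)
    preds = []
    for _ in range(n_replications):
        x, y = sample_training_set(n)
        f_hat = DecisionTreeRegressor(max_depth=3).fit(x, y)
        preds.append(f_hat.predict(grid))
    preds = np.stack(preds)               # shape: (replications, grid points)
    mean_pred = preds.mean(axis=0)
    return np.mean(((preds - mean_pred) ** 2).mean(axis=1))

for n in (50, 200, 800):
    print(n, prediction_spread(n))        # spread should shrink as n grows
```

This is only one possible stand-in for the cartoon's "variance": the grid-based distance approximates the L2 distance between prediction functions mentioned above.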

Probably of relevance here is Pedro Domingos's paper on generalizing bias-variance decompositions beyond the square loss: http://homes.cs.washington.edu/~pedrod/bvd.pdf

@brett1479
Collaborator

I thought about this back when I watched the videos. For parametric estimators, you can talk about your uncertainty in the parameter values (I made a concept-check question about the covariance matrix of \hat{w} for least squares linear regression and for ridge regression). In general, I think L2 methods are the way to go, but I don't have a reference.
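For reference (these are the standard results under the fixed-design model y = Xw + \epsilon with \epsilon \sim N(0, \sigma^2 I); they are not quoted from the thread or the concept-check question):

```latex
\mathrm{Cov}(\hat{w}_{\mathrm{OLS}}) = \sigma^2 (X^\top X)^{-1},
\qquad
\mathrm{Cov}(\hat{w}_{\mathrm{ridge}}) = \sigma^2 (X^\top X + \lambda I)^{-1} X^\top X \,(X^\top X + \lambda I)^{-1}.
```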

@vakobzar
Contributor

Hi David,

What do you think about the following visualizations for the excess risk decomposition?

  1. Decision tree -- expand on the classification problem from the slides:
    (a) 2D plots similar to pp. 30-31 of your Excess Risk Decomposition slides, for a few different sample sizes: we can plot the depth of the tree on the x-axis and the error on the y-axis, decomposed into estimation, approximation and optimization errors as colored bars.
    (b) Also 3D plots showing the depth on the x-axis, the sample size on the y-axis and the error on the z-axis, decomposed into 3 surfaces representing estimation, approximation and optimization errors.

  2. Linear model -- fit y(x) = a + b x_1 + c x_2 where x = (x_1, x_2), sampling data from the distribution y = w_0 + w_1 x + \epsilon, where \epsilon \sim N(0, 2^2). We plot the clustering of w = (a, b, c), representing the estimation error, for different sample sizes (see the sketch after this comment).

  3. Ridge regression -- we can plot the error vs. complexity. Do you have a particular distribution to sample from in mind?

Thank you.

Best,
Vlad
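A rough sketch of what item 2 above could look like (interpreting the generating model as depending only on x_1, i.e. y = w_0 + w_1 x_1 + \epsilon; the coefficient values, sample sizes and replication count are placeholders of mine, not the actual course code): draw many training sets, fit the three-coefficient linear model by least squares, and collect the fitted coefficient vectors for scatter-plotting.

```python
import numpy as np

rng = np.random.default_rng(1)
w0_true, w1_true, sigma = 1.0, 2.0, 2.0   # placeholder values; noise sd matches N(0, 2^2)

def fitted_coefficients(n, n_replications=500):
    """Least-squares fits of y ~ a + b*x1 + c*x2 on data generated from
    y = w0 + w1*x1 + eps; returns an array of (a, b, c) across replications."""
    coefs = np.empty((n_replications, 3))
    for i in range(n_replications):
        x1 = rng.uniform(-1, 1, size=n)
        x2 = rng.uniform(-1, 1, size=n)
        y = w0_true + w1_true * x1 + rng.normal(scale=sigma, size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        coefs[i], *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

for n in (20, 100, 500):
    print(n, fitted_coefficients(n).std(axis=0))   # per-coefficient spread shrinks roughly like 1/sqrt(n)
```

The arrays returned by fitted_coefficients could feed directly into the (a, b, c) scatter plots described in item 2.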

@vakobzar
Contributor

Good evening David,

  1. I posted a 2D animation for GD with a fixed step size at
    https://github.com/davidrosenberg/mlcourse-homework/blob/master/in-prep/recitations/gd_fixed_step_2d.ipynb
    Please let me know if this is what you had in mind. Tomorrow (Friday) I will overlay the other gradient descent methods we discussed.

  2. For the demo of the distribution of minibatch SGD directions, are you OK if we use a ridge regression model and sample from a linear model with additive Gaussian noise? Also did you have any particular step size in mind, e.g., 1/n?

Thank you very much.

Best,
Vlad

@davidrosenberg
Owner Author

Hi Vlad -- the 2D animation looks good. For minibatch SGD, what about just linear regression (no ridge penalty)? A linear model with additive Gaussian noise sounds fine. Let's start with a fixed step size (i.e., a fixed multiplier of the minibatch gradient).
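(For concreteness, the "fixed multiplier of the minibatch gradient" refers to the usual update below; the notation is mine, not from the thread, with B_t the minibatch at step t and \ell_i the per-example loss:)

```latex
w_{t+1} = w_t - \eta \,\frac{1}{|B_t|} \sum_{i \in B_t} \nabla_w \ell_i(w_t)
```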

@vakobzar
Contributor

David, thank you! I think we need the step size eta_t to converge to zero for the minibatch method to converge. When I run the minibatch code with a fixed step size, it doesn't converge. It does converge when I try 1/t -- are you OK with this, or am I perhaps misunderstanding something?
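A small sketch of the behaviour being described (the model, dimensions, step sizes and batch size below are illustrative choices of mine, not the notebook code): with a fixed step size the minibatch iterates settle into a noise ball around the empirical risk minimizer, while eta_t = eta_0 / t typically drives them much closer to it.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 2
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0])
y = X @ w_true + rng.normal(scale=2.0, size=n)
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)   # empirical risk minimizer

def minibatch_sgd(step_rule, batch_size=10, n_steps=5000, eta0=0.2):
    """Minibatch SGD on squared loss; returns final distance to w_star."""
    w = np.zeros(d)
    for t in range(1, n_steps + 1):
        idx = rng.integers(0, n, size=batch_size)
        grad = 2 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        eta = eta0 if step_rule == "fixed" else eta0 / t
        w = w - eta * grad
    return np.linalg.norm(w - w_star)

print("fixed step:", minibatch_sgd("fixed"))   # stalls in a noise ball around w_star
print("1/t step:  ", minibatch_sgd("decay"))   # typically ends noticeably closer to w_star
```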

@davidrosenberg
Copy link
Owner Author

davidrosenberg commented Jan 23, 2017 via email

@davidrosenberg davidrosenberg removed this from the Excess risk decomposition milestone Jan 28, 2017