Every single thing a machine learning algorithm does is map one set of numbers to another set of numbers.
Given some input vector $\large{\color{Purple} \vec{x}}$:

- You take $\large{\color{Purple} \vec{x}}$, multiply it by $\large{\color{Purple} w}$, and run it through a summation $\large{\color{Purple} \sum}$ to get $\large{\color{Purple} \hat{y}}$; this is linear regression (see the first sketch after this list).
- You take $\large{\color{Purple} \vec{x}}$, again with the same parameters $\large{\color{Purple} w}$, run it through a summation, and we add one small change: a non-linear function. This is called a non-linear activation function, and it gives our $\large{\color{Purple} \hat{y}}$; for certain choices of activation function this is called logistic regression (see the second sketch after this list).
- An activation function adds nonlinearity on top of your linear combination. We will typically denote the non-linear activation function by $\large{\color{Purple} g}$, so $\large{\color{Purple} g()}$ stands for some non-linear function.
- More than one layer is called a deep network: you take $\large{\color{Purple} \vec{x}}$, run it through a linear combination with some weights, let us call them $\large{\color{Purple} w_1}$, then through a non-linear function $\large{\color{Purple} g()}$, then through another linear combination with some other weights $\large{\color{Purple} w_2}$, another non-linear function $\large{\color{Purple} g()}$, and so on and so forth, until you finally get your output prediction $\large{\color{Purple} \hat{y}}$ (see the third sketch after this list).
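A minimal NumPy sketch of the linear regression forward pass. The numeric values of $\large{\color{Purple} \vec{x}}$ and $\large{\color{Purple} w}$ are made up purely for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # input vector x (illustrative values)
w = np.array([0.5, -0.2, 0.1])  # parameters w (illustrative values)

# Linear regression: multiply by w and run through a summation
y_hat = np.dot(w, x)
print(y_hat)  # 0.4
```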
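The same linear combination with the one small change, a non-linear activation $\large{\color{Purple} g()}$. Sigmoid is used here as one common choice; that is an assumption, since the lecture has not yet fixed $\large{\color{Purple} g}$:

```python
import numpy as np

def g(z):
    # Sigmoid: one common choice of non-linear activation
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.2, 0.1])

# Logistic regression: the same summation, then the non-linearity g()
y_hat = g(np.dot(w, x))
print(y_hat)  # ~0.5987
```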
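Stacking such layers gives a deep network. A sketch with two hidden layers follows; the layer sizes and the random weights $\large{\color{Purple} w_1}$, $\large{\color{Purple} w_2}$, $\large{\color{Purple} w_3}$ are arbitrary illustrative choices:

```python
import numpy as np

def g(z):
    # Non-linear activation (sigmoid here, purely as an illustration)
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0])

w1 = rng.normal(size=(4, 3))  # layer 1 weights: 3 inputs -> 4 hidden units
w2 = rng.normal(size=(4, 4))  # layer 2 weights: 4 -> 4
w3 = rng.normal(size=(1, 4))  # output layer weights: 4 -> 1

h1 = g(w1 @ x)    # linear combination with w1, then g()
h2 = g(w2 @ h1)   # another linear combination with w2, then g()
y_hat = w3 @ h2   # final linear combination gives the prediction
print(y_hat)
```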
So these are the problems we need to address:

- How do we characterize the outputs $\large{\color{Purple} y}$ and $\large{\color{Purple} \hat{y}}$, i.e., what is the feed-forward model?
- $\large{\color{Purple} \textit{Which non-linear function do we use as } \textbf{g()}?}$
- The third thing: what is the loss function $\large{\color{Purple} J}$?
- How do we calculate $\large{\color{Purple} \frac{\partial J}{\partial w}}$? In other words, this is the gradient problem.
- There is a fifth problem, which we will not be discussing very much: how do we use $\large{\color{Purple} \frac{\partial J}{\partial w}}$ to find a better $\large{\color{Purple} w}$. This is the optimization problem (see the sketch after this list).
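To make the last three problems concrete, here is a minimal sketch for plain linear regression with a squared-error loss $\large{\color{Purple} J = \frac{1}{2}(\hat{y}-y)^2}$. The loss choice, target value, and learning rate are assumptions for illustration, not something the lecture has fixed; for this model the gradient has the closed form $\large{\color{Purple} \frac{\partial J}{\partial w} = (\hat{y}-y)\,\vec{x}}$, and one gradient-descent step uses it to find a better $\large{\color{Purple} w}$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.2, 0.1])
y = 1.0    # true target (made-up value)
lr = 0.1   # learning rate for the optimization step

y_hat = np.dot(w, x)          # feed-forward model
J = 0.5 * (y_hat - y) ** 2    # loss function J (squared error, one choice)
dJ_dw = (y_hat - y) * x       # gradient dJ/dw for this model and loss
w = w - lr * dJ_dw            # optimization: use the gradient to improve w

print(J)  # 0.18
print(w)  # [ 0.56 -0.08  0.28]
```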