
Would you mind explaining an issue about gradient descent in lecture 1b #10

Open
theanhle opened this issue Mar 2, 2017 · 1 comment



theanhle commented Mar 2, 2017

  • I've read your slides for lecture 1b (Deep Neural Networks are our Friends). On the slide "Gradients are our friends", which explains arg min C(w, b), you have w0, b0 = 2, 2 and C(w0, b0) = 68. That is correct. But after that, I don't understand why the results of the expression sum(-2(ŷ - y)*x) are 8, -40, -72. I think -8, 40, 72 would be correct.
  • By the way, I implemented this simple network, but when I trained it for 100 epochs the cost function did not converge. Here is my code:
import numpy as np 
x=np.array([1,5,6])
y=np.array([0,16,20])
w = 2
b = 2
epoches = 101
learning_rate = 0.05
for epoch in range(epoches):
    out = x*w + b
    cost = np.sum((y - out)**2) 
    if(epoch % 10 ==0):
        print('Epoch:', epoch, ', cost:', cost)
    dcdw = np.sum(-2*(out - y)*x)
    dcdb = np.sum(-2*(out - y))
    w = w - learning_rate*dcdw
    b = b - learning_rate*dcdb

Here is the result:
Epoch: 0 , cost: 68
Epoch: 10 , cost: 1.1268304493e+19
Epoch: 20 , cost: 3.00027905999e+36
Epoch: 30 , cost: 7.98849058743e+53
Epoch: 40 , cost: 2.12700154184e+71
Epoch: 50 , cost: 5.66331713039e+88
Epoch: 60 , cost: 1.50790492101e+106
Epoch: 70 , cost: 4.01492128811e+123
Epoch: 80 , cost: 1.06900592505e+141
Epoch: 90 , cost: 2.84631649237e+158
Epoch: 100 , cost: 7.57855254577e+175

Could you please explain? Thank you in advance!


mleue commented Mar 23, 2017

Hey, two issues here.

First: your gradient calculation is off. When you define the cost as (y - out)**2, the derivative with respect to w is -2*(y - out)*x, not -2*(out - y)*x, so it looks like you just flipped the sign there. The same issue applies to your gradient with respect to b.
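
As a quick sanity check (this snippet is not from the slides, it just evaluates the corrected formula with the x, y, w = 2, b = 2 values from the question):

import numpy as np

x = np.array([1, 5, 6])
y = np.array([0, 16, 20])
out = 2 * x + 2                      # predictions with w = 2, b = 2
print(np.sum((y - out) ** 2))        # 68, matching C(w0, b0) in the question
print(-2 * (y - out) * x)            # [  8 -40 -72], element-wise dC/dw terms
print(np.sum(-2 * (y - out) * x))    # -104, the summed gradient w.r.t. w

So the 8, -40, -72 quoted from the slide are exactly the element-wise terms of -2*(y - out)*x under this sign convention.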

Second: a diverging cost is usually a sign that the learning rate is too high. Try something lower; reduce it in steps of a factor of 10.
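
For completeness, here is a minimal sketch of the loop from the question with both fixes applied; the 0.005 learning rate (0.05 divided by 10) is just one value following the advice above, not something prescribed by the lecture:

import numpy as np

x = np.array([1, 5, 6])
y = np.array([0, 16, 20])
w, b = 2.0, 2.0
epochs = 101
learning_rate = 0.005  # 0.05 / 10, per the advice above

for epoch in range(epochs):
    out = x * w + b                      # forward pass: predictions
    cost = np.sum((y - out) ** 2)        # C(w, b)
    if epoch % 10 == 0:
        print('Epoch:', epoch, ', cost:', cost)
    dcdw = np.sum(-2 * (y - out) * x)    # sign fixed: derivative of (y - out)**2 w.r.t. w
    dcdb = np.sum(-2 * (y - out))        # and w.r.t. b
    w -= learning_rate * dcdw
    b -= learning_rate * dcdb

With these two changes the cost should decrease from 68 instead of blowing up.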
