fixed the gamma computation error #79

Merged
merged 2 commits into NVIDIA:develop from bug-fix on Apr 20, 2020
Conversation

@doyend doyend (Contributor) commented Mar 31, 2020

The old code computes the sum of the second-order derivatives, which is wrong. This one fixes it.

@GPUtester

Can one of the admins verify this patch?

@yidong72 yidong72 requested a review from avolkov1 March 31, 2020 21:05
@avolkov1 avolkov1 self-assigned this Mar 31, 2020
@avolkov1 avolkov1 (Contributor) commented Mar 31, 2020

Is the only code modification from:

drv = grad(loss_grads[0], inputs, torch.ones_like(loss_grads[0]) )

to:

drv = grad(loss_grads[0][0][2], inputs)

I'm looking at the function signature of grad:

torch.autograd.grad(outputs, inputs, grad_outputs=None,
    retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)

The change is very subtle. It's not clear what the components of loss_grads are. So loss_grads[0] is the sum, and loss_grads[0][0][2] is what then?

It looks good, but it would be nice to clarify that PyTorch operation.

@yidong72 yidong72 (Collaborator) commented

You can test it out with the following code:

import torch
from torch.autograd import grad
'''
z = (xy)^2
x = 3, y =2

first order deriv [24 36]
d2z/dx2 = 8
d2z/dxdy = 24
d2z/dy2 = 18
'''

inputs = torch.tensor([3.0,2.0], requires_grad=True)

z = (inputs[0]*inputs[1])**2
first_order_grad = grad(z, inputs, create_graph=True)

second_order_grad_original, = grad(first_order_grad[0], inputs,
                                   torch.ones_like(first_order_grad[0]), retain_graph=True)  # [32., 42.]: not the expected answer
second_order_grad_x, = grad(first_order_grad[0][0], inputs, retain_graph=True)  # [ 8., 24.] = [d2z/dx2, d2z/dxdy]
second_order_grad_y, = grad(first_order_grad[0][1], inputs)                     # [24., 18.] = [d2z/dxdy, d2z/dy2]

The old code calculates the sum of the second-order gradients with respect to the parameters, i.e. (d2z/dx2 + d2z/dxdy, d2z/dxdy + d2z/dy2) = (32, 42).
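
Put differently, passing grad_outputs=torch.ones_like(...) computes a vector-Jacobian product with a vector of ones, i.e. the sum of the Hessian rows. A minimal, self-contained sketch of that check (the variable names here are illustrative, not from the notebook):

import torch
from torch.autograd import grad

# Same toy problem as above: z = (x*y)**2 at x = 3, y = 2.
inputs = torch.tensor([3.0, 2.0], requires_grad=True)
z = (inputs[0] * inputs[1]) ** 2
g, = grad(z, inputs, create_graph=True)          # [24., 36.]

row_x, = grad(g[0], inputs, retain_graph=True)   # [d2z/dx2,  d2z/dxdy] = [ 8., 24.]
row_y, = grad(g[1], inputs, retain_graph=True)   # [d2z/dxdy, d2z/dy2 ] = [24., 18.]
summed, = grad(g, inputs, torch.ones_like(g))    # what the old code computed: [32., 42.]

assert torch.allclose(summed, row_x + row_y)     # row sums of the Hessian, not individual second derivatives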

@avolkov1 avolkov1 (Contributor) commented

That's an excellent simplified example.

In [2]: first_order_grad
Out[2]: (tensor([24., 36.], grad_fn=<AddBackward0>),)
In [3]: second_order_grad_original
Out[3]: tensor([32., 42.])
In [4]: second_order_grad_x
Out[4]: tensor([ 8., 24.])
In [5]: second_order_grad_y
Out[5]: tensor([24., 18.])
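
As a cross-check (a small sketch assuming a PyTorch version that provides torch.autograd.functional.hessian), the full 2x2 Hessian can be read off in one call and matches the two rows above:

import torch
from torch.autograd.functional import hessian

def z(inp):
    # same toy function as above: z = (x*y)**2
    return (inp[0] * inp[1]) ** 2

print(hessian(z, torch.tensor([3.0, 2.0])))
# tensor([[ 8., 24.],
#         [24., 18.]])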

Now I understand a lot better what's going on. Could you add a comment explaining why you index into the arrays with 2?

# input tensor is 'K, B, S0, sigma, mu, r' zero-based indexed.
inputs = torch.tensor([[110.0, 100.0, 120.0, 0.35, 0.1, 0.05]]).cuda()
inputs.requires_grad = True
x = model(inputs)

# instead of using loss.backward(), use torch.autograd.grad() to compute gradients
# https://pytorch.org/docs/stable/autograd.html#torch.autograd.grad
loss_grads = grad(x, inputs, create_graph=True)
# loss_grads[0][0][0] is 1st order derivative with respect to K
# loss_grads[0][0][1] is 1st order derivative with respect to B etc.

# S0 index is 2 therefore
# loss_grads[0][0][2] is 1st order derivative with respect to S0
drv = grad(loss_grads[0][0][2], inputs)
# drv[0][0][2] is 2nd order derivative with respect to S0

I don't know of a good succinct comment to explain it, but at least point out that S0 corresponds to index 2 of the inputs tensor, which is why you index with 2. It might also help to have a separate cell in the notebook demonstrating how torch.autograd.grad works with the z = (xy)^2 example you gave above.
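
For reference, a rough sketch of what such a cell could look like. The nn.Sequential model below is just a random, untrained stand-in for the pricing network in the notebook (the .cuda() call is also omitted), so the printed numbers are meaningless; only the indexing pattern matters:

import torch
from torch import nn
from torch.autograd import grad

# Stand-in for the trained pricing model (hypothetical; replace with the notebook's model).
model = nn.Sequential(nn.Linear(6, 64), nn.Tanh(), nn.Linear(64, 1))

# Input order: K, B, S0, sigma, mu, r (zero-based, so S0 is index 2).
inputs = torch.tensor([[110.0, 100.0, 120.0, 0.35, 0.1, 0.05]], requires_grad=True)
price = model(inputs)

# First-order derivatives of the price with respect to every input.
first = grad(price, inputs, create_graph=True)
delta = first[0][0][2]    # d(price)/d(S0)

# Second-order derivative with respect to S0: differentiate delta itself.
second = grad(delta, inputs)
gamma = second[0][0][2]   # d2(price)/d(S0)^2
print(delta.item(), gamma.item())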

@yidong72 yidong72 (Collaborator) commented

I added the example to the notebook.

@avolkov1 avolkov1 (Contributor) left a comment

Looks great! Thanks.

@avolkov1 avolkov1 merged commit 27ae2c9 into NVIDIA:develop Apr 20, 2020
@yidong72 yidong72 deleted the bug-fix branch April 20, 2020 23:09