Right now, if I have a function `f(a, b, c)` and I only want to create a function which returns the gradient w.r.t. `a` and `b`, I have two options:

1. `∇f(a, b, c) = ReverseDiff.gradient((x, y) -> f(x, y, c), (a, b))`
2. `∇f! = ReverseDiff.compile_gradient(f, (a, b, c))`, and just ignore the `c` gradient that will pop out.

The former has to re-record the function for every call, while the latter wastes some computation differentiating w.r.t. `c`.
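For concreteness, here is a minimal runnable sketch of both workarounds. The toy `f` and its inputs are assumptions for illustration, and the compiled path is written with the `GradientTape`/`compile` spelling that later replaced `compile_gradient`:

```julia
using ReverseDiff

# Toy setup (these definitions are assumptions for illustration).
f(a, b, c) = sum((a .* b) .- c)
a, b, c = rand(3), rand(3), rand(3)

# Option 1: close over `c` so only `a` and `b` are tracked.
# Downside: the function is re-recorded on every call.
∇f(a, b, c) = ReverseDiff.gradient((x, y) -> f(x, y, c), (a, b))
ga, gb = ∇f(a, b, c)

# Option 2: record and compile once over all three arguments, then
# throw away the unwanted `c` gradient on every call.
tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, (a, b, c)))
results = map(similar, (a, b, c))
ReverseDiff.gradient!(results, tape, (a, b, c))
ga, gb, _ = results  # ∂f/∂c was computed but is simply discarded
```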
We should support something akin to TensorFlow's placeholders for the pre-recorded API, allowing you to drop in updatable parameters that aren't differentiated against. This can be accomplished by recording the tape as normal, and then "turning off" differentiation on the selected parameters (the current idiom for that is to set the parameter's tape to `NULL_TAPE`, but I'm going to play around with it). Some refactoring should probably be done to get the most out of this change performance-wise (e.g., allow the instantiation of a `TrackedArray` with `deriv == nothing`).
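To make that concrete, here is a rough sketch of the data-structure change being proposed; `SketchTrackedArray` and `placeholder` are hypothetical stand-ins for illustration, not ReverseDiff source:

```julia
# Hypothetical sketch only; not actual ReverseDiff internals.
mutable struct SketchTrackedArray{V<:AbstractArray}
    value::V                 # the data, updatable between calls
    deriv::Union{V,Nothing}  # `nothing` => no adjoint storage at all
    tape::Any                # a NULL_TAPE-style sentinel disables recording
end

# A placeholder carries a value but joins no tape and allocates no
# adjoint, so the reverse pass never touches it.
placeholder(x::AbstractArray) = SketchTrackedArray(x, nothing, nothing)
```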
As for the API, I can think of two different paths we could take:
- Select which arguments *are* to be differentiated against using a `wrt` function, e.g. `ReverseDiff.compile_gradient(f, (wrt(a), wrt(b), c))`
- Select which arguments are *not* to be differentiated against using a `param` function, e.g. `ReverseDiff.compile_gradient(f, (a, b, param(c)))`
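Either way, usage might look something like the following. This is a hypothetical sketch of the proposed `param` spelling; the in-place calling convention and the two-gradient output shape are assumptions:

```julia
# Hypothetical API sketch: `param(c)` marks `c` as an updatable,
# non-differentiated placeholder baked into the compiled tape.
∇f! = ReverseDiff.compile_gradient(f, (a, b, param(c)))

results = (similar(a), similar(b))  # only two gradients come back
∇f!(results, (a, b))                # no re-recording, no ∂f/∂c work

c .= rand(3)                        # update the placeholder's value...
∇f!(results, (a, b))                # ...and reuse the same compiled tape
```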