torchzero is a general-purpose optimization library with a highly modular design. Many algorithms are implemented in torchzero, including first- and second-order methods, Quasi-Newton methods, conjugate gradient, gradient approximations, line searches, trust regions, and more. The modularity makes it possible, for example, to use Newton or any Quasi-Newton method with any trust region or line search.
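For instance, a Quasi-Newton step can be chained with a line search in a single tz.Modular optimizer. Below is a minimal sketch of such a composition; the LBFGS and Backtracking module names are assumptions for illustration, see the docs for the modules that actually ship:

import torch
from torch import nn
import torchzero as tz

model = nn.Linear(10, 1)  # placeholder model

# L-BFGS direction followed by a backtracking line search
# (module names assumed for illustration; check the docs for exact names)
optimizer = tz.Modular(
    model.parameters(),
    tz.m.LBFGS(),
    tz.m.Backtracking(),
)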
Note: this project is under active development, so there may be API changes.
An overview of the modules available in torchzero can be found in the docs.
Construct a modular optimizer and use it like any other PyTorch optimizer, although some modules require a closure, as detailed in the next section.
optimizer = tz.Modular(
    model.parameters(),
    tz.m.ClipValue(1),
    tz.m.Adam(),
    tz.m.WeightDecay(1e-2),
    tz.m.LR(1e-1)
)
Here is what happens:
- The gradient is passed to the ClipValue(1) module, which returns the gradient with magnitudes clipped to be no larger than 1.
- The clipped gradient is passed to Adam(), which updates the Adam momentum buffers and returns the Adam update.
- The Adam update is passed to WeightDecay(1e-2), which adds a weight decay penalty to the Adam update. Since it is placed after Adam, the weight decay is decoupled; moving WeightDecay() before Adam() gives coupled weight decay.
- Finally, the update is passed to LR(1e-1), which multiplies it by the learning rate of 0.1.
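Once constructed, the optimizer above is used exactly like any other PyTorch optimizer. A minimal sketch of a training step, assuming the model and some placeholder inputs/targets tensors are already defined:

import torch.nn.functional as F

# standard PyTorch training step; none of the modules above require a closure
for _ in range(100):
    optimizer.zero_grad()
    loss = F.mse_loss(model(inputs), targets)  # inputs/targets are placeholder tensors
    loss.backward()
    optimizer.step()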
Certain modules, particularly line searches and gradient approximations, require a closure, similar to L-BFGS in PyTorch. Some modules also require the closure to accept an additional backward argument, as in the example below:
import torch
import torch.nn.functional as F
from torch import nn

import torchzero as tz

model = nn.Sequential(nn.Linear(10, 10), nn.ELU(), nn.Linear(10, 1))
inputs = torch.randn(100, 10)
targets = torch.randn(100, 1)

optimizer = tz.Modular(
    model.parameters(),
    tz.m.CubicRegularization(tz.m.Newton()),
)

for i in range(1, 51):
    def closure(backward=True):
        preds = model(inputs)
        loss = F.mse_loss(preds, targets)
        # if backward=True, the closure should call
        # optimizer.zero_grad() and loss.backward()
        if backward:
            optimizer.zero_grad()
            loss.backward()
        return loss

    loss = optimizer.step(closure)
    if i % 10 == 0:
        print(f"step: {i}, loss: {loss.item():.4f}")
The code above also works with any other optimizer, because all PyTorch optimizers and most custom ones support a closure, so there is no need to rewrite the training loop.
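For example, swapping in torch.optim.SGD only changes the optimizer construction; the closure-based loop is the same as in the previous example, reusing the same model, inputs, and targets:

# swap in a built-in PyTorch optimizer; only the construction changes
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for i in range(1, 51):
    def closure(backward=True):
        preds = model(inputs)
        loss = F.mse_loss(preds, targets)
        if backward:
            optimizer.zero_grad()
            loss.backward()
        return loss

    loss = optimizer.step(closure)  # torch.optim.SGD calls closure() with no arguments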
Rosenbrock minimization example:
import torch
import torchzero as tz

def rosen(x, y):
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

X = torch.tensor([-1.1, 2.5], requires_grad=True)

def closure(backward=True):
    loss = rosen(*X)
    if backward:
        X.grad = None  # same as opt.zero_grad()
        loss.backward()
    return loss

opt = tz.Modular([X], tz.m.NewtonCGSteihaug(hvp_method='forward'))

for step in range(24):
    loss = opt.step(closure)
    print(f'{step} - {loss}')
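The global minimum of the Rosenbrock function is at (1, 1), so a quick sanity check (not part of the original example) is to inspect X after the loop:

print(X.detach())  # should be close to tensor([1., 1.]) once the run has converged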
More information and examples, along with visualizations and explanations of many of the algorithms implemented in torchzero, are available on the wiki.
torchzero can be installed from the Python Package Index:
pip install torchzero
Alternatively, install it directly from this repo:
pip install git+https://github.com/inikishev/torchzero