Alumette ("match" in French, as in a tiny torch) is a tiny neural network library with a reverse-mode automatic differentiation engine. It is roughly based on Karpathy's micrograd, but it aims to be a little more usable by wrapping NumPy arrays in Tensors and implementing other Tensor optimization gadgets.
*The logo was generated by DALL-E 2.
There have been a few very confusing times when I could not get my PyTorch module to optimize: the gradients would explode, the loss would blow up, or all sorts of weird things would happen during the backward pass. I used to believe PyTorch was a magical framework that would optimize any neural network with any given optimizer. Those moments made me realize I had to understand autograd and backpropagation in practice, as Andrej Karpathy explains well in this Medium article.
But I also wrote it because it is super fun to code and a nice refresher for calculus and linear algebra :)
Run `python setup.py build && python setup.py install` in your environment, and you're ready to go!
I recommend Python 3.11 for the speed boost!
```python
from alumette import Tensor

a = Tensor(3.2, requires_grad=True)
b = Tensor(-4, requires_grad=True)
((a * b - (b / a)) ** 2).backward()  # Compute gradients of all nodes that require grad
print(a.grad, b.grad)  # Access node gradients
```
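To sanity-check what `backward()` produces here, you can differentiate f(a, b) = `(a*b - b/a)**2` by hand and compare. The snippet below is just a quick sketch in plain Python that recomputes the analytic gradients for the values used above:

```python
# Analytic gradients of f(a, b) = (a*b - b/a)**2, via the chain rule:
#   df/da = 2 * (a*b - b/a) * (b + b/a**2)
#   df/db = 2 * (a*b - b/a) * (a - 1/a)
a_val, b_val = 3.2, -4.0
u = a_val * b_val - b_val / a_val
df_da = 2 * u * (b_val + b_val / a_val**2)
df_db = 2 * u * (a_val - 1 / a_val)
print(df_da, df_db)  # Should match the a.grad and b.grad printed above
```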
```python
from alumette import Tensor
import numpy as np

a = Tensor(np.random.random((5, 2)), requires_grad=True)  # From a NumPy nd-array
b = Tensor([[0.1], [-1.5]], requires_grad=True)  # Automatic nd-array creation from a list
c = Tensor(np.random.random((5, 1)), requires_grad=True)
((a @ b).T @ c).backward()  # Compute gradients of all nodes that require grad
print(a.grad, b.grad, c.grad)  # Access node gradients
```
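The matrix gradients can also be sanity-checked numerically with a central finite difference in plain NumPy, run on the same arrays before they are wrapped in Tensors. This is only a sketch of the idea, checking a single entry of `a` (the `1e-6` step size is an arbitrary choice, and it assumes `a.grad` can be indexed like a NumPy array):

```python
import numpy as np

def f(A, B, C):
    # Same scalar-valued expression as in the example above: (A @ B).T @ C
    return ((A @ B).T @ C).item()

A = np.random.random((5, 2))
B = np.array([[0.1], [-1.5]])
C = np.random.random((5, 1))

# Central finite difference w.r.t. entry (0, 0) of A
eps = 1e-6
A_plus, A_minus = A.copy(), A.copy()
A_plus[0, 0] += eps
A_minus[0, 0] -= eps
numeric = (f(A_plus, B, C) - f(A_minus, B, C)) / (2 * eps)

# Wrapping the same A, B, C in Tensors, calling backward() and reading the
# (0, 0) entry of a.grad should give a value close to `numeric`.
print(numeric)
```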
```python
import random

from alumette import Tensor
from alumette.nn import Linear, NeuralNet, MSE, SGD


class MyNet(NeuralNet):
    def __init__(self) -> None:
        super().__init__()
        self.layer1 = Linear(1, 15, activation="relu")
        self.layer2 = Linear(15, 1, activation="identity")

    def forward(self, x):
        y = self.layer1(x)
        y = self.layer2(y)
        return y


def test_func_1(x):
    return 9 * x**3 + (3 * (x**2)) - (8 * x) + 3 / 4


nn = MyNet()
opt = SGD(nn.parameters(), lr=1e-5)
xs = [random.uniform(-1, 1) for _ in range(1000)]

for _ in range(100):
    tot_loss = 0.0
    opt.zero_grad()
    random.shuffle(xs)
    ys = [test_func_1(x) for x in xs]
    for x, y in zip(xs, ys):
        y_hat = nn(Tensor(x).unsqueeze(0))
        loss = MSE(y_hat, Tensor(y))
        tot_loss += loss  # Accumulate the loss over the whole epoch
    tot_loss.backward()  # Backpropagate once per epoch
    opt.step()
```
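After training, one way to eyeball the fit is to compare predictions with the target polynomial on a few fresh inputs. This is just a sketch: it reuses the call pattern from the loop above and simply prints the prediction Tensors next to the true values (pulling plain floats out of them depends on the Tensor API):

```python
# Compare predictions against the ground-truth polynomial on unseen inputs
for x in [-0.9, -0.3, 0.0, 0.4, 0.8]:
    y_hat = nn(Tensor(x).unsqueeze(0))
    print(f"x={x:+.2f}  target={test_func_1(x):+.4f}  prediction={y_hat}")
```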
Have a look in `examples/` for more!
- Karpathy's video course
- Karpathy's micrograd project
- Geohot's tinygrad
- Ari Seff's video introduction on automatic differentiation
- This very good PDF on Tensor derivatives
- Build autograd on scalars
- Build small neural network library
- Write neural net example
- Test gradients numerically
- Implement a Tensor class to wrap Numpy ndarrays
- Implement a neural net training example for 1D curve fitting
- Make grad a Tensor to allow for higher-order differentiation
- Implement batching
- Implement convolutions
- Implement a neural net training example for image classification (MNIST)
- GPU acceleration (PyCuda?)