
Walkthrough: MNIST using Minerva


About MNIST

If you are not familiar with the MNIST dataset, please see here. Download the MNIST dataset file mnist_all.mat.


Classify MNIST using a 3-layer perceptron

Model

The network used here consists of:

  • One input layer of size 784
  • One hidden layer of size 256 with a ReLU non-linearity
  • One classifier layer of size 10 with a softmax loss function
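
For reference, here is a minimal NumPy sketch (independent of Minerva's API) of what these two non-linearities compute; the column-wise orientation matches the layout used throughout this walkthrough:

    import numpy as np

    def relu(x):
        # element-wise max(x, 0)
        return np.maximum(x, 0)

    def softmax(x):
        # column-wise softmax; subtracting the max improves numerical stability
        e = np.exp(x - x.max(axis=0, keepdims=True))
        return e / e.sum(axis=0, keepdims=True)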

Algorithm step by step

Suppose the minibatch size is 256. Each minibatch has already been converted into two matrices, data and label, of sizes 784x256 and 10x256 respectively; the label matrix is one-hot encoded. Then,

  1. Initialization: Weight and bias matrices are initialized as follows:

    w1 = owl.randn([256, 784], 0.0, 0.01)
    w2 = owl.randn([10, 256], 0.0, 0.01)
    b1 = owl.zeros([256, 1])
    b2 = owl.zeros([10, 1])
  2. Feed-forward Propagation:

    a1 = owl.elewise.relu(w1 * data + b1)  # hidden layer
    a2 = owl.conv.softmax(w2 * a1 + b2)    # classifier layer
  3. Backward Propagation: for a softmax classifier with a cross-entropy loss, the error signal at the output layer is simply the difference between the prediction and the one-hot label:

    s2 = a2 - label                                 # classifier layer error
    s1 = owl.elewise.relu_back(w2.trans() * s2, a1) # hidden layer error
    gw2 = s2 * a1.trans()                           # gradient of w2
    gw1 = s1 * data.trans()                         # gradient of w1
    gb2 = s2.sum(1)                                 # gradient of b2
    gb1 = s1.sum(1)                                 # gradient of b1
  4. Update:

    w1 -= lr * gw1
    w2 -= lr * gw2
    b1 -= lr * gb1
    b2 -= lr * gb2
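
To sanity-check the shapes and gradient formulas above, here is a minimal NumPy sketch of one full training step on synthetic data. It mirrors the owl code but is not Minerva's API; the random minibatch is a stand-in for real MNIST data:

    import numpy as np

    rng = np.random.RandomState(0)
    data = rng.randn(784, 256)                      # fake minibatch, 784x256
    label = np.eye(10)[:, rng.randint(0, 10, 256)]  # one-hot columns, 10x256

    lr = 0.01
    w1 = rng.randn(256, 784) * 0.01
    w2 = rng.randn(10, 256) * 0.01
    b1 = np.zeros((256, 1))
    b2 = np.zeros((10, 1))

    # feed-forward
    a1 = np.maximum(np.dot(w1, data) + b1, 0)       # ReLU hidden layer
    z2 = np.dot(w2, a1) + b2
    e = np.exp(z2 - z2.max(axis=0, keepdims=True))
    a2 = e / e.sum(axis=0, keepdims=True)           # softmax classifier
    # backward
    s2 = a2 - label
    s1 = np.dot(w2.T, s2) * (a1 > 0)                # relu_back: gate by active units
    gw2 = np.dot(s2, a1.T)
    gw1 = np.dot(s1, data.T)
    gb2 = s2.sum(axis=1, keepdims=True)
    gb1 = s1.sum(axis=1, keepdims=True)
    # update
    w1 -= lr * gw1; w2 -= lr * gw2
    b1 -= lr * gb1; b2 -= lr * gb2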

Putting them together

import owl
import owl.conv
import owl.elewise
import mnist_io, sys
# initialize the system
owl.initialize(sys.argv)
gpu = owl.create_gpu_device(0)
owl.set_device(gpu)
# training parameters and weights
MAX_EPOCH=10
lr = 0.01
w1 = owl.randn([256, 784], 0.0, 0.01)
w2 = owl.randn([10, 256], 0.0, 0.01)
b1 = owl.zeros([256, 1])
b2 = owl.zeros([10, 1])
(train_set, test_set) = mnist_io.load_mb_from_mat("mnist_all", 256)
# training
for epoch in range(MAX_EPOCH):
  for (data_np, label_np) in train_set:
    data = owl.from_numpy(data_np)
    label = owl.from_numpy(label_np)
    # feed-forward
    a1 = owl.elewise.relu(w1 * data + b1)  # hidden layer
    a2 = owl.conv.softmax(w2 * a1 + b2)    # classifier layer
    # back-propagation
    s2 = a2 - label                                 # classifier layer
    s1 = owl.elewise.relu_back(w2.trans() * s2, a1) # hidden layer
    gw2 = s2 * a1.trans()                           # gradient of w2
    gw1 = s1 * data.trans()                         # gradient of w1
    gb2 = s2.sum(1)                                 # gradient of b2
    gb1 = s1.sum(1)                                 # gradient of b1
    # update
    w1 -= lr * gw1
    w2 -= lr * gw2
    b1 -= lr * gb1
    b2 -= lr * gb2
owl.wait_for_all()
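
The listing above loads test_set but never uses it. As a sketch of how one might measure test accuracy after training, assuming owl.NArray offers a to_numpy() method that mirrors owl.from_numpy (check the API documentation; treat this as an assumption):

    import numpy as np

    correct = 0.0
    total = 0
    for (data_np, label_np) in test_set:
        data = owl.from_numpy(data_np)
        a1 = owl.elewise.relu(w1 * data + b1)
        a2 = owl.conv.softmax(w2 * a1 + b2)
        # ASSUMPTION: to_numpy() converts back, reversing the dimensions again,
        # so pred has the same (minibatch x 10) layout as label_np
        pred = a2.to_numpy()
        correct += (pred.argmax(axis=1) == label_np.argmax(axis=1)).sum()
        total += label_np.shape[0]
    print('test accuracy: %f' % (correct / total))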

Some explanations:

  • We've provided a function load_mb_from_mat in the mnist_io module to load minibatches as numpy.ndarray from the .mat file.
  • To convert a numpy array to an owl.NArray, use the owl.from_numpy function.
    • ATTENTION: Since Minerva uses Fortran-style (column-major) arrays while numpy uses C-style (row-major) arrays, the dimensions are reversed when converting a numpy.ndarray to an owl.NArray; a small illustration follows this list. Please read the documentation about this function here.
  • Since Minerva uses lazy evaluation, most owl APIs are asynchronous. In the above example, without the last owl.wait_for_all() call, the main thread would exit while Minerva's worker threads are still computing in the backend, leading to faults and errors. To avoid this, add a blocking call at the end of the program. For more information about blocking and non-blocking calls, please see this wiki page.
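
Here is a NumPy-only sketch of why the dimensions flip: reinterpreting the same row-major buffer as column-major reverses the shape without copying any data, which is exactly what happens when a numpy.ndarray becomes an owl.NArray:

    import numpy as np

    a = np.arange(6).reshape(2, 3)    # numpy: row-major, shape (2, 3)
    print(a.flags['C_CONTIGUOUS'])    # True
    # The transpose views the same buffer in column-major order,
    # with the shape reversed to (3, 2):
    b = a.T
    print(b.flags['F_CONTIGUOUS'])    # True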

Classify MNIST using a Convolutional Neural Network
