This repository has been archived by the owner on Oct 15, 2019. It is now read-only.

Walkthrough: MNIST

Minjie Wang edited this page Apr 19, 2015 · 14 revisions

Walkthrough: MNIST using Minerva

Contents:

  • About MNIST
  • Classify MNIST using 3 layer perceptron
  • Classify MNIST using convolutional neural network

About MNIST

If you are not familiar with the MNIST dataset, please see here. Download the MNIST dataset file mnist_all.mat.


Classify MNIST using 3 layer perceptron

Model

The network used here consists of

  • One input layer of size 784
  • One hidden layer of size 256, with ReLU non-linearity
  • One classifier layer of size 10, with softmax loss function

Algorithm step by step

Suppose the minibatch size is 256. Each minibatch has already been converted into two matrices, data and label, of size 784x256 and 10x256 respectively (one column per sample). Then,

  1. Initialization: Weight and bias matrices are initialized as follows:

    w1 = owl.randn([256, 784], 0.0, 0.01)
    w2 = owl.randn([10, 256], 0.0, 0.01)
    b1 = owl.zeros([256, 1])
    b2 = owl.zeros([10, 1])
  2. Feed-forward Propagation:

    a1 = owl.elewise.relu(w1 * data + b1)  # hidden layer
    a2 = owl.conv.softmax(w2 * a1 + b2)    # classifier layer, fed by the hidden layer output a1
  3. Backward Propagation:

    s2 = a2 - label                                 # classifier layer
    s1 = owl.elewise.relu_back(w2.trans() * s2, a1) # hidden layer
    gw2 = s2 * a1.trans()                           # gradient of w2 (uses the hidden activation a1)
    gw1 = s1 * data.trans()                         # gradient of w1
    gb2 = s2.sum(1)                                 # gradient of b2
    gb1 = s1.sum(1)                                 # gradient of b1
  4. Update:

    w1 -= lr * gw1
    w2 -= lr * gw2
    b1 -= lr * gb1
    b2 -= lr * gb2
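
The four steps can also be traced outside Minerva. The block below is a minimal NumPy sketch, not Minerva code: it assumes `*` on owl arrays is a matrix product (written `@` here), that `relu_back` passes the gradient only where the activation is positive, and that the third argument of `randn` acts like a standard deviation. The minibatch is random stand-in data, not MNIST, and the learning rate is the one used in the full listing.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 256

# Stand-in minibatch (random values, not real MNIST), one column per sample.
data = rng.random((784, batch))                          # 784 x 256
label = np.eye(10)[:, rng.integers(0, 10, size=batch)]   # one-hot, 10 x 256

# 1. Initialization (0.01 is treated as a standard deviation, an assumption).
w1 = rng.normal(0.0, 0.01, (256, 784))
w2 = rng.normal(0.0, 0.01, (10, 256))
b1 = np.zeros((256, 1))
b2 = np.zeros((10, 1))
lr = 0.01 / batch

def softmax(z):
    # Column-wise softmax, shifted by the column maximum for numerical stability.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

losses = []
for _ in range(5):
    # 2. Feed-forward propagation
    a1 = np.maximum(w1 @ data + b1, 0.0)   # hidden layer (ReLU)
    a2 = softmax(w2 @ a1 + b2)             # classifier layer
    losses.append(-np.mean(np.sum(label * np.log(a2), axis=0)))
    # 3. Backward propagation
    s2 = a2 - label                        # classifier layer
    s1 = (w2.T @ s2) * (a1 > 0)            # hidden layer: gradient flows where a1 > 0
    gw2 = s2 @ a1.T                        # gradient of w2
    gw1 = s1 @ data.T                      # gradient of w1
    gb2 = s2.sum(axis=1, keepdims=True)    # gradient of b2
    gb1 = s1.sum(axis=1, keepdims=True)    # gradient of b1
    # 4. Update
    w1 -= lr * gw1
    w2 -= lr * gw2
    b1 -= lr * gb1
    b2 -= lr * gb2
```

Each gradient has the same shape as the parameter it updates, and `losses` records the mean cross-entropy before each update, which gives a quick sanity check when porting the steps.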

Putting them together

import owl
import owl.conv as co
import owl.elewise as ele
import mnist_io, sys
# initialize the system and select the first GPU
owl.initialize(sys.argv)
gpu = owl.create_gpu_device(0)
owl.set_device(gpu)
# training parameters and weights
lr = 0.01 / 256
w1 = owl.randn([256, 784], 0.0, 0.01)
w2 = owl.randn([10, 256], 0.0, 0.01)
b1 = owl.zeros([256, 1])
b2 = owl.zeros([10, 1])
(train_set, test_set) = mnist_io.load_mb_from_mat("mnist_all", 256)
# training
count = 1
for epoch in range(10):
  for (data_np, label_np) in train_set:
    count += 1
    data = owl.from_numpy(data_np)
    label = owl.from_numpy(label_np)
    # ff
    a1 = ele.relu(w1 * data + b1)  # hidden layer
    a2 = co.softmax(w2 * a1 + b2)    # classifier layer
    # bp
    s2 = a2 - label                                 # classifier layer
    s1 = ele.relu_back(w2.trans() * s2, a1) # hidden layer
    gw2 = s2 * a1.trans()                           # gradient of w2
    gw1 = s1 * data.trans()                         # gradient of w1
    gb2 = s2.sum(1)                                 # gradient of b2
    gb1 = s1.sum(1)                                 # gradient of b1
    # update
    w1 -= lr * gw1
    w2 -= lr * gw2
    b1 -= lr * gb1
    b2 -= lr * gb2
    # print accuracy
    if count % 20 == 0:
      pred = a2.argmax(0)
      truth = label.argmax(0)
      print "Accuracy: ", float((pred - truth).count_zero()) / 256
owl.wait_for_all()
  • To run the above code:
    1. Copy and save it to /path/to/minerva/owl/apps/mnist under a name such as simple_mnist.py.
    2. Download mnist_all.mat into the same folder.
    3. Run python simple_mnist.py.
  • The mnist_io module provides a function, load_mb_from_mat, that loads minibatches as numpy.ndarray objects from a .mat file.
  • To convert a numpy array to an owl.NArray, use the owl.from_numpy function.
    • ATTENTION: Minerva uses Fortran-style (column-major) arrays while numpy uses C-style (row-major) arrays, so the dimensions are reversed when converting a numpy.ndarray to an owl.NArray. Please read the documentation for this function here.
  • Since Minerva uses lazy evaluation, most owl APIs are asynchronous. In the above example, without the final owl.wait_for_all() call, the main thread would exit while Minerva's worker threads are still computing in the backend, which leads to faults and errors. To avoid this, add a blocking call at the end of the program. For more information about blocking and non-blocking calls, please see this wiki page.
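
The dimension reversal mentioned in the ATTENTION note can be reproduced with NumPy alone. The sketch below assumes the conversion reinterprets the same memory buffer in column-major order, which is what the note suggests; it does not call owl, and the array names are illustrative.

```python
import numpy as np

# A row-major (C-style) minibatch as NumPy holds it: 256 samples x 784 pixels.
data_np = np.arange(256 * 784, dtype=np.float32).reshape(256, 784)

# Read the same buffer in column-major (Fortran) order: the dimensions reverse.
flat = data_np.ravel(order="C")                       # underlying memory order
data_col_major = flat.reshape((784, 256), order="F")

# Element (sample i, pixel j) now sits at (pixel j, sample i).
same_as_transpose = np.array_equal(data_col_major, data_np.T)
```

Under that assumption a 256x784 numpy array corresponds to a 784x256 array on the Minerva side, which matches the 784x256 data matrix used in the algorithm.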

Classify MNIST using Convolutional Neural Network

Model

  • One input layer of size 28x28
  • One convolution layer:
    • kernel: 5x5
    • stride: 1x1
    • num_filters: 16
  • One pooling layer:
    • window: 2x2
    • stride: 2x2
  • One convolution layer:
    • kernel: 5x5
    • stride: 1x1
    • num_filters: 32
    • padding: 2x2
  • One pooling layer:
    • window: 3x3
    • stride: 3x3
  • One classifier layer (softmax loss) of size 10
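
The layer list determines how many features reach the classifier. The sketch below applies the usual output-size formula for convolution and pooling windows; `out_size` is a hypothetical helper, and zero padding is assumed wherever the list does not name one.

```python
def out_size(size, window, stride, pad=0):
    # Output length along one spatial dimension for a sliding window.
    return (size + 2 * pad - window) // stride + 1

size = 28                              # input image is 28x28
size = out_size(size, 5, 1)            # first convolution:  24
size = out_size(size, 2, 2)            # first pooling:      12
size = out_size(size, 5, 1, pad=2)     # second convolution: 12
size = out_size(size, 3, 3)            # second pooling:      4
features = size * size * 32            # 4 * 4 * 32 filters = 512
```

The result, 512 features per image, is the input width of the classifier layer.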

Convolution ndarray format

  • Weight format for convolution: [kernel_width, kernel_height, in_channel, out_channel]
  • Bias format for convolution: [num_channels]
    • ATTENTION: this differs from a fully connected layer, whose bias has shape [size, 1]; see the example below.
  • Data format for convolution: [image_width, image_height, num_channels, batch_size]
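
To make the difference between the two bias formats concrete, here is a NumPy sketch, not Minerva code. It assumes a convolution bias holds one value per output channel, shared across all positions and samples, while a fully connected bias is a column added to every sample; the small batch size and the arrays are illustrative only.

```python
import numpy as np

batch = 4

# Convolution output: [image_width, image_height, num_channels, batch_size]
conv_out = np.zeros((24, 24, 16, batch))
conv_bias = np.arange(16, dtype=float)                   # [num_channels]
# One value per channel, broadcast over width, height and batch.
conv_biased = conv_out + conv_bias.reshape(1, 1, 16, 1)

# Fully connected output: [size, batch_size]
fc_out = np.zeros((10, batch))
fc_bias = np.arange(10, dtype=float).reshape(10, 1)      # [size, 1]
# One value per unit, broadcast over the batch.
fc_biased = fc_out + fc_bias
```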

Algorithm step by step

Suppose the minibatch size is 256. Each minibatch has already been converted into two ndarrays, data and label, of size 28x28x1x256 and 10x256 respectively. Then,

  1. Initialization:

    w1 = owl.randn([5, 5, 1, 16], 0, 0.01)
    w2 = owl.randn([5, 5, 16, 32], 0, 0.01)
    w3 = owl.randn([10, 512], 0, 0.01)
    b1 = owl.zeros([16])     # bias for convolution
    b2 = owl.zeros([32])     # bias for convolution
    b3 = owl.zeros([10, 1])  # bias for the fully connected layer
    conv1 = owl.conv.Convolver(pad_h=0, pad_w=0, stride_v=1, stride_h=1)
    conv2 = owl.conv.Convolver(pad_h=2, pad_w=2, stride_v=1, stride_h=1)
    pool1 = owl.conv.Pooler(h=2, w=2, stride_v=2, stride_h=2)
    pool2 = owl.conv.Pooler(h=3, w=3, stride_v=3, stride_h=3)
    • owl.conv.Convolver and owl.conv.Pooler are two classes provided by the owl.conv module.
  2. Feed-forward Propagation:

    a1 = owl.elewise.relu(conv1.ff(data, w1, b1))
    a2 = pool1.ff(a1)
    a3 = owl.elewise.relu(conv2.ff(a2, w2, b2))
    a4 = pool2.ff(a3)
    a5 = owl.conv.softmax(w3 * a4.reshape([512, 256]) + b3)
  3. Backward Propagation:

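    s5 = a5 - label                                       # classifier layer
    s4 = (w3.trans() * s5).reshape(a4.shape)              # back to the 4-D shape of a4
    s3 = owl.elewise.relu_back(pool2.bp(s4, a4, a3), a3)  # through pool2, then ReLU
    s2 = conv2.bp(s3, a2, w2)                             # through conv2
    s1 = owl.elewise.relu_back(pool1.bp(s2, a2, a1), a1)  # through pool1, then ReLU
    # gradient
    gw3 = s5 * a4.reshape([512, 256]).trans()             # gradient of w3
    gb3 = s5.sum(1)                                       # gradient of b3
    gw2 = conv2.weight_grad(s3, a2, w2)                   # gradient of w2
    gb2 = conv2.bias_grad(s3)                             # gradient of b2
    gw1 = conv1.weight_grad(s1, data, w1)                 # gradient of w1
    gb1 = conv1.bias_grad(s1)                             # gradient of b1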
  4. Update: The same as in the MLP example, applied to w1, w2, w3, b1, b2 and b3.
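
The reshape between the last pooling layer and the classifier, and its inverse in the backward pass, can be checked in NumPy. This sketch assumes the reshape keeps column-major element order, in line with the array layout described earlier; the random array stands in for the output of the second pooling layer.

```python
import numpy as np

batch = 256
# Stand-in for the second pooling output: [width, height, channels, batch].
a4 = np.random.default_rng(0).random((4, 4, 32, batch))

# Forward: flatten each sample's 4*4*32 values into one column (512 x 256).
a4_flat = a4.reshape((512, batch), order="F")

# Backward: a 512 x 256 sensitivity gets the 4-D shape of a4 again.
s4 = a4_flat.reshape(a4.shape, order="F")

# Column 7 holds exactly the values of sample 7.
column_matches = np.array_equal(a4_flat[:, 7], a4[:, :, :, 7].ravel(order="F"))
```

Because the batch dimension is last, a column-major flatten keeps every sample in its own column, which is why the fully connected weight w3 can be applied directly to the reshaped array.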
