This repository has been archived by the owner on Oct 15, 2019. It is now read-only.
Walkthrough: MNIST
Minjie Wang edited this page Apr 19, 2015 · 14 revisions
Contents:
- About MNIST
- Classify MNIST using 3 layer perceptron
- Classify MNIST using convolution neural network
About MNIST
If you are not familiar with the MNIST dataset, please see here. Download the MNIST dataset mnist_all.mat.
Classify MNIST using 3 layer perceptron
The network used here consists of
- One input layer of size 784
- One hidden layer of size 256, with ReLU non-linearity
- One classifier layer of size 10, with softmax loss
Suppose the minibatch size is 256. Each minibatch has already been converted into two matrices, data and label, of size 784x256 and 10x256, respectively. Then,
- Initialization: Weight and bias matrices are initialized as follows:
    w1 = owl.randn([256, 784], 0.0, 0.01)
    w2 = owl.randn([10, 256], 0.0, 0.01)
    b1 = owl.zeros([256, 1])
    b2 = owl.zeros([10, 1])
- Feed-forward Propagation:
    a1 = owl.elewise.relu(w1 * data + b1)   # hidden layer
    a2 = owl.conv.softmax(w2 * a1 + b2)     # classifier layer, fed by the hidden activation a1
- Backward Propagation:
    s2 = a2 - label                                   # classifier layer
    s1 = owl.elewise.relu_back(w2.trans() * s2, a1)   # hidden layer
    gw2 = s2 * a1.trans()                             # gradient of w2 (uses the hidden activation a1)
    gw1 = s1 * data.trans()                           # gradient of w1
    gb2 = s2.sum(1)                                   # gradient of b2
    gb1 = s1.sum(1)                                   # gradient of b1
- Update:
    w1 -= lr * gw1
    w2 -= lr * gw2
    b1 -= lr * gb1
    b2 -= lr * gb2
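For readers without a Minerva installation, the four steps above can be mirrored in plain NumPy. This is a sketch under assumptions, not Minerva code: relu_back is assumed to zero the incoming gradient wherever the activation is zero, the softmax is assumed to normalize each column, the third argument of owl.randn is treated as a standard deviation, and the random data and label arrays are stand-ins for a real minibatch.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 256

# Stand-in minibatch (assumption: random values instead of real MNIST digits).
data = rng.random((784, batch))                           # 784 x 256
label = np.eye(10)[:, rng.integers(0, 10, size=batch)]    # one-hot columns, 10 x 256

# Initialization
w1 = rng.normal(0.0, 0.01, size=(256, 784))
w2 = rng.normal(0.0, 0.01, size=(10, 256))
b1 = np.zeros((256, 1))
b2 = np.zeros((10, 1))
lr = 0.01 / batch


def softmax(z):
    """Column-wise softmax, shifted by the column maximum for numerical stability."""
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)


# Feed-forward propagation
a1 = np.maximum(w1 @ data + b1, 0.0)    # hidden layer (ReLU)
a2 = softmax(w2 @ a1 + b2)              # classifier layer

# Backward propagation
s2 = a2 - label                         # classifier layer
s1 = (w2.T @ s2) * (a1 > 0)             # hidden layer (assumed relu_back semantics)
gw2 = s2 @ a1.T                         # gradient of w2
gw1 = s1 @ data.T                       # gradient of w1
gb2 = s2.sum(axis=1, keepdims=True)     # gradient of b2
gb1 = s1.sum(axis=1, keepdims=True)     # gradient of b1

# Update
w1 -= lr * gw1
w2 -= lr * gw2
b1 -= lr * gb1
b2 -= lr * gb2
```

Every gradient has the same shape as the parameter it updates, which is a quick sanity check when porting the formulas.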
Putting the four steps together, the complete program is:
import owl
import owl.conv as co
import owl.elewise as ele
import mnist_io, sys
# initialize the system and run on GPU 0
owl.initialize(sys.argv)
gpu = owl.create_gpu_device(0)
owl.set_device(gpu)
# training parameters and weights
lr = 0.01 / 256
w1 = owl.randn([256, 784], 0.0, 0.01)
w2 = owl.randn([10, 256], 0.0, 0.01)
b1 = owl.zeros([256, 1])
b2 = owl.zeros([10, 1])
(train_set, test_set) = mnist_io.load_mb_from_mat("mnist_all", 256)
# training
count = 1
for epoch in range(10):
    for (data_np, label_np) in train_set:
        count += 1
        data = owl.from_numpy(data_np)
        label = owl.from_numpy(label_np)
        # feed-forward propagation
        a1 = ele.relu(w1 * data + b1)            # hidden layer
        a2 = co.softmax(w2 * a1 + b2)            # classifier layer
        # backward propagation
        s2 = a2 - label                          # classifier layer
        s1 = ele.relu_back(w2.trans() * s2, a1)  # hidden layer
        gw2 = s2 * a1.trans()                    # gradient of w2
        gw1 = s1 * data.trans()                  # gradient of w1
        gb2 = s2.sum(1)                          # gradient of b2
        gb1 = s1.sum(1)                          # gradient of b1
        # update
        w1 -= lr * gw1
        w2 -= lr * gw2
        b1 -= lr * gb1
        b2 -= lr * gb2
        # print accuracy on the current minibatch every 20 iterations
        if count % 20 == 0:
            pred = a2.argmax(0)
            truth = label.argmax(0)
            print("Accuracy: %f" % (float((pred - truth).count_zero()) / 256))
# block until all asynchronous computation has finished
owl.wait_for_all()
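The accuracy line in the loop above counts how many entries of pred - truth are zero, that is, how many predicted classes match the labels. Below is a NumPy sketch of the same idea on a made-up batch of 3 classes and 4 samples. The arrays are illustrative only, and NumPy's argmax(axis=0) is assumed to correspond to owl's argmax(0) on a classes-by-samples matrix.

```python
import numpy as np

# Made-up classifier outputs: one column per sample, one row per class.
scores = np.array([[0.7, 0.1, 0.2, 0.3],
                   [0.2, 0.8, 0.3, 0.3],
                   [0.1, 0.1, 0.5, 0.4]])
# Made-up one-hot labels with the same layout.
labels = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0],
                   [0, 0, 1, 0]])

pred = scores.argmax(axis=0)     # predicted class per sample
truth = labels.argmax(axis=0)    # true class per sample

# A zero difference means the prediction matches the label.
correct = np.count_nonzero(pred - truth == 0)
accuracy = float(correct) / scores.shape[1]
print(accuracy)  # 0.75
```

Dividing by the actual number of columns rather than a fixed 256 keeps the figure correct if a minibatch happens to be smaller than the nominal size.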
- To run the above code:
  - Copy and save it to /path/to/minerva/owl/apps/mnist as, for example, simple_mnist.py.
  - Download mnist_all.mat into the same folder.
  - Run python simple_mnist.py.
- We've provided a function load_mb_from_mat in the mnist_io module to load minibatches as numpy.ndarray from a .mat file.
- To convert a numpy array to owl.NArray, you can use the owl.from_numpy function.
  - ATTENTION: Since Minerva uses Fortran-style (column-major) arrays while numpy uses C-style (row-major) arrays, the dimensions are reversed when converting a numpy.ndarray to an owl.NArray. Please read the documentation about this function here.
- Since Minerva uses lazy evaluation, most owl APIs are asynchronous. In the above example, without the final owl.wait_for_all() call, the main thread would exit while Minerva's worker threads are still computing in the backend, which leads to faults and errors. To avoid this, add a blocking call at the end of the program. For more information about blocking and non-blocking calls, please see this wiki page.
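The difference between a non-blocking call and a blocking call can be illustrated with Python's standard library alone. This is only an analogy and says nothing about how Minerva's scheduler is implemented: submit plays the role of an asynchronous owl operation, and wait plays the role of owl.wait_for_all(). The slow_square function is a hypothetical stand-in for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def slow_square(x):
    """Hypothetical stand-in for a computation that runs in a worker thread."""
    time.sleep(0.01)
    return x * x


pool = ThreadPoolExecutor(max_workers=2)

# Non-blocking: submit() returns a future immediately, before the work is done.
futures = [pool.submit(slow_square, i) for i in range(4)]

# Blocking: wait() returns only after every submitted task has finished.
wait(futures)

results = [f.result() for f in futures]
pool.shutdown()
print(results)  # [0, 1, 4, 9]
```

Reading a result before the blocking call would either stall or, in a system that does not stall for you, observe unfinished work, which is why the walkthrough ends with an explicit wait.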
Classify MNIST using convolution neural network
- One input layer of size 28x28
- One convolution layer:
  - kernel: 5x5
  - stride: 1x1
  - num_filters: 16
- One pooling layer:
  - window: 2x2
  - stride: 2x2
- One convolution layer:
  - kernel: 5x5
  - stride: 1x1
  - num_filters: 32
  - padding: 2x2
- One pooling layer:
  - window: 3x3
  - stride: 3x3
- One classifier layer (softmax loss) of size 10
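The layer sizes above determine the input size of the classifier layer. The sketch below traces the spatial size through the network using the usual output-size formula for convolution and pooling. The helper names conv_out and pool_out are made up for this illustration, and floor rounding is assumed (all sizes here divide evenly, so the rounding mode does not matter).

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output width (or height) of a convolution along one spatial dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, window, stride):
    """Output width (or height) of an unpadded pooling layer."""
    return (size - window) // stride + 1


size = 28                                     # input image is 28x28
size = conv_out(size, kernel=5)               # first convolution:  24x24x16
size = pool_out(size, window=2, stride=2)     # first pooling:      12x12x16
size = conv_out(size, kernel=5, pad=2)        # second convolution: 12x12x32
size = pool_out(size, window=3, stride=3)     # second pooling:     4x4x32

features = size * size * 32                   # flattened input of the classifier
print(features)  # 512
```

This is where the 512 in the classifier weight shape [10, 512] comes from.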
- Weight format for convolution: [kernel_width, kernel_height, in_channel, out_channel]
- Bias format for convolution: [num_channels]
  - ATTENTION: different from fully connected layer, see example below.
- Data format for convolution: [image_width, image_height, num_channels, batch_size]
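The formats above list the fastest-varying dimension first, as column-major arrays do. The NumPy sketch below shows how a row-major batch indexed as (batch, channel, height, width) lines up with the [image_width, image_height, num_channels, batch_size] format. It does not call Minerva; it assumes the conversion keeps the memory buffer as-is and only reinterprets it in column-major order, which is what makes the dimensions appear reversed. The tiny sizes are chosen only to keep the example readable.

```python
import numpy as np

batch, channels, height, width = 2, 1, 3, 4

# Row-major (C-style) batch, indexed as x[n, c, h, w].
x = np.arange(batch * channels * height * width).reshape(batch, channels, height, width)

# The raw memory buffer, in the order NumPy stores it.
buf = x.ravel(order="C")

# The same buffer read column-major (Fortran-style) with the shape reversed,
# indexed as y[w, h, c, n].
y = buf.reshape((width, height, channels, batch), order="F")

print(y.shape)                  # (4, 3, 1, 2)
print(y[3, 1, 0, 1] == x[1, 0, 1, 3])
```

Under that assumption no data is moved: element x[n, c, h, w] is the same memory cell as y[w, h, c, n].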
Suppose the minibatch size is 256. Each minibatch has already been converted into two ndarrays, data and label, of size 28x28x1x256 and 10x256, respectively. Then,
- Initialization:
    w1 = owl.randn([5, 5, 1, 16], 0.0, 0.01)
    w2 = owl.randn([5, 5, 16, 32], 0.0, 0.01)
    w3 = owl.randn([10, 512], 0.0, 0.01)
    b1 = owl.randn([16], 0.0, 0.01)      # bias for convolution
    b2 = owl.randn([32], 0.0, 0.01)      # bias for convolution
    b3 = owl.randn([10, 1], 0.0, 0.01)   # bias for the fully connected layer
    conv1 = owl.conv.Convolver(pad_h=0, pad_w=0, stride_v=1, stride_h=1)
    conv2 = owl.conv.Convolver(pad_h=2, pad_w=2, stride_v=1, stride_h=1)
    pool1 = owl.conv.Pooler(h=2, w=2, stride_v=2, stride_h=2)
    pool2 = owl.conv.Pooler(h=3, w=3, stride_v=3, stride_h=3)
  - owl.conv.Convolver and owl.conv.Pooler are two classes provided in the owl.conv module.
- Feed-forward Propagation:
    a1 = owl.elewise.relu(conv1.ff(data, w1, b1))
    a2 = pool1.ff(a1)
    a3 = owl.elewise.relu(conv2.ff(a2, w2, b2))
    a4 = pool2.ff(a3)
    a5 = owl.conv.softmax(w3 * a4.reshape([512, 256]) + b3)
- Backward Propagation:
    s5 = a5 - label
    s4 = (w3.trans() * s5).reshape(a4.shape)
    s3 = owl.elewise.relu_back(pool2.bp(s4, a4, a3), a3)
    s2 = conv2.bp(s3, a2, w2)
    s1 = owl.elewise.relu_back(pool1.bp(s2, a2, a1), a1)
    # gradient
    gw3 = s5 * a4.reshape([512, 256]).trans()
    gb3 = s5.sum(1)
    gw2 = conv2.weight_grad(s3, a2, w2)
    gb2 = conv2.bias_grad(s3)
    gw1 = conv1.weight_grad(s1, data, w1)
    gb1 = conv1.bias_grad(s1)
- Update: The same as in the MLP example.
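Both walkthroughs start backward propagation from the softmax output minus the label (s2 in the MLP, s5 in the CNN). The NumPy sketch below checks numerically that this difference is the gradient of the softmax cross-entropy loss with respect to the classifier's pre-activation. It assumes the loss is the cross-entropy summed over the minibatch and that the softmax is taken per column. The small random inputs are illustrative only.

```python
import numpy as np


def softmax(z):
    """Column-wise softmax, shifted by the column maximum for numerical stability."""
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)


def loss(z, label):
    """Cross-entropy between softmax(z) and one-hot labels, summed over the batch."""
    return -np.sum(label * np.log(softmax(z)))


rng = np.random.default_rng(0)
z = rng.normal(size=(10, 4))                          # pre-activations, 10 classes x 4 samples
label = np.eye(10)[:, rng.integers(0, 10, size=4)]    # one-hot columns

# Analytic gradient used in the walkthrough.
analytic = softmax(z) - label

# Central finite differences, one entry of z at a time.
eps = 1e-5
numeric = np.zeros_like(z)
for i in range(z.shape[0]):
    for j in range(z.shape[1]):
        step = np.zeros_like(z)
        step[i, j] = eps
        numeric[i, j] = (loss(z + step, label) - loss(z - step, label)) / (2 * eps)

max_error = np.abs(numeric - analytic).max()
print(max_error < 1e-6)
```

Because these sensitivities are summed over the minibatch rather than averaged, the gradients grow with the batch size, which is presumably why the MLP example divides its learning rate by 256.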