Walkthrough: AlexNet

Walkthrough: Training AlexNet on ImageNet using Minerva's owl.net Package

About Minerva owl.net

owl.net is a DNN training framework built on owl, Minerva's Python interface. The package has two main purposes: 1) to give Minerva users a simple way to train deep neural networks for computer vision problems, and 2) to serve as a prototype of how to build user applications that take advantage of Minerva.

We borrow Caffe's well-defined network and solver configuration file formats, but execution is carried out by the Minerva engine. The package showcases Minerva's flexible interface (Caffe's main functionality is rebuilt in a few hundred lines) and its computational efficiency (multi-GPU training).

About ImageNet

If you are not familiar with the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), please refer to the challenge's website. The classification task contains 1.28 million training images belonging to 1000 classes.

Data Preparation

To make IO efficient, we recommend converting the original images to LMDB after you download the dataset. The tool provided by Caffe can do the conversion. After converting the images, we need to compute the mean value of each pixel over the dataset. During training, these mean values are subtracted from each image to produce a zero-mean input. The mean file for ILSVRC12 can be downloaded with the script provided by Caffe.
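
The mean-subtraction step can be sketched in plain Python. This is only an illustration on tiny nested lists, not the Caffe tool or Minerva's data loader; the helper names `pixel_mean` and `subtract_mean` and the toy data are hypothetical.

```python
def pixel_mean(images):
    """Return the per-pixel mean over a list of equally sized 2-D images."""
    count = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / float(count)
             for c in range(width)]
            for r in range(height)]


def subtract_mean(image, mean):
    """Subtract the per-pixel mean from one image."""
    return [[pixel - m for pixel, m in zip(row, mean_row)]
            for row, mean_row in zip(image, mean)]


# Two toy 2x2 single-channel "images" (hypothetical data).
dataset = [
    [[0, 2], [4, 6]],
    [[2, 4], [6, 8]],
]
mean = pixel_mean(dataset)
centered = [subtract_mean(img, mean) for img in dataset]
```

After this step the per-pixel mean of `centered` is zero, which is the zero-mean input the training code expects.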

About AlexNet

AlexNet was proposed in ILSVRC2012. It won the ILSVRC2012 classification task, beating the non-DNN methods by a large accuracy margin. It contains 5 convolutional layers and 3 fully-connected layers. During training, randomness is introduced by the data augmentation process and by the dropout layers. The layer details are shown below and should be defined in the network configuration file, which follows Caffe's format. Note that we currently do not support convolutions with more than one group, since a single recent GPU has enough memory to hold the whole model.

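| Layer | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Type | conv+max+norm | conv+max+norm | conv | conv | conv+max | full | full | full |
| Channels | 96 | 256 | 384 | 384 | 256 | 4096 | 4096 | 1000 |
| Filter Size | 11×11 | 5×5 | 3×3 | 3×3 | 3×3 | - | - | - |
| Convolution Stride | 4×4 | 1×1 | 1×1 | 1×1 | 1×1 | - | - | - |
| Pooling Size | 3×3 | 3×3 | - | - | 3×3 | - | - | - |
| Pooling Stride | 2×2 | 2×2 | - | - | 2×2 | - | - | - |
| Padding Size | 2×2 | 1×1 | 1×1 | 1×1 | 1×1 | - | - | - |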
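
As a rough aid for reading the table, the sketch below computes the spatial output size of a convolution from its filter size, stride and padding. It assumes the common floor-rounding convention; `conv_output_size` is a hypothetical helper, and the 224-pixel input size is an assumption because the table does not state one.

```python
def conv_output_size(input_size, filter_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1


# Layer 1 from the table: 11 by 11 filter, stride 4, padding 2.
# The 224-pixel input is an assumption, not a value from the table.
layer1 = conv_output_size(224, filter_size=11, stride=4, padding=2)

# A 3 by 3 filter with stride 1 and padding 1 (layers 3 to 5)
# leaves the spatial size unchanged, whatever the input size is.
unchanged = conv_output_size(13, filter_size=3, stride=1, padding=1)
```

Pooling layers may use a different rounding rule depending on the framework, so the helper is only meant for the convolution rows.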

Training AlexNet using Minerva

We implemented the DNN training logic in trainer.py. The core of the training and weight update code is shown below:

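        # Excerpt from trainer.py: wgrad and bgrad are per-GPU lists of gradients and
        # wunits holds the ids of the weighted units; both are set up outside this excerpt.
        # Training resumes at the iteration that corresponds to the loaded snapshot.
        for iteridx in range(s.snapshot * s.owl_net.solver.snapshot, s.owl_net.solver.max_iter):
            # train on multi-gpu: each GPU runs forward/backward and records its gradients
            for gpuid in range(s.num_gpu):
                owl.set_device(s.gpu[gpuid])
                s.owl_net.forward('TRAIN')
                s.owl_net.backward('TRAIN')
                for wid in wunits:
                    wgrad[gpuid].append(s.owl_net.units[wid].weightgrad)
                    bgrad[gpuid].append(s.owl_net.units[wid].biasgrad)

            # weight update: spread the per-unit updates evenly across the GPUs
            for i in range(len(wunits)):
                wid = wunits[i]
                # integer division picks the GPU that performs the update for unit i
                upd_gpu = i * s.num_gpu // len(wunits)
                owl.set_device(s.gpu[upd_gpu])
                # accumulate the gradients computed on all other GPUs
                for gid in range(s.num_gpu):
                    if gid == upd_gpu:
                        continue
                    wgrad[upd_gpu][i] += wgrad[gid][i]
                    bgrad[upd_gpu][i] += bgrad[gid][i]
                s.owl_net.units[wid].weightgrad = wgrad[upd_gpu][i]
                s.owl_net.units[wid].biasgrad = bgrad[upd_gpu][i]
                s.owl_net.update(wid)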
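
To make the update step easier to follow, here is a self-contained sketch of the same aggregation pattern, using plain Python numbers in place of Minerva arrays. It makes no owl calls; `assign_update_gpu` and `aggregate` are hypothetical helper names and the gradient values are made up.

```python
def assign_update_gpu(unit_index, num_units, num_gpu):
    """Pick the GPU that performs the update of a weighted unit.

    Uses the same integer-division expression as the training loop, which
    splits the units into contiguous, roughly equal groups.
    """
    return unit_index * num_gpu // num_units


def aggregate(grads, num_units):
    """Sum per-GPU gradients for every unit onto its updating GPU.

    grads[g][i] is the gradient of unit i computed on GPU g.
    Returns a list of (updating_gpu, summed_gradient) pairs.
    """
    num_gpu = len(grads)
    result = []
    for i in range(num_units):
        owner = assign_update_gpu(i, num_units, num_gpu)
        total = grads[owner][i]
        for g in range(num_gpu):
            if g != owner:
                total += grads[g][i]
        result.append((owner, total))
    return result


# Toy gradients for 4 weighted units on 2 GPUs (hypothetical numbers).
toy_grads = [
    [1, 2, 3, 4],      # computed on GPU 0
    [10, 20, 30, 40],  # computed on GPU 1
]
aggregated = aggregate(toy_grads, num_units=4)
```

With 4 units and 2 GPUs, the first two units are updated on GPU 0 and the last two on GPU 1, so the update work is shared instead of being funnelled through a single device.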

### Define Solver File
The solver file needs to define the following information:
* network configuration file
* snapshot saving directory
* max iteration
* testing interval
* test iteration
* snapshot saving interval
* learning rate tuning strategy
* momentum
* weight decay
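
As an illustration of how the items above map onto a solver file, the sketch below renders a handful of fields in Caffe's `name: value` text format. The field names are Caffe's solver fields; the paths are placeholders, the numeric values are illustrative only, and `render_solver` is a hypothetical helper rather than part of owl.net.

```python
def render_solver(fields):
    """Render (name, value) pairs as Caffe-style `name: value` lines."""
    lines = []
    for name, value in fields:
        if isinstance(value, str):
            value = '"%s"' % value  # string values are quoted in prototxt
        lines.append("%s: %s" % (name, value))
    return "\n".join(lines)


# Illustrative values only; the paths are placeholders.
solver_fields = [
    ("net", "/path/to/train_val.prototxt"),             # network configuration file
    ("snapshot_prefix", "/path/to/snapshots/alexnet"),  # snapshot saving location
    ("max_iter", 450000),                               # max iteration
    ("test_interval", 1000),                            # testing interval
    ("test_iter", 1000),                                # test iteration
    ("snapshot", 10000),                                # snapshot saving interval
    ("lr_policy", "step"),                              # learning rate tuning strategy
    ("base_lr", 0.01),
    ("gamma", 0.1),
    ("stepsize", 100000),
    ("momentum", 0.9),
    ("weight_decay", 0.0005),
]
solver_text = render_solver(solver_fields)
print(solver_text)
```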

AlexNet usually needs to traverse the training set 70-90 times before converging, and the learning rate should be reduced several times along the way. The standard solver for AlexNet can be found [here](https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/solver.prototxt).
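
The relation between epochs, iterations and learning rate reductions can be sketched as follows. The schedule follows Caffe's "step" policy (multiply the base rate by `gamma` every `stepsize` iterations); the batch size of 256 and all other numbers are assumptions for illustration, and both function names are hypothetical.

```python
def step_learning_rate(iteration, base_lr, gamma, stepsize):
    """Step policy: multiply base_lr by gamma once every stepsize iterations."""
    return base_lr * gamma ** (iteration // stepsize)


def iterations_for_epochs(epochs, dataset_size, batch_size):
    """Number of iterations needed to traverse the dataset `epochs` times."""
    return epochs * dataset_size // batch_size


# Assumed values: 1.28 million training images, batch size 256.
total_iters = iterations_for_epochs(90, 1280000, 256)  # 450000

# Learning rate at a few points of such a run (assumed base_lr, gamma, stepsize).
for it in (0, 100000, 200000, 400000):
    print(it, step_learning_rate(it, base_lr=0.01, gamma=0.1, stepsize=100000))
```

Under these assumptions, 90 passes over the training set correspond to 450000 iterations, during which the learning rate is reduced four times.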

### Call Training Script
Given a solver file in Caffe's format, users can start training with the following command, run from the [scripts folder](https://github.com/dmlc/minerva/tree/master/scripts):

    ./net_trainer.py <solver_file> [--snapshot SNAPSHOT] [-n NUM_GPU]

* solver_file is the name of a solver file in Caffe's format.
* SNAPSHOT is the index of the snapshot to start from (default: 0).
* NUM_GPU is the number of GPUs to use.

If NUM_GPU is greater than 1, our code slices each training batch into NUM_GPU pieces and runs the forward and backward passes on them in parallel. The update is executed synchronously, so training with one GPU or with NUM_GPU GPUs gives the same result.
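
The reason synchronous training gives the same result can be shown with a toy example: the gradient of a summed loss over a batch equals the sum of the gradients over its slices. The sketch below uses a one-parameter model and integer data so the arithmetic is exact; it assumes the batch size is divisible by the number of GPUs, and `slice_batch` and `gradient` are hypothetical helpers, not owl.net functions.

```python
def slice_batch(batch, num_gpu):
    """Split a batch into num_gpu contiguous, equally sized pieces."""
    piece = len(batch) // num_gpu
    return [batch[g * piece:(g + 1) * piece] for g in range(num_gpu)]


def gradient(w, samples):
    """Gradient of sum((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in samples)


# Toy batch of (x, y) pairs; integers keep the arithmetic exact.
batch = [(1, 2), (2, 3), (3, 7), (4, 9)]
w = 2

full_gradient = gradient(w, batch)                 # one GPU sees the whole batch
pieces = slice_batch(batch, num_gpu=2)             # each GPU sees one slice
sliced_gradient = sum(gradient(w, piece) for piece in pieces)
```

Because `full_gradient` and `sliced_gradient` are equal, applying a single update with the summed gradient matches the single-GPU update.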

The SNAPSHOT parameter tells the system to look for the saved model with that index under the snapshot saving directory. If the model is found and its weight dimensions match the network configuration file, OwlNet loads the model and continues training. Otherwise, it initializes the weights according to the weight_filler parameter in the configuration file.
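
Based on the loop bounds in the training code shown earlier, resuming from a snapshot appears to restart at the iteration given by the snapshot index times the snapshot saving interval. A minimal sketch of that arithmetic, with hypothetical function names and example numbers:

```python
def resume_iteration(snapshot_index, snapshot_interval):
    """First iteration to run when resuming from a given snapshot index."""
    return snapshot_index * snapshot_interval


def remaining_iterations(snapshot_index, snapshot_interval, max_iter):
    """Iterations left to run after resuming; never negative."""
    return max(0, max_iter - resume_iteration(snapshot_index, snapshot_interval))


# Example numbers (assumed): snapshots every 10000 iterations, 450000 in total.
start = resume_iteration(3, 10000)
left = remaining_iterations(3, 10000, 450000)
```

With the default SNAPSHOT of 0, training starts from iteration 0.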

An example invocation:
```bash
./net_trainer.py /path/to/solver_file --snapshot=0 -n=4
```
