Name		Name	Last commit message	Last commit date
parent directory ..
Makefile		Makefile
README.md		README.md
comm_file.py		comm_file.py
create_mnist_netcdf.py		create_mnist_netcdf.py
mnist.patch		mnist.patch
mnist_images.nc		mnist_images.nc
pnetcdf_io.py		pnetcdf_io.py

README.md

MNIST example using PnetCDF-Python to Read Input Data

This directory contains files for running the Pytorch example program, MNIST, using Pytorch module DistributedDataParallel for parallel training and PnetCDF-Python for reading data from a NetCDF files.

Running the MNIST Example Program

Firstly, run command below to generate the python program file.
```
make mnist_main.py
```

Run command below to train the model using 4 MPI processes.

mpiexec -n 4 python mnist_main.py --batch-size 4 --test-batch-size 2 --epochs 3 --input-file mnist_images.nc

mnist_main.py command-line options

-h, --help            show this help message and exit
--batch-size N        input batch size for training (default: 64)
--test-batch-size N   input batch size for testing (default: 1000)
--epochs N            number of epochs to train (default: 14)
--lr LR               learning rate (default: 1.0)
--gamma M             Learning rate step gamma (default: 0.7)
--no-cuda             disables CUDA training
--no-mps              disables macOS GPU training
--dry-run             quickly check a single pass
--seed S              random seed (default: 1)
--log-interval N      how many batches to wait before logging training status
--save-model          For Saving the current Model
--input-file INPUT_FILE
                      NetCDF file storing train and test samples

Testing

Command make check will do the following.
- Downloads the python source codes main.py from Pytorch Examples as file mnist_main.py.
- Applies patch file mnist.patch to mnist_main.py.
- Run the training program mnist_main.py in parallel using 4 MPI processes.

Testing output shown on screen.

=====================================================================
    examples/MNIST: Parallel testing on 4 MPI processes
======================================================================
Train Epoch: 1 [0/60 (0%)]	Loss: 2.514259
Train Epoch: 1 [10/60 (67%)]	Loss: 1.953820

Test set: Average loss: 2.2113, Accuracy: 4/12 (33%)

Train Epoch: 2 [0/60 (0%)]	Loss: 2.359334
Train Epoch: 2 [10/60 (67%)]	Loss: 2.092178

Test set: Average loss: 1.4825, Accuracy: 6/12 (50%)

Train Epoch: 3 [0/60 (0%)]	Loss: 2.067438
Train Epoch: 3 [10/60 (67%)]	Loss: 0.010670

Test set: Average loss: 1.2531, Accuracy: 7/12 (58%)

Generate the Input NetCDF File From MNIST Datasets

Utility program create_mnist_netcdf.py can be used to extract a subset of images into a NetCDF file.
Command make mnist_images.nc will first download the MNIST data files from https://yann.lecun.com/exdb/mnist and extract 60 images as training samples and 12 images as testing samples into a new file named mnist_images.nc.
create_mnist_netcdf.py can also run individually to extract a different number of images using command-line options shown below.

create_mnist_netcdf.py command-line options:

  -h, --help            show this help message and exit
  --verbose             Verbose mode
  --train-size N        Number of training samples extracted from the input file (default: 60)
  --test-size N         Number of testing samples extracted from the input file (default: 12)
  --train-data-file TRAIN_DATA_FILE
                        (Optional) input file name of training data
  --train-label-file TRAIN_LABEL_FILE
                        (Optional) input file name of training labels
  --test-data-file TEST_DATA_FILE
                        (Optional) input file name of testing data
  --test-label-file TEST_LABEL_FILE
                        (Optional) input file name of testing labels
  --out-file OUT_FILE   (Optional) output NetCDF file name

The NetCDF file metadata can be obtained by running command "ncmpidump -h" or "ncdump -h".

% ncmpidump -h mnist_images.nc
netcdf mnist_images {
// file format: CDF-5 (big variables)
dimensions:
    height = 28 ;
    width = 28 ;
    train_num = 60 ;
    test_num = 12 ;
variables:
    ubyte train_samples(train_num, height, width) ;
  	  train_samples:long_name = "training data samples" ;
    ubyte train_labels(train_num) ;
  	  train_labels:long_name = "labels of training samples" ;
    ubyte test_samples(test_num, height, width) ;
  	  test_samples:long_name = "testing data samples" ;
    ubyte test_labels(test_num) ;
  	  test_labels:long_name = "labels of testing samples" ;

// global attributes:
  	  :url = "https://yann.lecun.com/exdb/mnist/" ;
}

Files in this directory

mnist.patch -- a patch file to be applied on main.py once downloaded from Pytorch Examples before running the model training.
comm_file.py -- implements the parallel environment for training the model in parallel.
pnetcdf_io.py -- implements the file I/O using PnetCDF-Python.
create_mnist_netcdf.py -- a utility python program that reads the MINST files, extract a subset of the samples, and stores them into a newly created file in NetCDF format.

Notes:

The test set accuracy may vary slightly depending on how the data is distributed across the MPI processes.
The accuracy and loss reported after each epoch are averaged across all MPI processes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MNIST

MNIST

README.md

MNIST example using PnetCDF-Python to Read Input Data

Running the MNIST Example Program

Testing

Generate the Input NetCDF File From MNIST Datasets

Files in this directory

Notes:

Files

MNIST

Directory actions

More options

Directory actions

More options

Latest commit

History

MNIST

Folders and files

parent directory

README.md

MNIST example using PnetCDF-Python to Read Input Data

Running the MNIST Example Program

Testing

Generate the Input NetCDF File From MNIST Datasets

Files in this directory

Notes: