PyTorch Model Architecture Tuning

This repo evaluates the impact of varying convolutional network architectures on a network's performance. We compare five different models and analyse their convergence and training behaviour on the MNIST digit classification dataset.

Quick Setup

All five network architectures are implemented in ConvNet.py. To run a desired model with custom parameters, pass the model's mode number and other hyperparameters to train_evaluate_CNN.py:
python3 train_evaluate_CNN.py --mode <Mode between 1-5> --learning_rate <LR> --batch_size <batch_size>  --log_dir <output dir>
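For example, to train the mode-3 network (ReLU, learning rate 0.03) with the mini-batch size of 10 used in the steps below, writing outputs to a ./logs directory (the flag names come from the template above; the script's actual defaults may differ):

python3 train_evaluate_CNN.py --mode 3 --learning_rate 0.03 --batch_size 10 --log_dir ./logs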

Steps to Create the Models

  1. Create a fully connected (FC) hidden layer (with 100 neurons) with a sigmoid activation function. Train it with SGD with a learning rate of 0.1 for a total of 60 epochs, a mini-batch size of 10, and no regularization.
  2. Insert two convolutional layers into the network built in STEP 1, each followed by a pooling layer. Use 40 kernels of size 5x5 with stride 1, and pool over 2x2 regions.
  3. For the network built in STEP 2, replace Sigmoid with ReLU and re-train the model with a new learning rate of 0.03.
  4. Add another fully connected (FC) layer (with 100 neurons) to the network built in STEP 3 (the first FC layer was added in STEP 1; this is a second one).
  5. Change the number of neurons in the FC layers to 1000 and use Dropout (with a rate of 0.5) for regularization. Train the whole system for 40 epochs. A PyTorch sketch of all five architectures follows this list.
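The sketch below is one possible PyTorch rendering of the five modes; the builder function, layer ordering, and dropout placement are illustrative assumptions, and the repo's actual definitions live in ConvNet.py. Training would follow the steps above (SGD, batch size 10, learning rate 0.1 for the sigmoid modes and 0.03 from STEP 3 onward).

import torch.nn as nn

def build_model(mode):
    # Mode 1 (STEP 1): a single FC hidden layer with 100 neurons and sigmoid activation.
    if mode == 1:
        return nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 100), nn.Sigmoid(),
            nn.Linear(100, 10),
        )

    Act = nn.Sigmoid if mode == 2 else nn.ReLU   # STEP 3 swaps sigmoid for ReLU
    hidden = 1000 if mode == 5 else 100          # STEP 5 widens the FC layers to 1000 neurons

    # STEPS 2-5: two conv layers (40 kernels, 5x5, stride 1), each followed by 2x2 pooling.
    # With 28x28 MNIST inputs and no padding, this leaves 40 feature maps of size 4x4.
    layers = [
        nn.Conv2d(1, 40, kernel_size=5, stride=1), Act(), nn.MaxPool2d(2),
        nn.Conv2d(40, 40, kernel_size=5, stride=1), Act(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(40 * 4 * 4, hidden), Act(),
    ]
    if mode >= 4:                                # STEP 4 adds a second FC hidden layer
        layers += [nn.Linear(hidden, hidden), Act()]
    if mode == 5:                                # STEP 5 adds dropout for regularization
        layers += [nn.Dropout(0.5)]              # placement before the output layer is an assumption
    layers += [nn.Linear(hidden, 10)]            # 10 output classes for MNIST digits
    return nn.Sequential(*layers)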

Models' Performance

For mode 1, the loss decreased with each epoch and accuracy improved, as expected. Every other mode, which adds convolutional layers, did not converge: across different optimisers and loss functions, modes 2-5 showed no improvement in loss or accuracy when trained with the given settings.

Conclusion: MNIST is a simple dataset, and a single fully connected layer with sigmoid activation performs well. The more complex architectures did not converge, most likely due to a bug in the implementation or some other logical error.
