TensorFlow-based implementation of Learning both Weights and Connections for Efficient Neural Networks by Han S., Pool J., et al.
Pruning is a model compression technique that shrinks a model to a smaller size with only a marginal loss in accuracy. Pruning also allows the model to be optimized for real-time inference on resource-constrained devices.
For more information on Model Compression and Pruning, please read Model Compression via Pruning.
- Magnitude-Based Pruning.
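As a rough illustration (not the repository's code), magnitude-based pruning ranks weights by absolute value and zeroes those below a threshold chosen to hit a target sparsity. This NumPy sketch assumes a `sparsity` fraction as its only knob; both the function name and the parameter are hypothetical:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights.

    sparsity: fraction of weights to set to 0.0 (an assumed
    parameter, not taken from the repository).
    """
    flat = np.abs(weights).ravel()
    # Pick the threshold at the given percentile of |w|, so roughly
    # `sparsity` of the weights fall below it.
    threshold = np.percentile(flat, sparsity * 100)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.array([[0.9, -0.05], [0.01, -0.8]])
pruned, mask = magnitude_prune(w, sparsity=0.5)
# The two small-magnitude entries (0.01 and -0.05) are zeroed;
# 0.9 and -0.8 survive.
```

The mask is returned alongside the pruned weights because retraining schemes typically need it to keep pruned connections at zero.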
This implementation uses a dataset that is not publicly available; however, the code can be applied to other datasets.
The code has two different implementations:
- Retrain Attempt: induces sparsity every iteration while retraining.
- Baseline Attempt: induces sparsity by setting weight values below a certain threshold to 0.0, without retraining.
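The two attempts can be sketched as follows. This is a hedged illustration in plain NumPy (the repository itself uses TensorFlow), with a caller-supplied gradient function standing in for an actual training step; all names and the threshold value are assumptions for the sketch:

```python
import numpy as np

def baseline_prune(weights, threshold):
    """Baseline attempt: one-shot pruning -- zero every weight whose
    magnitude falls below the threshold, with no retraining afterwards."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def retrain_prune(weights, threshold, grad_fn, lr=0.1, steps=10):
    """Retrain attempt: induce sparsity every iteration -- after each
    gradient update, re-threshold so small weights return to 0.0."""
    w = weights.copy()
    for _ in range(steps):
        w -= lr * grad_fn(w)              # one (toy) training step
        w[np.abs(w) < threshold] = 0.0    # induce sparsity again
    return w

# Toy usage: quadratic loss 0.5*||w - target||^2, whose gradient
# is simply (w - target). The middle weight is driven to zero and
# stays pruned across iterations.
w = np.array([0.5, 0.01, -0.7])
target = np.array([1.0, 0.0, -1.0])
w_baseline = baseline_prune(w, threshold=0.1)
w_retrained = retrain_prune(w, threshold=0.1,
                            grad_fn=lambda x: x - target,
                            lr=0.5, steps=20)
```

The key difference: the baseline zeroes weights once and stops, while the retrain loop lets the surviving weights keep adapting to the loss while re-pruning after every update.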
Author: @Parth Malpathak
All code and implementations are part of the 10605 (Machine Learning for Large Datasets) course requirements. Please review Carnegie Mellon University's academic integrity policy before cloning this repository and duplicating the code.