By Akash Maurya, Information Technology, VESIT
Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text, or time series, must be translated.
Importance Of Pruning
Deep learning models these days require a significant amount of compute, memory, and power. Deploying them on edge devices such as smartphones, Raspberry Pi boards, and cars is difficult because these devices often cannot meet the models' computational requirements.
Pruning a model makes it:
- Smaller in Size
- More Memory-Efficient
- More Power-Efficient
- Faster at inference with Minimal Loss in Accuracy
I implemented the filter pruning technique on the VGG16 architecture using a clustering methodology. Clustering is a technique that groups similar data points so that the points in the same group are more similar to each other than to the points in other groups.
I used two datasets:
- CIFAR-10: This dataset consists of 60,000 32×32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
- CIFAR-100: This dataset is just like CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses.
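The dataset figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted in the list; the variable names are illustrative.

```python
# Sanity-check the dataset sizes quoted above (names are illustrative only).
cifar10_classes, cifar10_per_class = 10, 6_000
cifar100_classes, cifar100_train_per_class, cifar100_test_per_class = 100, 500, 100

cifar10_total = cifar10_classes * cifar10_per_class
cifar100_train = cifar100_classes * cifar100_train_per_class
cifar100_test = cifar100_classes * cifar100_test_per_class
cifar100_total = cifar100_train + cifar100_test

# 100 classes spread over 20 superclasses gives 5 classes per superclass.
classes_per_superclass = cifar100_classes // 20

print(cifar10_total, cifar100_train, cifar100_test, classes_per_superclass)
```

Both datasets therefore contain the same number of images overall; CIFAR-100 simply spreads them over ten times as many classes.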
A VGG16 model trained on the ImageNet dataset is more than 500 MB in size. Table 1 compares different CNN models in terms of features, parameters, FLOPs, and accuracy. The VGG16 model consists of:
- 13 Convolutional layers
- 5 pooling layers
- 3 fully connected or dense layers
It can be observed that VGG16 reaches a top-5 accuracy of 90.1% on ImageNet with about 138 million parameters. VGG16 is therefore a very large model that is impractical to deploy on resource-constrained devices.
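To see where the roughly 138 million parameters come from, the count can be reproduced from the layer layout. The sketch below assumes the standard VGG16 configuration (3×3 convolutions with biases, a 224×224 ImageNet input, and a 4096-4096-1000 dense head); the helper names are hypothetical.

```python
# Standard VGG16 layout: integers are conv output channels, "M" is a 2x2 max-pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(cfg, in_channels=3, kernel=3):
    """Weights plus biases of every convolutional layer in cfg."""
    total = 0
    for item in cfg:
        if item == "M":          # pooling layers have no trainable parameters
            continue
        total += kernel * kernel * in_channels * item + item
        in_channels = item
    return total


def dense_params(sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


n_conv_layers = sum(1 for item in VGG16_CFG if item != "M")
n_pool_layers = VGG16_CFG.count("M")

conv_total = conv_params(VGG16_CFG)
# Five poolings reduce 224x224 to 7x7, so the first dense layer sees 512*7*7 inputs.
dense_total = dense_params([512 * 7 * 7, 4096, 4096, 1000])
total = conv_total + dense_total

print(n_conv_layers, n_pool_layers)   # 13 5
print(f"{conv_total:,}")              # 14,714,688
print(f"{total:,}")                   # 138,357,544
```

Most of the parameters sit in the dense layers, while the 13 convolutional layers account for about 14.7 million.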
|Model||Size||Top-1 Accuracy||Top-5 Accuracy||Parameters||Depth|
Table 1: Comparison of CNN models
I used agglomerative clustering, with cosine similarity as the similarity measure, to group similar filters in the various layers. Agglomerative clustering is the most common type of hierarchical clustering: it starts with every object in its own cluster and repeatedly merges the most similar clusters.
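The idea can be illustrated with a minimal, pure-Python sketch. It is not the implementation used for the results in this article: the average-linkage rule, the choice of keeping the filter with the largest L1 norm from each cluster, the toy vectors, and all function names are assumptions made for illustration. In practice a library routine such as SciPy's hierarchical clustering would normally replace the hand-written loop.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two flattened, non-zero filters."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_clusters(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of filter indices."""
    dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]
    clusters = [[i] for i in range(len(vectors))]  # start: one cluster per filter

    def average_linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations always yields a < b
    return clusters


def representatives(vectors, clusters):
    """Assumption: keep the filter with the largest L1 norm in each cluster."""
    def l1_norm(i):
        return sum(abs(x) for x in vectors[i])
    return [max(cluster, key=l1_norm) for cluster in clusters]


# Toy "filters", already flattened. A real conv filter of shape
# (in_channels, k, k) would be flattened to a vector of length in_channels*k*k.
filters = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.2, 0.0, 0.0],   # points almost the same way as filter 0
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 2.0, 0.1, 0.0],   # points almost the same way as filter 2
    [0.0, 0.0, 1.0, 1.0],
]

clusters = agglomerative_clusters(filters, n_clusters=3)
kept = representatives(filters, clusters)
print(clusters)  # [[0, 1], [2, 3], [4]]
print(kept)      # [1, 3, 4]
```

Because cosine similarity ignores magnitude, filters 2 and 3 are grouped together even though one is about twice as large as the other. Only the kept filters would remain in the pruned layer, and the next layer's input channels would have to be reduced to match.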
The horizontal axis shows the layers of the VGG16 model whereas the vertical axis represents the number of trainable parameters for the original and pruned model.
|Metric|Original Model|Pruned Model|
|---|---|---|
|Forward/backward pass size (MB)|6.57|1.20|
|Params size (MB)|57.17|2.80|
|Estimated total size (MB)|63.76|4.01|
Comparison of total trainable parameters and memory size
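The "Params size" row can be reproduced from the trainable-parameter counts reported in this article (14,987,772 before pruning and 732,898 after). The sketch assumes each parameter is stored as a 32-bit float and that "MB" means 2^20 bytes, as some model-summary tools report it; both are assumptions, since the article does not state how the sizes were measured.

```python
BYTES_PER_FLOAT32 = 4      # assumption: parameters stored as 32-bit floats
BYTES_PER_MB = 1024 ** 2   # assumption: "MB" here means 2**20 bytes


def params_size_mb(n_params):
    """Memory taken by n_params float32 values, in MB."""
    return n_params * BYTES_PER_FLOAT32 / BYTES_PER_MB


original_mb = params_size_mb(14_987_772)
pruned_mb = params_size_mb(732_898)

print(f"{original_mb:.2f}")  # 57.17
print(f"{pruned_mb:.2f}")    # 2.80
```

Under these assumptions the computed values match the table. The estimated total size adds the forward/backward pass memory and the (small) input tensor on top of the parameter memory.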
To check the inference time of the model on a smartphone, I deployed it in an Android app using Android Studio. The difference in inference time can be clearly seen in the image below.
Pruning solves the challenge of compressing CNN models without compromising the model’s accuracy.
It helps us deploy models on small devices that do not have enough storage or computational power, and it can reduce the size of a model by a huge margin. I successfully pruned the VGG16 model, reducing the number of trainable parameters from 14,987,772 to 732,898 and the size of the model from 63.76 MB to 4.01 MB, while the accuracy stayed constant at 93.6%. A significant change in the inference time can also be seen.
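The reduction can also be expressed as a ratio. The short calculation below uses only the figures reported above; the variable names are illustrative.

```python
# Figures reported in this article.
orig_params, pruned_params = 14_987_772, 732_898
orig_size_mb, pruned_size_mb = 63.76, 4.01

param_ratio = orig_params / pruned_params            # how many times fewer parameters
size_ratio = orig_size_mb / pruned_size_mb           # how many times smaller in memory
param_reduction_pct = 100 * (1 - pruned_params / orig_params)
size_reduction_pct = 100 * (1 - pruned_size_mb / orig_size_mb)

print(f"parameters: {param_ratio:.2f}x fewer ({param_reduction_pct:.1f}% removed)")
print(f"size:       {size_ratio:.2f}x smaller ({size_reduction_pct:.1f}% removed)")
```

In other words, the pruned model keeps roughly one parameter in twenty and is about sixteen times smaller in estimated total size.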