A Structured Filter Pruning Approach for Efficient Inference of Deep Neural Networks

By Akash Maurya, Information Technology, VESIT


Neural networks are a set of algorithms, modeled loosely after the human brain, that are designed to recognize patterns. They interpret sensory data through a kind of machine perception, labeling or clustering raw input. The patterns they recognize are numerical, contained in vectors, into which all real-world data, be it images, sound, text, or time series, must be translated.

Fig. Pictorial representation of Deep Neural Networks

Importance Of Pruning

Deep learning models require a significant amount of computation, memory, and power. Deploying them on edge devices such as smartphones, Raspberry Pi boards, or in-car systems is difficult because these devices often cannot meet the models' computational requirements.

Pruning a model makes it:

  1. Smaller in Size
  2. More Memory-Efficient
  3. More Power-Efficient
  4. Faster at inference with Minimal Loss in Accuracy

Fig. Pruning of a model

Proposed Method

I implemented filter pruning on the VGG16 architecture using a clustering-based approach. Clustering groups similar data points so that points in the same group are more similar to each other than to points in other groups.
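
To make the idea of filter pruning concrete: removing a filter from a convolutional layer also removes the matching input channel from the layer that consumes its output, which is what makes this kind of pruning structured. The following is a minimal sketch on toy nested-list weights, not the code used in the experiments; the helper names `prune_filters`, `count`, and `zeros` are hypothetical.

```python
def prune_filters(weights, biases, next_weights, keep):
    """Keep only the filters listed in `keep` and fix up the following layer.

    weights:      [out_ch][in_ch][k][k] nested lists for the pruned layer
    biases:       [out_ch]
    next_weights: [next_out][out_ch][k][k] for the layer that consumes the output
    keep:         sorted indices of the filters to retain
    """
    pruned_w = [weights[i] for i in keep]
    pruned_b = [biases[i] for i in keep]
    # Every filter of the next layer loses the input channels
    # that were produced by the removed filters.
    pruned_next = [[f[i] for i in keep] for f in next_weights]
    return pruned_w, pruned_b, pruned_next


def count(nested):
    """Count scalar entries in an arbitrarily nested list."""
    if isinstance(nested, list):
        return sum(count(x) for x in nested)
    return 1


def zeros(*shape):
    """Build a nested list of zeros with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


# Toy layers: 4 filters over 3 input channels, followed by 2 filters over 4 channels.
w1, b1, w2 = zeros(4, 3, 3, 3), zeros(4), zeros(2, 4, 3, 3)
p1, pb1, p2 = prune_filters(w1, b1, w2, keep=[0, 2])

print(count(w1), "->", count(p1))  # weights of the pruned layer
print(count(w2), "->", count(p2))  # weights of the following layer
```

Dropping two of the four filters halves the weights of both layers in this toy case, without leaving any sparse holes in the tensors.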

I used two datasets:

  1. CIFAR-10: This dataset consists of 60,000 32×32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.
  2. CIFAR-100: This dataset is just like CIFAR-10, except it has 100 classes containing 600 images each. There are 500 training images and 100 test images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses.

The VGG16 model trained on the ImageNet dataset is more than 500 MB in size. Table 1 compares different CNN models in terms of size, accuracy, parameter count, and depth. The VGG16 model consists of:

  • 13 Convolutional layers
  • 5 pooling layers
  • 3 fully connected or dense layers

It can be observed that VGG16 reaches a top-5 accuracy of 90.1% (top-1: 71.3%) with about 138 million parameters. VGG16 is therefore a very large model that is difficult to deploy on resource-constrained devices.
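
The 138 million figure can be reproduced from the layer list above. The sketch below assumes the standard ImageNet VGG16 layout: 3×3 convolutions with biases, a 7×7×512 feature map entering the dense layers, two 4096-unit dense layers, and 1000 output classes. The function names are my own.

```python
# VGG16 convolutional configuration: output channels per 3x3 conv layer,
# with "M" marking a 2x2 max-pooling layer (no parameters).
CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
       512, 512, 512, "M", 512, 512, 512, "M"]


def conv_params(cfg, in_channels=3, kernel=3):
    """Sum weights and biases over all conv layers in cfg."""
    total = 0
    for item in cfg:
        if item == "M":
            continue
        total += in_channels * item * kernel * kernel + item  # weights + biases
        in_channels = item
    return total


def dense_params(sizes):
    """Sum weights and biases over consecutive fully connected layers."""
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))


conv_total = conv_params(CFG)
dense_total = dense_params([7 * 7 * 512, 4096, 4096, 1000])

print(conv_total)                # 14714688
print(dense_total)               # 123642856
print(conv_total + dense_total)  # 138357544
```

The total matches the VGG16 row of Table 1. Most of the parameters sit in the dense layers, while the 13 convolutional layers account for roughly 14.7 million.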

Model | Size | Top-1 Accuracy | Top-5 Accuracy | Parameters | Depth
Xception | 88 MB | 0.790 | 0.945 | 22,910,480 | 126
VGG16 | 528 MB | 0.713 | 0.901 | 138,357,544 | 23
VGG19 | 549 MB | 0.713 | 0.900 | 143,667,240 | 26

Table 1. Comparison of CNN models

I used agglomerative clustering with cosine similarity to group similar filters across the various layers. Agglomerative clustering is the most common type of hierarchical clustering; it groups objects into clusters based on their similarity.
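
The clustering step can be sketched in plain Python as follows. This is a minimal illustration rather than the code used in the experiments. The average linkage, the distance threshold of 0.1, the flattening of each filter into a vector, and the choice of the first filter in each cluster as its representative are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def agglomerative_clusters(filters, threshold):
    """Average-linkage clustering on cosine distance (1 - similarity).

    Starts with one cluster per filter and repeatedly merges the closest
    pair of clusters until no pair is closer than `threshold`.
    """
    n = len(filters)
    dist = [[1.0 - cosine_similarity(filters[i], filters[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy "filters", already flattened to vectors.
filters = [
    [1.0, 0.0, 0.0, 1.0],
    [0.9, 0.1, 0.0, 1.1],    # nearly parallel to filter 0
    [0.0, 1.0, 1.0, 0.0],
    [0.1, 1.0, 0.9, 0.0],    # nearly parallel to filter 2
    [-1.0, 0.5, -0.5, 1.0],  # unlike the others
]

clusters = agglomerative_clusters(filters, threshold=0.1)
keep = sorted(c[0] for c in clusters)  # assumption: keep one filter per cluster

print(clusters)  # [[0, 1], [2, 3], [4]]
print(keep)      # [0, 2, 4]
```

Filters that point in nearly the same direction end up in the same cluster, so only one of them needs to be kept. Cosine similarity ignores the magnitude of a filter, so two filters that differ only by a scale factor count as redundant.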

Fig. Block diagram of the proposed compression technique

Experimental Results

The horizontal axis shows the layers of the VGG16 model whereas the vertical axis represents the number of trainable parameters for the original and pruned model.

Fig. Graphical representation of trainable parameters for the original and compressed model

Parameters | Original Model | Pruned Model
Total params | 14,987,772 | 732,898
Trainable params | 14,987,772 | 732,898
Non-trainable params | 0 | 0
Input size (MB) | 0.01 | 0.01
Forward/backward pass size (MB) | 6.57 | 1.20
Params size (MB) | 57.17 | 2.80
Estimated Total size (MB) | 63.76 | 4.01

Comparison of total trainable parameters and memory size 
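
The "Params size" row follows directly from the parameter counts. The sketch below assumes 32-bit floating-point weights and that MB here means 2^20 bytes; both assumptions are consistent with the reported figures but are not stated in the summary itself.

```python
BYTES_PER_PARAM = 4        # assumption: 32-bit floating-point weights
BYTES_PER_MB = 1024 ** 2   # assumption: MB means 2**20 bytes


def params_size_mb(num_params):
    return num_params * BYTES_PER_PARAM / BYTES_PER_MB


original, pruned = 14_987_772, 732_898

print(round(params_size_mb(original), 2))  # 57.17
print(round(params_size_mb(pruned), 2))    # 2.8

# Share of parameters removed, in percent, and the compression factor.
print(round(100 * (1 - pruned / original), 2))
print(round(original / pruned, 2))
```

In other words, pruning removes about 95% of the parameters, which is roughly a 20-fold reduction.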

To check how the model performs on a smartphone, I deployed both versions in an Android app built with Android Studio and measured the inference time. The difference can be clearly seen in the image below.
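
For a quick comparison before deploying to a device, a similar measurement can be made on a desktop. The harness below is a generic Python sketch, not the Android measurement used here. `measure_latency_ms` and `fake_model` are hypothetical names, and the stand-in workloads should be replaced with the forward passes of the original and pruned models.

```python
import statistics
import time


def measure_latency_ms(predict, sample, warmup=5, runs=30):
    """Median wall-clock time of predict(sample) in milliseconds."""
    for _ in range(warmup):  # warm-up runs are discarded
        predict(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def fake_model(num_ops):
    """Stand-in workload (assumption): more operations means a slower 'model'."""
    return lambda x: sum(x * i for i in range(num_ops))


print(measure_latency_ms(fake_model(200_000), 1.0))  # "original" stand-in
print(measure_latency_ms(fake_model(10_000), 1.0))   # "pruned" stand-in
```

Using the median over several runs after a warm-up makes the number less sensitive to one-off delays such as caching or background activity.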

Fig. Comparison of inference time for original and pruned VGG16 model


Pruning addresses the challenge of compressing CNN models without compromising the model’s accuracy.

It helps deploy models on small devices that lack storage space or computational power, and it can reduce the size of a model by a large margin. I pruned the VGG16 model from 14,987,772 trainable parameters to 732,898, reducing the estimated model size from 63.76 MB to 4.01 MB while the accuracy remained at 93.6%. A significant reduction in inference time can also be seen.
