Making Deep Neural Networks Ready For Mobile/Edge Devices

By Devdatta Khoche, Information Technology, VESIT


With recent advances in Deep Learning and AI comes the need to run state-of-the-art models on edge devices without loss of accuracy. One option is to deploy these models in the cloud and serve them through APIs, but this comes at a significant cost in security and latency.

Deep Learning models are growing larger and more computationally expensive as more capabilities are built into them, and they often cannot fit into the limited memory of edge devices. Deploying state-of-the-art DNNs on edge devices is limited by two factors:

    —  the computational power needed for inference, and

    —  the size of the model itself, which makes it far from portable.

Proposed Method

We propose a redundant-filter pruning method, which removes redundant filters from each convolutional layer and thus reduces both the size of the model and the computation needed for inference.

So let's get started. Since deployment requires reducing both the computation and the size of the model, we use filter pruning to remove redundant filters.

With filter pruning, however, we must keep in mind that filters and parameters have to be reduced without degrading the accuracy of the model. This can be done with unsupervised learning: we will use clustering algorithms to find similar filters based on different metrics. Once we find a group of similar filters, we remove all of them except one, which reduces the FLOPs, the parameters, and the size of the model. We could also use L1-norm pruning alone to reduce parameters and size, but that can degrade the accuracy of the model drastically.

We will use PyTorch for our implementation. PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab.
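To make the "similar filters" idea concrete, here is a minimal sketch of one possible similarity metric: pairwise cosine similarity between flattened filters. This uses NumPy arrays in place of PyTorch weight tensors, and the function name and toy weights are our own illustration, not part of the original implementation:

```python
import numpy as np

def filter_cosine_similarity(weights):
    """Pairwise cosine similarity between flattened conv filters.

    weights: array of shape (out_channels, in_channels, k, k).
    """
    flat = weights.reshape(len(weights), -1)
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    return unit @ unit.T

# Toy layer: filter 1 is a scaled copy of filter 0 (redundant),
# while filter 2 points in the opposite direction.
w = np.stack([np.ones((2, 3, 3)),
              2.0 * np.ones((2, 3, 3)),
              -np.ones((2, 3, 3))])
sim = filter_cosine_similarity(w)
```

Filters whose similarity is close to 1 are candidates for merging into one cluster, of which only a single representative is kept.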


We will use the CIFAR-10 dataset with VGG16 for this post.



In unstructured pruning we would find the less important individual connections and remove them; here, instead, we use the L1-norm to select the less important filters and prune whole filters. The fine-tuning part is the same as with any other method.

The accuracy decreases when filters with large L1-norms are removed, which indicates that filters with larger L1-norms are more important.

Algorithm 1:

  1. For each convolutional layer in the model:
  2. Get the layer's filter weights.
  3. For each filter, calculate the sum of its absolute weights, S.
  4. Sort S.
  5. Prune the m filters with the smallest sum values, along with their corresponding feature maps.
  6. Copy the remaining kernel weights to the new model.
  7. Fine-tune the pruned model.
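Steps 3–5 of Algorithm 1 can be sketched as follows. This is a minimal NumPy illustration (in the real implementation the weights would come from a PyTorch `Conv2d` layer); the helper name and toy weights are our own:

```python
import numpy as np

def l1_prune_indices(weights, m):
    """Indices of filters to keep after pruning the m filters with
    the smallest L1-norm.

    weights: array of shape (out_channels, in_channels, k, k).
    """
    # Step 3: sum of absolute weights per filter
    scores = np.abs(weights).sum(axis=(1, 2, 3))
    # Steps 4-5: sort the scores and drop the m smallest
    order = np.argsort(scores)
    return np.sort(order[m:])

# Toy layer with 4 filters; filter 0 is all-zero, so it has the
# smallest L1-norm and is pruned first.
w = np.zeros((4, 2, 3, 3))
w[1], w[2], w[3] = 0.5, 1.0, 0.2
keep = l1_prune_indices(w, 1)  # filters 1, 2 and 3 survive
```

The surviving indices are then used to copy the remaining kernel weights into the new, smaller model (step 6).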

K-means is a very popular unsupervised learning algorithm. It is easy to implement and produces good clusterings in less time than HDBSCAN and other clustering algorithms.

The limitations of K-means clustering are that the number of clusters must be specified in advance and that the initial centroids are selected randomly by the algorithm.

Now, to overcome these two problems, we can use the following methods:

  1. To find the optimal number of clusters, we can use the elbow method.
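A minimal sketch of the elbow method: compute the within-cluster sum of squares (inertia) for increasing k and look for the point where the curve stops dropping sharply. To keep it self-contained, this uses a naive from-scratch K-means rather than scikit-learn; the function name and toy data are purely illustrative:

```python
import numpy as np

def kmeans_inertia(X, k, iters=20, seed=0):
    """Naive K-means; returns the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign every point to its nearest centre, then move the centres
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.min(1).sum()

# Two well-separated blobs: inertia collapses from k=1 to k=2 and
# then flattens out -- the "elbow" sits at k=2.
X = np.vstack([np.random.default_rng(1).normal(0.0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5.0, 0.1, (20, 2))])
inertias = [kmeans_inertia(X, k) for k in (1, 2, 3)]
```

Plotting `inertias` against k and picking the bend of the curve gives the cluster count without hard-coding it.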

One limitation of the K-means algorithm is that it is sensitive to the initialisation of the centroids (the mean points). If a centroid is initialised to a "far-off" point, it may end up with no points associated with it, while many clusters may end up linked to a single centroid. Similarly, more than one centroid may be initialised inside the same cluster, leading to poor clustering.

In short, a poor initialisation of the centroids results in bad clustering.

For our work, we set the number of clusters to half the number of output channels, i.e. half the number of filters.
This choice reduces the parameters and FLOPs of the model drastically, but comes with an accuracy drop of 2.6%.

Concretely, once the clusters are formed: denoting the number of output channels of the i-th layer by Ki, the number of clusters is N = Ki/2.

Algorithm 2:

  1. Set K as output_channels/2.
  2. Apply K-means with the number of clusters set to K.
  3. After the clusters are formed, keep any one filter from each cluster and remove the rest.
  4. Create a configuration mask that denotes, in binary format, which filters to keep and which to remove.
  5. Re-initialize the weights with the new configuration.
  6. Fine-tune the model.
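The clustering-and-masking steps above can be sketched end-to-end for a single layer. Again this is a hedged NumPy illustration with a from-scratch K-means (a real implementation would cluster PyTorch `Conv2d` weights, e.g. with scikit-learn); the function name and toy filters are our own:

```python
import numpy as np

def cluster_prune_mask(weights, iters=20, seed=0):
    """Binary keep/remove mask: K-means over flattened filters with
    K = out_channels // 2, keeping one filter per cluster.

    weights: (out_channels, in_channels, k, k) tensor of one conv layer.
    """
    flat = weights.reshape(len(weights), -1).astype(float)
    k = len(flat) // 2                       # N = Ki / 2 clusters
    rng = np.random.default_rng(seed)
    centers = flat[rng.choice(len(flat), k, replace=False)]
    for _ in range(iters):
        d = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = flat[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    mask = np.zeros(len(flat), dtype=int)
    for j in range(k):
        members = np.where(labels == j)[0]
        if len(members):
            mask[members[0]] = 1             # keep one filter per cluster
    return mask

# Toy layer: filters 0/1 are near-duplicates, as are filters 2/3,
# so exactly one filter from each pair should be kept.
w = np.zeros((4, 1, 3, 3))
w[0], w[1] = 1.0, 1.01
w[2], w[3] = -1.0, -1.01
mask = cluster_prune_mask(w)
```

Here the first member of each cluster is kept rather than a random one, purely to keep the sketch deterministic; the resulting binary mask is what gets applied when re-initializing the pruned model.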

Note that here we have reduced the filters by keeping an arbitrary representative per cluster and by hard-coding the number of clusters.

We could improve this by finding the optimal number of clusters before pruning, which would likely give better results.

After this, you can apply the new configuration to the model, which generates a pruned model that needs to be trained and fine-tuned again to recover accuracy. With our fine-tuned model in hand, we can move on to deployment.

For deployment, instead of creating a whole app from scratch, we edited the demo app built with PyTorch Mobile. The code for the PyTorch demo app is available in its GitHub repo.
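As a sketch of the export step such a demo app consumes, the snippet below scripts a model and saves it for the PyTorch Mobile lite interpreter. The tiny stand-in model and file name are hypothetical (the real export would use the pruned, fine-tuned VGG16), and this assumes a reasonably recent PyTorch version:

```python
import torch
from torch import nn
from torch.utils.mobile_optimizer import optimize_for_mobile

# Hypothetical stand-in for the pruned, fine-tuned network.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                      nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(),
                      nn.Linear(8, 10))
model.eval()

# Script the model, apply mobile optimizations, and save it in the
# format the PyTorch Mobile demo app loads.
scripted = torch.jit.script(model)
mobile_model = optimize_for_mobile(scripted)
mobile_model._save_for_lite_interpreter("pruned_vgg16.ptl")
```

The resulting `.ptl` file is what gets dropped into the demo app's assets in place of its bundled model.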


Original CIFAR-10 model vs. pruned model parameters
Results for CIFAR-10 and CIFAR-100


Modern CNNs often have large training and inference costs. In this blog we presented a method that prunes filters with a high similarity coefficient to produce CNNs with reduced computation costs and little loss in accuracy. It achieves about a 79% reduction in parameters for VGG (on CIFAR-10) without significant loss in the original accuracy, and we have successfully deployed the working model on an Android device.
