Last Updated: April 23, 2022
Depthwise Separable Convolutions
In this tutorial, you'll learn what depthwise separable convolutions are and how they compare to regular convolution filters. You'll see that they are more efficient than regular convolutions in terms of speed and memory, with only minor trade-offs.
Lastly, you'll see how they can be combined into a standard neural architecture, which you should hopefully be able to adapt to your own development workflows.
First, let's write all the imports needed for this tutorial:
import torch
from prettytable import PrettyTable
from collections import OrderedDict
Regular Convolutions
We'll begin by examining just how many parameters and FLOPs (floating point operations) are in a regular convolution. If you don't know what these terms mean, they'll be explained shortly.
First, let's define a regular convolution layer.
input_channels = 3
output_channels = 64
kernel_size = 5
stride = 2
regular_conv = torch.nn.Conv2d(input_channels, output_channels, kernel_size, stride)
Number of Parameters in a Regular Convolution
The number of parameters in a convolution layer (regardless of whether it's a regular or depthwise layer) is simply the number of elements in the layer that have to be "learnt" during the training process.
For a conv layer, this is the total number of weights and biases: every element of every kernel (or filter) in the layer, plus one bias per output channel.
In our current example, we have defined the following:
- A kernel size of $5 \times 5$
- The expected number of input channels is $3$, therefore each filter (kernel) is a tensor of size $5 \times 5 \times 3$
- The specified number of output channels is $64$, which implies the following:
  - There are $64$ kernels
  - There is a corresponding scalar value, known as the bias, for each kernel, i.e. a bias size of $64$

With this, we can determine the total number of parameters as the filter size times the number of output channels, plus the bias, which can be calculated as

$$5 \times 5 \times 3 \times 64 + 64 = 4864$$
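As a quick sanity check, we can read the same numbers off the layer itself; PyTorch stores a Conv2d's weight as a single tensor of shape (out_channels, in_channels, kernel_height, kernel_width):

print(regular_conv.weight.shape)  # torch.Size([64, 3, 5, 5])
print(regular_conv.weight.numel() + regular_conv.bias.numel())  # 4864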
Let's write a function total_learnables to perform the calculation above for any PyTorch module.
def total_learnables(model):
    table = PrettyTable(["Learnable", "Count"])
    total_params = 0
    for name, parameter in model.named_parameters():
        # Frozen parameters are not learnt during training, so skip them
        if not parameter.requires_grad:
            continue
        params = parameter.numel()  # number of elements in this parameter tensor
        table.add_row([name, params])
        total_params += params
    return (total_params, table)
(total_params_regular, table) = total_learnables(regular_conv)
print(table)
print(f"[Regular Convolution] Total Learnables = {total_params_regular}")
+-----------+-------+
| Learnable | Count |
+-----------+-------+
| weight | 4800 |
| bias | 64 |
+-----------+-------+
[Regular Convolution] Total Learnables = 4864
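As a cross-check, PyTorch can produce the same total in one line:

print(sum(p.numel() for p in regular_conv.parameters() if p.requires_grad))  # 4864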
Number of Floating Point Operations (FLOPs) in a Regular Convolution
The number of FLOPs is the number of floating point operations the layer performs in a single forward pass. Unlike the parameter count, it is highly dependent on the input to the convolution layer.
To show this, let's define an input image for the convolution layer as
rand_image = torch.rand(1, input_channels, 228, 228) # Batch, Channel, Spatial, Spatial
With an input image of size $1 \times 3 \times 228 \times 228$, each of the $64$ output channels is a $112 \times 112$ feature map, and producing each output element requires one pass of a $5 \times 5 \times 3$ filter. The total number of multiply-accumulate operations performed when the image is convolved with this layer can therefore be calculated as

$$112 \times 112 \times 64 \times (5 \times 5 \times 3) = 60{,}211{,}200 \approx 60.2 \text{ million}$$

We won't build a FLOP counter into our tooling, since unlike the parameter count, this number is a property of the layer and its input together rather than of the model alone.
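That said, if you'd like to verify the arithmetic, here's a minimal sketch of a helper (conv_macs is an illustrative name, not a PyTorch API) that evaluates the formula above for a square input, assuming no padding and no dilation:

def conv_macs(conv, input_size):
    # Output spatial size for a square input with no padding or dilation
    k, s = conv.kernel_size[0], conv.stride[0]
    out_size = (input_size - k) // s + 1
    # out_h * out_w * out_channels * elements per filter
    return out_size * out_size * conv.out_channels * k * k * (conv.in_channels // conv.groups)

print(conv_macs(regular_conv, 228))  # 60211200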
Now that you've seen how a regular convolution can be viewed from the perspective of its floating point operations and its total number of learnable parameters, it's time to see how depthwise separable convolutions improve on both in terms of efficiency.
Depthwise Separable Convolutions
In a nutshell, depthwise separable convolutions are a factorised form of regular convolutions: a depthwise convolution first applies a single filter to each input channel independently, then a pointwise ($1 \times 1$) convolution mixes the resulting channels together.
An analogy is representing an $n \times n$ matrix $M$ using two smaller vectors $u$ and $v$, both of size $n$. By multiplying $u v^\top$, we get back the full matrix, but from a smaller representation of $2n$ values instead of $n^2$.
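To make the analogy concrete, here's a small sketch of that rank-one factorisation (the names n, u, v, and M are just for illustration):

n = 64
u = torch.rand(n, 1)  # column vector
v = torch.rand(1, n)  # row vector
M = u @ v  # the full n x n matrix, reconstructed from the two vectors
print(M.numel(), u.numel() + v.numel())  # 4096 vs 128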
In PyTorch, the separable convolution can be defined as
separable_conv = torch.nn.Sequential(OrderedDict([
    # groups=input_channels applies one filter per input channel (the depthwise step)
    ("Depthwise", torch.nn.Conv2d(input_channels, input_channels, kernel_size, stride, groups=input_channels)),
    # a 1x1 convolution then mixes the channels together (the pointwise step)
    ("Pointwise", torch.nn.Conv2d(input_channels, output_channels, 1, 1))
]))
(total_params_separable, table) = total_learnables(separable_conv)
print(table)
print(f"[Separable Convolution] Total Learnables = {total_params_separable}")
print(f"Percent reduction = {(1 - total_params_separable / total_params_regular) * 100}%")
+------------------+-------+
| Learnable | Count |
+------------------+-------+
| Depthwise.weight | 75 |
| Depthwise.bias | 3 |
| Pointwise.weight | 192 |
| Pointwise.bias | 64 |
+------------------+-------+
[Separable Convolution] Total Learnables = 334
Percent reduction = 93.13322368421053%
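These parameter counts follow the same counting rule as before: the depthwise step has $5 \times 5 \times 3 + 3 = 78$ learnables and the pointwise step has $1 \times 1 \times 3 \times 64 + 64 = 256$, giving $78 + 256 = 334$ in total.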
Finally, let's confirm that both layers map the same input to outputs of the same shape:
out = regular_conv(rand_image)
print(f"Standard Convolution: {out.size()}")
out = separable_conv(rand_image)
print(f"Separable Convolution: {out.size()}")
Standard Convolution: torch.Size([1, 64, 112, 112])
Separable Convolution: torch.Size([1, 64, 112, 112])
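For completeness, we can reuse the hypothetical conv_macs helper sketched earlier to compare operation counts as well. The depthwise step sees the full $228 \times 228$ input, while the pointwise step runs on the resulting $112 \times 112$ feature maps:

macs_regular = conv_macs(regular_conv, 228)
macs_separable = conv_macs(separable_conv.Depthwise, 228) + conv_macs(separable_conv.Pointwise, 112)
print(f"Percent reduction = {(1 - macs_separable / macs_regular) * 100}%")  # roughly 94%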