Understanding Transpose Convolution (Deconvolution) in Convolutional Neural Networks
The article explains how transpose convolution (also called deconvolution) reverses the spatial mapping of a standard convolution, detailing its relationship to fully‑connected layers, padding, stride, output‑size formulas, the odd‑division case, and practical implementation in frameworks like PyTorch and TensorFlow.
In recent convolutional neural network (CNN) designs, the transpose convolution layer—also known as deconvolution or fractionally‑strided convolution—appears frequently, especially in the generator of Generative Adversarial Networks (GANs) for up‑sampling. This article explains the relationship and differences between transpose convolution and standard convolution, and details the implementation process.
1. Convolutional Layer vs. Fully‑Connected Layer
Traditional feed‑forward neural networks use fully‑connected layers, where every neuron in one layer connects to every neuron in the next layer via a dense weight matrix. Convolutional layers, by contrast, use a sparse weight matrix (the kernel) that connects only a local region (e.g., a 3×3 patch) of the input to each output neuron. A convolutional layer can be viewed as a special case of a fully‑connected layer with many zero weights, dramatically reducing the number of parameters and enabling the network to learn local, translation‑invariant features.
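This zero-pattern view can be sketched in a few lines of NumPy (an illustrative reconstruction, not code from the article): a 3×3 convolution over a 4×4 input is written out as a dense 4×16 weight matrix in which every entry outside a neuron's receptive field is zero, and the result is checked against the direct sliding-window computation.

```python
import numpy as np

# A 3x3 kernel over a 4x4 input (no padding, stride 1) yields a 2x2 output.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
k = rng.standard_normal((3, 3))

# Direct sliding-window convolution (cross-correlation, as CNNs compute it).
direct = np.array([[np.sum(x[i:i+3, j:j+3] * k) for j in range(2)]
                   for i in range(2)])

# The same operation as a fully-connected layer: 4 output neurons x 16 input
# neurons, zero everywhere outside each neuron's 3x3 receptive field.
W = np.zeros((4, 16))
for out_idx, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
    for ki in range(3):
        for kj in range(3):
            W[out_idx, (i + ki) * 4 + (j + kj)] = k[ki, kj]

as_matmul = (W @ x.reshape(16)).reshape(2, 2)
assert np.allclose(direct, as_matmul)
```

Each row of `W` has only 9 nonzero entries (the shared kernel weights), which is exactly the sparsity and weight-sharing the paragraph describes.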
2. Convolution Operation
2.1 Basic Convolution (no padding, stride = 1)
A single 3×3 kernel slides over the input image, performing element‑wise multiplication and summation to produce a 2×2 output feature map.
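A minimal numeric sketch of this step (the 4×4 input with values 0–15 and the diagonal kernel are arbitrary choices for illustration):

```python
import numpy as np

x = np.arange(16).reshape(4, 4)                   # 4x4 input
k = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])                         # 3x3 kernel (diagonal)

# Slide the kernel over every 3x3 window: element-wise multiply, then sum.
out = np.array([[np.sum(x[i:i+3, j:j+3] * k) for j in range(2)]
                for i in range(2)])
print(out)  # [[15 18]
            #  [27 30]]
```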
2.2 Convolution with Padding
Padding adds zeros around the input so that the output size can be kept equal to the input size ("same" padding). Other padding modes include "full" (padding = kernel‑size − 1) and "valid" (no padding).
same padding: output size = input size (e.g., 3×3 kernel → padding = 1)
full padding: padding = kernel‑size − 1
valid padding: padding = 0
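The three modes can be checked against the standard output-size formula (W − F + 2P)/S + 1; the input width 5 and kernel size 3 below are arbitrary:

```python
# Output width for each padding mode, with W = 5, F = 3, stride 1.
W, F = 5, 3
valid = (W - F + 2 * 0) // 1 + 1        # P = 0      -> 3 (shrinks)
same  = (W - F + 2 * 1) // 1 + 1        # P = 1      -> 5 (unchanged)
full  = (W - F + 2 * (F - 1)) // 1 + 1  # P = F - 1  -> 7 (grows)
print(valid, same, full)  # 3 5 7
```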
2.3 Convolution with Stride > 1
Stride defines the step size of the kernel. A stride of 2 reduces the spatial resolution (down‑sampling). The output size is computed as:
W₂ = ⌊(W₁ − F + 2P) / S⌋ + 1
2.4 Relationship Between Input/Output Size, Kernel, Padding, and Stride
The general formula for the output width (or height) of a standard convolution is:
W₂ = ⌊(W₁ − F + 2P) / S⌋ + 1
The floor handles the case where the division is not exact — the so‑called “odd” convolution case, in which the kernel's last step would run past the input and the trailing pixels are simply dropped.
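A small sketch of the floor behaviour (the sizes are chosen arbitrarily to show one exact and one odd case):

```python
import math

def conv_out(w, f, p, s):
    # Floor division models the "odd" case: when the kernel's last step would
    # run past the input, the trailing pixels are dropped.
    return math.floor((w - f + 2 * p) / s) + 1

assert conv_out(7, 3, 1, 2) == 4  # exact: (7 - 3 + 2) / 2 = 3.0
assert conv_out(8, 3, 1, 2) == 4  # odd:   (8 - 3 + 2) / 2 = 3.5, floored to 3
```

Note that inputs of width 7 and 8 both map to width 4 — this ambiguity is what the transpose operation later has to resolve (section 3.4.3).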
3. Transpose Convolution (Deconvolution)
Transpose convolution reverses the spatial mapping of a standard convolution: it up‑samples the feature map, restoring the input's shape (though not its values — it is not a true mathematical inverse, despite the name "deconvolution"). It is often used in GAN generators and other decoder architectures.
3.1 No‑Padding, No‑Stride Case
The transpose of a simple convolution (no padding, stride = 1) pads the input with F − 1 zeros on each side and convolves it with the flipped kernel at stride 1; for example, a 2×2 input and a 3×3 kernel yield a 4×4 output.
3.2 Transpose of Padding Convolution
When the forward convolution uses padding P, the corresponding transpose convolution uses the smaller padding F − P − 1 (derived in section 3.4.1). For the “same” case (3×3 kernel, P = 1), the transpose padding is also 1, so the output size again matches the input size.
3.3 Transpose of Stride > 1 Convolution
To undo a forward stride S > 1, the transpose operation inserts S − 1 zeros between adjacent input elements before convolving — hence the name fractionally‑strided convolution, since this is equivalent to sliding the kernel with an effective step of 1/S. For a forward stride of 2, one zero is inserted between elements, roughly doubling the spatial resolution.
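The zero-insertion view can be verified numerically. The NumPy sketch below (an illustrative reconstruction, not framework code; sizes are arbitrary) computes a stride-2 transpose convolution two ways: by scatter-adding a scaled kernel copy per input element, and by zero-dilating the input, padding by F − P − 1, and running a plain stride-1 convolution with the flipped kernel.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((3, 3))  # input to the transpose convolution
k = rng.standard_normal((3, 3))
S, F, P = 2, 3, 0                # forward stride, kernel size, forward padding

# Method 1: scatter-add — each input element stamps a scaled copy of the
# kernel into the output, stepping by the stride.
W_out = (x.shape[0] - 1) * S - 2 * P + F  # = 7
direct = np.zeros((W_out, W_out))
for i in range(3):
    for j in range(3):
        direct[i*S:i*S+F, j*S:j*S+F] += x[i, j] * k

# Method 2: insert S-1 zeros between elements, pad by F-P-1, then run a
# plain stride-1 convolution with the flipped kernel.
dilated = np.zeros((S * (x.shape[0] - 1) + 1,) * 2)
dilated[::S, ::S] = x
padded = np.pad(dilated, F - P - 1)
kf = k[::-1, ::-1]
indirect = np.array([[np.sum(padded[i:i+F, j:j+F] * kf) for j in range(W_out)]
                     for i in range(W_out)])
assert np.allclose(direct, indirect)
```

Both methods turn the 3×3 input into the same 7×7 output, matching the size formula in section 3.4.2.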
3.4 Relationship Between Standard and Transpose Convolution Parameters
3.4.1 Transpose Padding
The padding used in the transpose operation (P_T) can be derived from the forward convolution parameters:
P_T = F − P − 1
where F is the kernel size and P is the forward padding.
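The relation can be checked numerically. The NumPy sketch below (illustrative, arbitrary sizes, stride 1) compares a scatter-add transpose — cropping the forward padding P from the result — against a stride-1 convolution with the flipped kernel and padding P_T = F − P − 1:

```python
import numpy as np

rng = np.random.default_rng(2)
F, P = 3, 1            # forward: 3x3 kernel, "same" padding 1, stride 1
P_T = F - P - 1        # transpose padding = 1
x = rng.standard_normal((4, 4))
k = rng.standard_normal((F, F))

# Transpose convolution as scatter-add, then crop the forward padding P.
n = x.shape[0]
full = np.zeros((n + F - 1, n + F - 1))
for i in range(n):
    for j in range(n):
        full[i:i+F, j:j+F] += x[i, j] * k
scatter = full[P:P+n, P:P+n]  # crop P on each side -> 4x4

# Same result as a stride-1 convolution with the flipped kernel, padding P_T.
padded = np.pad(x, P_T)
kf = k[::-1, ::-1]
as_conv = np.array([[np.sum(padded[i:i+F, j:j+F] * kf) for j in range(n)]
                    for i in range(n)])
assert np.allclose(scatter, as_conv)
```

Because the forward convolution here was “same” (P = 1), the transpose padding is also 1 and the output size equals the input size, as section 3.2 described.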
3.4.2 Output Size of Transpose Convolution
The output width of a transpose convolution is:
W₁ = (W₂ − 1) × S − 2P + F
with S being the forward stride, P the forward padding, and F the kernel size.
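For example, in plain arithmetic (the kernel 4, stride 2, padding 1 configuration is a common up-sampling choice in GAN generators):

```python
def deconv_out(w, f, p, s):
    # W_out = (W_in - 1) * S - 2P + F
    return (w - 1) * s - 2 * p + f

# Kernel 4, stride 2, padding 1 exactly doubles the spatial size.
assert deconv_out(4, 4, 1, 2) == 8    # (4 - 1) * 2 - 2 + 4 = 8
assert deconv_out(8, 4, 1, 2) == 16   # (8 - 1) * 2 - 2 + 4 = 16
```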
3.4.3 Odd Convolution in the Transpose Setting
When the forward convolution involved a floored (odd) division, several input sizes map to the same output size, so the transpose operation alone cannot tell which one to restore; an output_padding term adds the missing rows/columns to one side. In PyTorch this is exposed as the output_padding argument of nn.ConvTranspose2d.
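A pure-Python illustration of why this term is needed (the `deconv_out` helper mirrors PyTorch's documented output-size formula with dilation = 1; the sizes are arbitrary):

```python
def conv_out(w, f, p, s):
    # Forward convolution output size, with floor division for the odd case.
    return (w - f + 2 * p) // s + 1

def deconv_out(w, f, p, s, output_padding=0):
    # Transpose convolution output size (PyTorch convention, dilation = 1):
    # (W - 1) * S - 2P + F + output_padding
    return (w - 1) * s - 2 * p + f + output_padding

F, P, S = 3, 1, 2
# Widths 7 and 8 both map to 4 under the forward convolution (odd case)...
assert conv_out(7, F, P, S) == conv_out(8, F, P, S) == 4
# ...so the plain transpose formula can only return 7; output_padding = 1
# restores the extra row/column needed to recover 8.
assert deconv_out(4, F, P, S) == 7
assert deconv_out(4, F, P, S, output_padding=1) == 8
```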
4. Summary
The article first reviews the connection between feed‑forward fully‑connected networks and CNNs, then details the mathematics of convolution (kernel size, padding, stride, and output size). Finally, it demystifies transpose convolution, providing concrete examples for each parameter setting and explaining how frameworks such as PyTorch and TensorFlow implement the operation.
5. References
Intuitive explanations of CNNs on Zhihu (translation‑invariance discussion).
"A Guide to Convolution Arithmetic for Deep Learning" – source of the animated illustrations.
Various online articles comparing convolution and transpose convolution.