Why DropBlock Outperforms Dropout as an Image Regularizer
This article demonstrates how to implement DropBlock in PyTorch, explains why Dropout fails on image data, details the gamma calculation and mask generation, and shows visual comparisons that illustrate the superiority of contiguous region dropping over random pixel dropout.
Introduction
Ghiasi et al. introduced DropBlock in 2018 as a regularization technique specifically designed for images, and empirical results show it works better than standard Dropout.
Problems with Dropout on Images
Dropout randomly zeroes individual input elements before passing them to the next layer. While easy to use via nn.Dropout() in PyTorch, it drops pixels independently, which is ineffective on 2-D feature maps: nearby activations are spatially correlated, so the information in a dropped pixel can largely be recovered from its neighbours.
import torch
import matplotlib.pyplot as plt
from torch import nn
# keeping one channel for better visualisation
x = torch.ones((1, 1, 16, 16))
drop = nn.Dropout()
x_drop = drop(x)
to_plot = lambda x: x.squeeze(0).permute(1, 2, 0).squeeze().numpy()  # (C,H,W) -> (H,W[,C]); the final squeeze drops the channel dim for grayscale, which imshow requires
fig, axs = plt.subplots(1, 2)
axs[0].imshow(to_plot(x), cmap='gray')
axs[1].imshow(to_plot(x_drop), cmap='gray')
The figure shows random pixels being dropped, which does not remove semantic information effectively.
Visualizing Dropout on Feature Maps
To see how Dropout affects feature maps, the author loads a Baby Yoda image, passes it through a pretrained ResNet‑18, extracts the third‑layer feature map, and applies Dropout followed by ReLU.
import requests
from glasses.models import AutoModel, AutoTransform
from PIL import Image
from io import BytesIO
# get an image of baby yoda
r = requests.get('https://upload.wikimedia.org/wikipedia/en/0/00/The_Child_aka_Baby_Yoda_%28Star_Wars%29.jpg')
img = Image.open(BytesIO(r.content))
x = AutoTransform.from_name('resnet18')(img) # transform to model input
model = AutoModel.from_pretrained('resnet18').eval()
with torch.no_grad():
    model.encoder.features  # accessing .features activates feature storage in glasses
    model(x.unsqueeze(0))
features = model.encoder.features
f = features[2] # third layer output [1,128,28,28]
f_drop = nn.Sequential(nn.Dropout(), nn.ReLU())(f)
f_l = nn.ReLU()(f)[:,0,:,:]
f_drop_l = f_drop[:,0,:,:]
fig, axs = plt.subplots(1, 2)
axs[0].imshow(f_l.squeeze().numpy())
axs[1].imshow(f_drop_l.squeeze().numpy())
The left panel shows the original activations; the right panel shows the activations after Dropout. They remain very similar, indicating that zeroing isolated pixels does not significantly disrupt the information flow.
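This weakness can be quantified with a small synthetic sketch (not from the original article): after per-pixel Dropout, almost every 3×3 neighbourhood still contains at least one surviving activation, so local information survives.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic 28x28 "feature map", matching the spatial size of the layer-3 output.
f = torch.ones(1, 1, 28, 28)
f_drop = F.dropout(f, p=0.5, training=True)

# A 3x3 max-pool over the survival mask tells us, for every neighbourhood,
# whether at least one activation survived the per-pixel dropout.
alive = (f_drop > 0).float()
neighbourhood_alive = F.max_pool2d(alive, kernel_size=3, stride=1)
print(neighbourhood_alive.mean().item())  # close to 1.0
```

With p = 0.5, a whole 3×3 patch dies with probability 0.5⁹ ≈ 0.002, so per-pixel dropout almost never erases a full local region.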
DropBlock Mechanism
DropBlock addresses Dropout's limitation by removing contiguous regions from the feature map, thereby destroying semantic information much more effectively: instead of zeroing independent activations, it zeroes whole square blocks at once (see Figure 1 of the original paper for an illustration).
Implementation Details
First, define a DropBlock layer with the required parameters.
from torch import nn
import torch
from torch import Tensor
class DropBlock(nn.Module):
    def __init__(self, block_size: int, p: float = 0.5):
        super().__init__()
        self.block_size = block_size
        self.p = p

    def calculate_gamma(self, x: Tensor) -> float:
        """Compute gamma, eq (1) in the paper"""
        invalid = (1 - self.p) / (self.block_size ** 2)
        valid = (x.shape[-1] ** 2) / ((x.shape[-1] - self.block_size + 1) ** 2)
        return invalid * valid
Here block_size is the side length of the region to drop, and p plays the same role as the keep probability in Dropout.
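As a quick sanity check of the formula (concrete numbers chosen purely for illustration): with a 28×28 feature map, block_size = 7, and p = 0.9, gamma works out to roughly 0.0033, i.e. only a handful of block seeds per map.

```python
# Illustrative gamma computation: feature map 28x28, block_size 7, p = 0.9.
feat_size, block_size, p = 28, 7, 0.9
gamma = ((1 - p) / block_size ** 2) * (feat_size ** 2 / (feat_size - block_size + 1) ** 2)
print(round(gamma, 5))  # ~0.00331
```

The second factor compensates for the fact that only (feat_size − block_size + 1)² positions can seed a block that fits entirely inside the map.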
The next step is to sample a mask of the same size as the input from a Bernoulli distribution with the computed gamma.
gamma = self.calculate_gamma(x)
mask = torch.bernoulli(torch.ones_like(x) * gamma)
To turn the binary mask into contiguous blocks, a max-pooling operation with kernel size equal to block_size and stride 1 is applied. The pooled result is inverted to obtain the final block mask.
mask_block = 1 - F.max_pool2d(
    mask,
    kernel_size=(self.block_size, self.block_size),
    stride=(1, 1),
    padding=(self.block_size // 2, self.block_size // 2),
)
The regularized output is then computed by scaling the input with the mask and a normalization factor.
x = mask_block * x * (mask_block.numel() / mask_block.sum())
The complete forward method integrates these steps.
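The seed-to-block expansion can also be checked in isolation. This is a toy sketch (an 8×8 mask with an artificially high seed probability so blocks actually appear), not code from the original article:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

block_size = 3
# Toy 1x1x8x8 Bernoulli mask of block "seeds".
mask = torch.bernoulli(torch.full((1, 1, 8, 8), 0.1))

# Stride-1 max-pooling grows every seed into a block_size x block_size square;
# inverting gives 1 for kept positions and 0 inside dropped blocks.
mask_block = 1 - F.max_pool2d(
    mask, kernel_size=block_size, stride=1, padding=block_size // 2
)
print(int(mask.sum()), "seed(s) ->", int((mask_block == 0).sum()), "zeroed position(s)")
```

Every seed zeroes at most block_size² positions (fewer at the borders), so each seed costs roughly block_size² activations, which is exactly what the 1/block_size² factor in gamma accounts for.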
import torch.nn.functional as F

class DropBlock(nn.Module):
    def __init__(self, block_size: int, p: float = 0.5):
        super().__init__()
        self.block_size = block_size
        self.p = p

    def calculate_gamma(self, x: Tensor) -> float:
        invalid = (1 - self.p) / (self.block_size ** 2)
        valid = (x.shape[-1] ** 2) / ((x.shape[-1] - self.block_size + 1) ** 2)
        return invalid * valid

    def forward(self, x: Tensor) -> Tensor:
        if self.training:
            gamma = self.calculate_gamma(x)
            mask = torch.bernoulli(torch.ones_like(x) * gamma)
            mask_block = 1 - F.max_pool2d(
                mask,
                kernel_size=(self.block_size, self.block_size),
                stride=(1, 1),
                padding=(self.block_size // 2, self.block_size // 2),
            )
            x = mask_block * x * (mask_block.numel() / mask_block.sum())
        return x
Testing DropBlock on Baby Yoda
import torchvision.transforms as T
r = requests.get('https://upload.wikimedia.org/wikipedia/en/0/00/The_Child_aka_Baby_Yoda_%28Star_Wars%29.jpg')
img = Image.open(BytesIO(r.content))
tr = T.Compose([T.Resize((224, 224)), T.ToTensor()])
x = tr(img)
drop_block = DropBlock(block_size=19, p=0.8)
x_drop = drop_block(x)
fig, axs = plt.subplots(1, 2)
axs[0].imshow(to_plot(x))
axs[1].imshow(x_drop[0, :, :].squeeze().numpy())
The result shows contiguous regions being zeroed, confirming that DropBlock removes blocks rather than isolated neurons.
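Like nn.Dropout, the layer is only active during training. A quick standalone check (the DropBlock class from above is repeated here so the snippet runs on its own):

```python
import torch
from torch import nn, Tensor
import torch.nn.functional as F

class DropBlock(nn.Module):
    """DropBlock as implemented above, repeated so this check is self-contained."""
    def __init__(self, block_size: int, p: float = 0.5):
        super().__init__()
        self.block_size = block_size
        self.p = p

    def calculate_gamma(self, x: Tensor) -> float:
        invalid = (1 - self.p) / (self.block_size ** 2)
        valid = (x.shape[-1] ** 2) / ((x.shape[-1] - self.block_size + 1) ** 2)
        return invalid * valid

    def forward(self, x: Tensor) -> Tensor:
        if self.training:
            gamma = self.calculate_gamma(x)
            mask = torch.bernoulli(torch.ones_like(x) * gamma)
            mask_block = 1 - F.max_pool2d(
                mask,
                kernel_size=(self.block_size, self.block_size),
                stride=(1, 1),
                padding=(self.block_size // 2, self.block_size // 2),
            )
            x = mask_block * x * (mask_block.numel() / mask_block.sum())
        return x

drop_block = DropBlock(block_size=19, p=0.8)
drop_block.eval()  # switch off the training-time behaviour
x = torch.rand(1, 3, 224, 224)
print(torch.equal(drop_block(x), x))  # True: identity in eval mode
```

Because the whole mechanism sits behind `if self.training:`, calling `.eval()` on the model disables DropBlock automatically at inference time, just as it does for nn.Dropout.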
Additional Observations
When block_size = 1, DropBlock behaves exactly like Dropout. When block_size equals the full feature‑map size, it becomes equivalent to Dropout2d (also known as SpatialDropout).
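The block_size = 1 case is easy to verify with a hedged sketch that replays the mask logic on a toy all-ones input: gamma collapses to 1 − p, the max-pool becomes a no-op, and each position is dropped independently, exactly like Dropout.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# With block_size = 1 the gamma formula collapses to 1 - p and the max-pool is
# a no-op, so each position is dropped independently -- i.e. plain Dropout.
p, n = 0.5, 64
x = torch.ones(1, 1, n, n)
gamma = (1 - p) / 1 * (n ** 2) / ((n - 1 + 1) ** 2)  # == 1 - p
mask = torch.bernoulli(torch.ones_like(x) * gamma)
mask_block = 1 - F.max_pool2d(mask, kernel_size=1, stride=1, padding=0)
dropped = (mask_block == 0).float().mean().item()
print(round(gamma, 2), round(dropped, 2))  # empirical drop rate close to 1 - p
```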
Conclusion
We now know how to implement DropBlock in PyTorch. The original paper reports a series of experiments on a vanilla ResNet-50, progressively adding regularization methods; in that comparison DropBlock yields the best accuracy.
