How to Implement SRCNN for Image Super‑Resolution in PyTorch
This article walks through a complete PyTorch implementation of the SRCNN model for image super‑resolution, covering dataset preparation, patch extraction, model architecture, training on a GTX 770 GPU for 2500 epochs, PSNR evaluation, and visual comparisons with bicubic up‑sampling.
Overview
The guide demonstrates a full PyTorch implementation of the SRCNN (Super‑Resolution Convolutional Neural Network) model for image super‑resolution tasks, including data preparation, model definition, training, validation, and result visualization.
Model Architecture and Differences
Compared with the original paper, this implementation adds padding so that the output image has the same spatial dimensions as the input, simplifying comparison. The original Caffe model has 8,032 parameters, whereas the PyTorch version contains just over 20,000. The difference comes not from the padding, which adds no learnable weights, but mainly from operating on all three RGB channels rather than only the Y (luminance) channel used in the paper.
Optimizer Choice
The original SRCNN uses layer‑wise learning rates with SGD. For simplicity, the PyTorch version employs a single‑rate Adam optimizer for the entire network.
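Putting the two choices above together, the network and optimizer setup might look like the following sketch. The 9-1-5 kernel sizes and 64/32 filter counts are taken from the original paper, and the learning rate of 1e-3 is an assumption; the source only confirms same-size padding, RGB input, and a single-rate Adam optimizer.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer SRCNN; 'same'-style padding keeps output size equal to input size."""
    def __init__(self):
        super().__init__()
        # 9-1-5 kernel sizes from the paper; 3 in/out channels for RGB images.
        self.conv1 = nn.Conv2d(3, 64, kernel_size=9, padding=4)
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1, padding=0)
        self.conv3 = nn.Conv2d(32, 3, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        return self.conv3(x)

model = SRCNN()
# One learning rate for the whole network, instead of the paper's
# layer-wise SGD rates (the 1e-3 value is an assumption).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```

With these choices the model has 20,099 trainable parameters (consistent with the "over 20,000" above), and a 32×32 input produces a 32×32 output.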
Dataset Preparation
Three datasets are used: T91 (training), Set5 and Set14 (validation and final testing). Image patches of size 32×32 are extracted from T91 with a stride of 14, yielding 22,227 patches. The patchify library creates the patches, and OpenCV saves both high‑resolution and low‑resolution (down‑sampled by 0.5 and up‑scaled with bicubic) versions.
Patch Extraction Code
```python
from PIL import Image
from tqdm import tqdm
import patchify
import numpy as np
import glob, os, cv2

STRIDE = 14
SIZE = 32

def create_patches(input_paths, out_hr_path, out_lr_path):
    os.makedirs(out_hr_path, exist_ok=True)
    os.makedirs(out_lr_path, exist_ok=True)
    all_paths = []
    for input_path in input_paths:
        all_paths.extend(glob.glob(f"{input_path}/*"))
    print(f"Creating patches for {len(all_paths)} images")
    for image_path in tqdm(all_paths, total=len(all_paths)):
        image = Image.open(image_path).convert("RGB")
        image_name = os.path.splitext(os.path.basename(image_path))[0]
        # Extract overlapping SIZE x SIZE RGB patches with the given stride.
        patches = patchify.patchify(np.array(image), (SIZE, SIZE, 3), STRIDE)
        for i in range(patches.shape[0]):
            for j in range(patches.shape[1]):
                patch = patches[i, j, 0]
                patch = cv2.cvtColor(patch, cv2.COLOR_RGB2BGR)
                cv2.imwrite(f"{out_hr_path}/{image_name}_{i}_{j}.png", patch)
                # Down-sample the patch by 0.5, then up-scale it back with
                # bicubic interpolation to produce the blurred low-res input.
                low_res = cv2.resize(patch, (SIZE // 2, SIZE // 2),
                                     interpolation=cv2.INTER_CUBIC)
                high_res_up = cv2.resize(low_res, (SIZE, SIZE),
                                         interpolation=cv2.INTER_CUBIC)
                cv2.imwrite(f"{out_lr_path}/{image_name}_{i}_{j}.png", high_res_up)
```

Note that the resize operates on the patch dimensions (SIZE), not the dimensions of the whole source image.

Training Pipeline
Training runs on a GTX 770 GPU for roughly three days (≈2500 epochs, batch size 128). The training script logs loss and PSNR for both training and validation sets, saving model checkpoints every 100 epochs and the model state after each epoch. PSNR is computed with the following function:
```python
import math
import numpy as np

def psnr(label, outputs, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between output and ground-truth batches."""
    label = label.cpu().detach().numpy()
    outputs = outputs.cpu().detach().numpy()
    diff = outputs - label
    rmse = math.sqrt(np.mean(diff ** 2))
    if rmse == 0:
        # Identical images: return a large finite value instead of infinity.
        return 100
    return 20 * math.log10(max_val / rmse)
```

After 2500 epochs the final training PSNR reaches 29.85 dB and the validation PSNR 29.61 dB. Although the validation set is a combined Set5 + Set14 collection, these values are slightly lower than those reported in the original paper.
Result Visualization
Loss and PSNR curves are saved as PNG files. Sample reconstructed images from the final epoch are compared against bicubic up‑sampling and the ground‑truth high‑resolution images. Across various scenes (comic, butterfly wing, zebra), SRCNN consistently produces sharper details than bicubic, though improvement varies with image content.
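Saving the per-epoch curves as PNGs might look like the following sketch with matplotlib; the helper name is an assumption, while the output filenames match the `outputs` directory listed below.

```python
import matplotlib
matplotlib.use("Agg")  # render to file without a display
import matplotlib.pyplot as plt

def save_plot(values, ylabel, out_path):
    """Save a per-epoch curve (e.g. loss or PSNR) as a PNG file."""
    plt.figure(figsize=(8, 5))
    plt.plot(values, color="orange", label=ylabel)
    plt.xlabel("Epochs")
    plt.ylabel(ylabel)
    plt.legend()
    plt.savefig(out_path)
    plt.close()

# e.g. save_plot(train_loss, "Loss", "outputs/loss.png")
#      save_plot(train_psnr, "PSNR (dB)", "outputs/psnr.png")
```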
Code Organization
The project follows a clear directory layout:
```
├── input
│   ├── Set14
│   ├── Set5
│   ├── T91
│   ├── t91_hr_patches
│   ├── t91_lr_patches
│   ├── test_bicubic_rgb_2x
│   └── test_hr
├── outputs
│   ├── valid_results
│   ├── loss.png
│   ├── model_ckpt.pth
│   ├── model.pth
│   └── psnr.png
├── src
│   ├── bicubic.py
│   ├── datasets.py
│   ├── patchify_image.py
│   ├── srcnn.py
│   ├── train.py
│   └── utils.py
└── NOTES.md
```

Key scripts:
- utils.py – PSNR calculation, plot saving, and model checkpoint utilities.
- patchify_image.py – Generates high- and low-resolution patches.
- bicubic.py – Prepares validation images by down-sampling with bicubic interpolation.
- datasets.py – Defines SRCNNDataset and data loader helpers.
- srcnn.py – Implements the three-layer convolutional network.
- train.py – Orchestrates training, validation, logging, and checkpointing.
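The checkpoint utilities in utils.py might be sketched as follows. The function names and the checkpoint dictionary layout are assumptions; only the output filenames come from the directory listing above.

```python
import torch

def save_model_state(model, path="outputs/model.pth"):
    # Persist only the weights; lighter and safer than pickling the module.
    torch.save(model.state_dict(), path)

def save_checkpoint(epoch, model, optimizer, path="outputs/model_ckpt.pth"):
    # Full checkpoint so training can resume from the saved epoch.
    torch.save({
        "epoch": epoch,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    }, path)

def load_checkpoint(model, optimizer, path="outputs/model_ckpt.pth"):
    # Restore weights and optimizer state in place; return the saved epoch.
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    return ckpt["epoch"]
```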
Conclusion
The article provides a reproducible end‑to‑end pipeline for training SRCNN on standard super‑resolution benchmarks, demonstrates practical choices (padding, Adam optimizer), and presents quantitative (PSNR) and qualitative (visual) evidence that the learned model outperforms simple bicubic up‑sampling.