How Dual‑Domain Strip Attention Revolutionizes Image Restoration
The paper introduces Dual‑Domain Strip Attention Network (DSANet), a lightweight architecture that combines spatial and frequency strip attention to boost multi‑scale representation learning, achieving state‑of‑the‑art performance on dehazing, desnowing, defocus deblurring, and denoising tasks with significantly lower computational cost.
Paper Information
Title: Dual‑Domain Strip Attention for Image Restoration
Affiliation: Technical University of Munich, Department of Computer Science, Information and Technology
DOI: https://doi.org/10.1016/j.neunet.2023.12.003
Background and Motivation
Image restoration aims to recover high‑quality images from degraded observations, a prerequisite for applications such as surveillance, remote sensing, and medical imaging. Conventional hand‑crafted priors struggle with the ill‑posed nature of the problem, while convolutional neural networks (CNNs) face difficulty handling large‑scale blur and spatially varying degradations. Transformer‑based models provide strong global modeling but incur quadratic self‑attention complexity O((HW)^2·C), which is prohibitive for high‑resolution restoration tasks.
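To get a feel for why the quadratic term is prohibitive, the minimal Python sketch below evaluates the (HW)^2·C expression from the paragraph above at two resolutions. The function name and the channel count C = 32 are illustrative assumptions, not values from the paper, and the result is an order-of-magnitude operation count rather than a measured FLOP figure.

```python
def self_attention_cost(h, w, c):
    """Order-of-magnitude operation count for full self-attention: (H*W)^2 * C."""
    return (h * w) ** 2 * c


# Hypothetical channel count, chosen only for illustration.
C = 32

base = self_attention_cost(256, 256, C)     # training-patch resolution
large = self_attention_cost(1024, 1024, C)  # a higher-resolution input

# Quadrupling each side multiplies H*W by 16 and the attention cost by 16^2.
print(large // base)  # 256
```

Under this count, going from a 256×256 patch to a 1024×1024 image makes full self-attention 256 times more expensive, although the pixel count grows by only a factor of 16.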
Contributions
Introduce the Dual‑Domain Strip Attention Network (DSANet) that employs a novel Dual‑Domain Strip Attention Module (DSAM) to enhance multi‑scale representation learning with low computational cost.
DSAM consists of a Spatial Strip Attention (SSA) unit operating in the spatial domain and a Frequency Strip Attention (FSA) unit operating in the frequency domain.
Show that DSANet achieves state‑of‑the‑art performance on several image‑restoration benchmarks (dehazing, desnowing, defocus deblurring, denoising) while keeping FLOPs comparable to lightweight CNN baselines.
Method Details
Spatial Strip Attention (SSA)
Given an input feature tensor X \in \mathbb{R}^{C \times H \times W}, SSA replaces the conventional query‑key‑value generation with a lightweight branch:
# Global average pooling over spatial dimensions
G = GAP(X)  # shape: (C,)
A 1×1 convolution W_{1\times1} followed by a sigmoid produces strip‑wise attention weights A \in \mathbb{R}^{K}, where K is the length of a horizontal (or vertical) strip:
A = sigmoid(W_{1\times1}(G))  # shape: (K,)
The weights modulate a convolutional aggregation that gathers context from neighboring pixels along the strip. The computational complexity becomes O(H·W·C·K), far lower than the quadratic cost of full self‑attention.
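The SSA steps described above can be sketched in NumPy as follows. This is one possible reading of the summary, not the authors' implementation: the projection matrix `w_proj` of shape (K, C) stands in for the 1×1 convolution applied to the pooled descriptor, the K weights are shared across channels, K is assumed odd, and reflect padding at the borders is an arbitrary choice.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def spatial_strip_attention(x, w_proj, axis="horizontal"):
    """Sketch of SSA. x: (C, H, W); w_proj: (K, C), a stand-in for the 1x1 conv."""
    c, h, w = x.shape
    k = w_proj.shape[0]          # strip length K (assumed odd)
    g = x.mean(axis=(1, 2))      # global average pooling -> (C,)
    a = sigmoid(w_proj @ g)      # strip-wise attention weights -> (K,)
    pad = k // 2
    if axis == "horizontal":
        xp = np.pad(x, ((0, 0), (0, 0), (pad, pad)), mode="reflect")
        # Weighted sum of the K horizontally shifted copies of the feature map.
        return sum(a[i] * xp[:, :, i:i + w] for i in range(k))
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)), mode="reflect")
    # Same aggregation along the vertical strip.
    return sum(a[i] * xp[:, i:i + h, :] for i in range(k))


x = np.ones((4, 8, 8))
w_proj = np.zeros((3, 4))        # zero projection -> every weight is sigmoid(0) = 0.5
out = spatial_strip_attention(x, w_proj)
print(out.shape)  # (4, 8, 8)
```

Each output pixel touches K neighbors per channel, which matches the O(H·W·C·K) cost quoted above.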
Frequency Strip Attention (FSA)
FSA first applies strip‑wise pooling to separate the feature map into frequency components along the horizontal and vertical axes. For each component a lightweight attention weight \alpha is learned (similar to SSA) and used to modulate the spectral representation:
# Strip‑wise pooling to obtain frequency slices
F_h, F_v = strip_pool(X, axis='horizontal'), strip_pool(X, axis='vertical')
# Learn attention for each slice
alpha_h = sigmoid(W_h(F_h))
alpha_v = sigmoid(W_v(F_v))
# Modulate frequency components
F'_h = alpha_h * F_h
F'_v = alpha_v * F_v
The modulated components are recombined via inverse transform, adding minimal overhead while enriching the frequency‑domain information.
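The pseudocode above can be made runnable with a few assumptions. In the NumPy sketch below, `strip_pool` is taken to be an average along one spatial axis, and `W_h` / `W_v` are modeled as 1×1 convolutions, that is, (C, C) channel-mixing matrices. Both choices are guesses made for illustration. The summary does not specify how the modulated components are recombined, so the sketch stops at the gating step and returns both components.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def strip_pool(x, axis):
    """Average along one spatial axis of x with shape (C, H, W) (assumed pooling)."""
    if axis == "horizontal":
        return x.mean(axis=2, keepdims=True)   # (C, H, 1): one value per row
    return x.mean(axis=1, keepdims=True)       # (C, 1, W): one value per column


def conv1x1(weight, feat):
    """1x1 convolution as channel mixing: weight (C_out, C_in), feat (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, feat)


def frequency_strip_gating(x, w_h, w_v):
    f_h = strip_pool(x, "horizontal")
    f_v = strip_pool(x, "vertical")
    alpha_h = sigmoid(conv1x1(w_h, f_h))       # attention for the horizontal slice
    alpha_v = sigmoid(conv1x1(w_v, f_v))       # attention for the vertical slice
    return alpha_h * f_h, alpha_v * f_v        # modulated components


x = np.arange(24, dtype=float).reshape(2, 3, 4)
w_zero = np.zeros((2, 2))                      # zero weights -> alpha = 0.5 everywhere
fh_mod, fv_mod = frequency_strip_gating(x, w_zero, w_zero)
print(fh_mod.shape, fv_mod.shape)  # (2, 3, 1) (2, 1, 4)
```

Because pooling collapses one spatial axis, the gating operates on H + W values per channel rather than H·W, which is consistent with the claim of minimal overhead.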
Network Architecture
DSANet adopts an encoder‑decoder backbone with three scales. Each scale contains three Residual Groups (ResGroups). The model size is controlled by a scaling factor N (e.g., N=1 for the base model, N=2 for a larger variant). Training uses 256×256 patches, a batch size of 8, and the Adam optimizer with a cosine‑annealing learning‑rate schedule.
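For readers unfamiliar with cosine annealing, the sketch below shows the standard schedule in plain Python. The initial and final learning rates and the step count are hypothetical placeholders; the summary does not state the values used to train DSANet.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=1e-6):
    """Standard cosine annealing: decay from lr_max to lr_min over total_steps."""
    cos_term = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_term


# Hypothetical 100-step run: learning rate at the start, midpoint, and end.
for step in (0, 50, 100):
    print(step, cosine_annealing_lr(step, 100))
```

The rate falls slowly at first, fastest around the midpoint, and flattens near the end, which tends to give stable final convergence.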
Experimental Evaluation
All experiments report Peak Signal‑to‑Noise Ratio (PSNR) and Structural Similarity Index (SSIM) as metrics.
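Since the results that follow are quoted as PSNR differences in dB, a small sketch of the metric may help with interpretation. It uses the standard definition PSNR = 10·log10(MAX² / MSE) on flat lists of pixel values in [0, 1]. The helper names are illustrative, and the evaluation scripts used in the paper may differ in details such as color space or cropping.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf                      # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)


reference = [0.0, 0.0, 0.0, 0.0]
restored = [0.1, 0.1, 0.1, 0.1]              # uniform error of 0.1 -> MSE of 0.01
print(psnr(reference, restored))             # approximately 20 dB
print(mse_ratio_for_gain(1.0))               # MSE factor for a 1 dB gain
```

Under this definition, a gain of roughly 1 dB corresponds to about a 20% reduction in mean squared error, which gives a sense of scale for sub-decibel improvements.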
Image Dehazing
On the SOTS‑Indoor and SOTS‑Outdoor benchmarks, DSANet improves PSNR by 0.96 dB and 0.54 dB over the previous best method SANet. It also achieves the highest scores on four real‑world hazy datasets (NH‑HAZE, NH‑HAZE2, O‑Haze, Dense‑Haze).
Image Desnowing
DSANet attains the best PSNR on CSD, SRRS, and Snow100K, surpassing FocalNet by 0.91 dB on the CSD dataset.
Defocus Deblurring
On the DPDD dataset, DSANet outperforms competing approaches on most evaluation metrics, demonstrating strong handling of spatially varying blur.
Image Denoising
On BSD68 at Gaussian noise levels σ = 15, 25, and 50, DSANet consistently exceeds Restormer in both PSNR and SSIM.
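As a rough illustration of how such noisy inputs are typically synthesized, the NumPy sketch below adds zero-mean Gaussian noise to an image in [0, 1]. It assumes σ is specified on the 0 to 255 intensity scale, which is the common convention for these benchmarks but is not stated in the summary. The function name and the seed are hypothetical, and no clipping is applied.

```python
import numpy as np


def add_gaussian_noise(image, sigma, seed=0):
    """Add zero-mean Gaussian noise; sigma is given on the 0-255 scale (assumed)."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma / 255.0, size=image.shape)
    return image + noise


clean = np.full((256, 256), 0.5)             # flat gray stand-in for a test image
for sigma in (15, 25, 50):
    noisy = add_gaussian_noise(clean, sigma)
    # The empirical noise level should be close to the requested sigma.
    print(sigma, round(float((noisy - clean).std() * 255.0), 1))
```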
Conclusion
The Dual‑Domain Strip Attention mechanism efficiently captures multi‑scale context in both spatial and frequency domains. By integrating SSA and FSA, DSANet delivers superior restoration quality with low computational burden and generalizes well to real‑world degraded images, making it a practical solution for diverse image‑enhancement tasks.
