How Dual‑Domain Strip Attention Revolutionizes Image Restoration

The paper introduces the Dual‑Domain Strip Attention Network (DSANet), a lightweight architecture that combines spatial and frequency strip attention to strengthen multi‑scale representation learning. It achieves state‑of‑the‑art results on dehazing, desnowing, defocus deblurring, and denoising while keeping computational cost low.

AI Frontier Lectures

Paper Information

Title: Dual‑Domain Strip Attention for Image Restoration

Affiliation: Technical University of Munich, Department of Computer Science, Information and Technology

DOI: https://doi.org/10.1016/j.neunet.2023.12.003

Background and Motivation

Image restoration aims to recover high‑quality images from degraded observations, a prerequisite for applications such as surveillance, remote sensing, and medical imaging. Conventional hand‑crafted priors struggle with the ill‑posed nature of the problem, while convolutional neural networks (CNNs) have difficulty handling large‑scale blur and spatially varying degradations. Transformer‑based models provide strong global modeling, but self‑attention scales quadratically with the number of pixels, costing $O((HW)^2 C)$. This is prohibitive for high‑resolution restoration tasks.
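To make the quadratic term concrete, the short Python sketch below evaluates $(HW)^2 C$ for a few resolutions. This term is roughly the cost of one attention‑map product when every pixel is a token. The channel width `C = 48`, the resolutions, and the helper name `attention_map_cost` are illustrative assumptions, not values from the paper. Projections and softmax are ignored.

```python
def attention_map_cost(height: int, width: int, channels: int) -> int:
    """Dominant (HW)^2 * C term of full self-attention over all pixels."""
    tokens = height * width          # every pixel is treated as a token
    return tokens * tokens * channels


if __name__ == "__main__":
    C = 48  # illustrative channel width (assumption)
    for h, w in [(256, 256), (512, 512), (1080, 1920)]:
        print(f"{h}x{w}: {attention_map_cost(h, w, C):.3e} operations")
```

Because the cost depends on $(HW)^2$, doubling both height and width multiplies it by 16. This is why full self‑attention quickly becomes impractical at restoration resolutions.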

Contributions

Introduce the Dual‑Domain Strip Attention Network (DSANet) that employs a novel Dual‑Domain Strip Attention Module (DSAM) to enhance multi‑scale representation learning with low computational cost.

DSAM consists of a Spatial Strip Attention (SSA) unit operating in the spatial domain and a Frequency Strip Attention (FSA) unit operating in the frequency domain.

Show that DSANet achieves state‑of‑the‑art performance on several image‑restoration benchmarks (dehazing, desnowing, defocus deblurring, denoising) while keeping FLOPs comparable to lightweight CNN baselines.

Method Details

Spatial Strip Attention (SSA)

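Given an input feature tensor $X \in \mathbb{R}^{C \times H \times W}$, SSA replaces the conventional query‑key‑value generation with a lightweight branch. Global average pooling (GAP) first summarizes the spatial dimensions:

# Global average pooling over spatial dimensions (NumPy; X is an example tensor)
import numpy as np
X = np.random.default_rng(0).standard_normal((64, 32, 32))  # shape: (C, H, W)
G = X.mean(axis=(1, 2))                                      # shape: (C,)

A 1×1 convolution $W_{1\times1}$ followed by a sigmoid then produces strip‑wise attention weights $A \in \mathbb{R}^{K}$, where $K$ is the length of a horizontal (or vertical) strip:

# Strip-wise weights; a random (K, C) matrix stands in for the learned 1x1 conv
W_1x1 = np.random.default_rng(1).standard_normal((7, 64))   # K = 7 (assumed)
A = 1.0 / (1.0 + np.exp(-(W_1x1 @ G)))                       # sigmoid, shape: (K,)

These weights modulate a convolutional aggregation that gathers context from the $K$ neighboring pixels along each strip. Each output pixel aggregates only $K$ positions instead of all $HW$. The complexity therefore becomes $O(HWCK)$, far lower than the $O((HW)^2 C)$ cost of full self‑attention whenever $K \ll HW$.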
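The following is a minimal NumPy sketch of the SSA idea as summarized above. It rests on several assumptions:

- Each pixel sits at the center of a horizontal strip of odd length $K$.
- Borders are zero‑padded.
- One set of $K$ weights, derived from the GAP descriptor, is shared by all channels and positions, matching the $A \in \mathbb{R}^K$ shape given in the text.

The function name `spatial_strip_attention` and its parameters are hypothetical. The paper's implementation may differ, for example by generating separate weights per channel.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def spatial_strip_attention(x: np.ndarray, w_1x1: np.ndarray, b_1x1: np.ndarray) -> np.ndarray:
    """Hypothetical horizontal strip attention on a (C, H, W) feature map.

    x      : input features, shape (C, H, W)
    w_1x1  : weights of the 1x1 conv applied to the pooled vector, shape (K, C)
    b_1x1  : bias of that conv, shape (K,)
    Returns an array with the same shape as x.
    """
    C, H, W = x.shape
    K = w_1x1.shape[0]                              # strip length (assumed odd)
    g = x.mean(axis=(1, 2))                         # GAP over H and W -> (C,)
    a = sigmoid(w_1x1 @ g + b_1x1)                  # strip weights A -> (K,)
    pad = K // 2
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad)))    # zero-pad along the width
    out = np.zeros_like(x)
    for k in range(K):                              # K shifted copies: O(H*W*C*K)
        # position w receives a[k] * x[..., w + k - pad]
        out += a[k] * xp[:, :, k:k + W]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 8, 8))
    w = 0.1 * rng.standard_normal((7, 16))
    b = np.zeros(7)
    y = spatial_strip_attention(x, w, b)
    print(y.shape)  # (16, 8, 8)
```

The loop over `k` makes the $O(HWCK)$ cost visible. Each of the $K$ passes touches every element of the $C \times H \times W$ tensor once. A vertical strip would pad and shift along axis 1 instead.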

Frequency Strip Attention (FSA)

FSA first applies strip‑wise pooling to separate the feature map into frequency components along the horizontal and vertical axes. For each component, a lightweight attention weight $\alpha$ is learned (as in SSA) and used to modulate the spectral representation:

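import numpy as np

rng = np.random.default_rng(0)
C, H, W = 64, 32, 32
X = rng.standard_normal((C, H, W))            # example feature map, shape (C, H, W)
# Strip-wise (average) pooling to obtain frequency slices
F_h = X.mean(axis=2, keepdims=True)           # horizontal strips, shape (C, H, 1)
F_v = X.mean(axis=1, keepdims=True)           # vertical strips, shape (C, 1, W)
# Learn attention for each slice; random W_h, W_v stand in for learned 1x1 convs
W_h, W_v = rng.standard_normal((C, C)), rng.standard_normal((C, C))
alpha_h = 1.0 / (1.0 + np.exp(-np.einsum("oc,chw->ohw", W_h, F_h)))  # sigmoid
alpha_v = 1.0 / (1.0 + np.exp(-np.einsum("oc,chw->ohw", W_v, F_v)))  # sigmoid
# Modulate frequency components
F_h_mod = alpha_h * F_h
F_v_mod = alpha_v * F_v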

The modulated components are then recombined via an inverse transform. This step adds minimal overhead while enriching the frequency‑domain information.
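Why can a strip‑pooled map be called a frequency component? Averaging along an axis yields exactly the zero‑frequency (DC) term of the discrete Fourier transform along that axis, divided by its length. The pooled strip is therefore the lowest‑frequency part of each row, and the residual `x - low` carries all higher frequencies.

The NumPy sketch below checks this and then shows one assumed way to modulate and recombine the two parts. It uses a weighted sum in place of the inverse transform mentioned above. The low/high split, the fixed weights `alpha_low` and `alpha_high`, and the helper name `strip_frequency_split` are illustrative assumptions, not the paper's exact FSA.

```python
import numpy as np


def strip_frequency_split(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a (C, H, W) map into a horizontal-strip low-pass part and its residual."""
    low = x.mean(axis=2, keepdims=True)   # horizontal strip pooling, shape (C, H, 1)
    high = x - low                        # broadcast subtraction, shape (C, H, W)
    return low, high


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 32))
low, high = strip_frequency_split(x)

# The pooled strip equals the DC coefficient of the FFT along the width, divided by W.
dc = np.fft.fft(x, axis=2)[:, :, :1].real / x.shape[2]
print(np.allclose(low, dc))                 # True

# The residual has (numerically) zero mean along the width: only higher frequencies remain.
print(np.allclose(high.mean(axis=2), 0.0))  # True

# Assumed modulation and recombination: a weighted sum of the two parts.
alpha_low, alpha_high = 0.8, 1.2            # stand-ins for learned attention weights
y = alpha_low * low + alpha_high * high     # low broadcasts back to (C, H, W)
print(np.allclose(low + high, x))           # True: unit weights recover the input
```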

Network Architecture

DSANet adopts an encoder‑decoder backbone with three scales, each containing three Residual Groups (ResGroups). Model size is controlled by a scaling factor N (e.g., N=1 for the base model and N=2 for a larger variant). Training uses 256×256 patches, a batch size of 8, and the Adam optimizer with a cosine‑annealing learning‑rate schedule.
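The training recipe can be captured in a small config object. The sketch below records the values stated above and adds a standard cosine‑annealing schedule:

$$\eta_t = \eta_{\min} + \tfrac12(\eta_{\max} - \eta_{\min})\bigl(1 + \cos(\pi t / T)\bigr)$$

The class name `DSANetTrainConfig`, the peak and minimum learning rates, and the total step count are hypothetical placeholders, because the summary does not give them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DSANetTrainConfig:
    # Values stated in the summary
    num_scales: int = 3
    resgroups_per_scale: int = 3
    scale_factor_n: int = 1          # N=1 base model, N=2 larger variant
    patch_size: int = 256
    batch_size: int = 8
    # Hypothetical placeholders (not given in the summary)
    lr_max: float = 2e-4
    lr_min: float = 1e-6
    total_steps: int = 300_000


def cosine_annealing_lr(step: int, cfg: DSANetTrainConfig) -> float:
    """Learning rate at `step` when annealing from lr_max to lr_min over total_steps."""
    t = min(max(step, 0), cfg.total_steps)               # clamp to [0, T]
    cos_term = 0.5 * (1.0 + math.cos(math.pi * t / cfg.total_steps))
    return cfg.lr_min + (cfg.lr_max - cfg.lr_min) * cos_term


if __name__ == "__main__":
    cfg = DSANetTrainConfig()
    for s in (0, cfg.total_steps // 2, cfg.total_steps):
        print(s, f"{cosine_annealing_lr(s, cfg):.2e}")
```

In PyTorch the same curve is available as `torch.optim.lr_scheduler.CosineAnnealingLR`, with `T_max` and `eta_min` playing the roles of `total_steps` and `lr_min`.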

Experimental Evaluation

All experiments report Peak Signal‑to‑Noise Ratio (PSNR) and Structural Similarity Index (SSIM) as metrics.
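PSNR is defined as $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, where MAX is the largest possible pixel value. A minimal NumPy version for 8‑bit images (MAX = 255) is sketched below. The function name `psnr` is illustrative. Benchmark protocols often add details such as color‑space conversion or border cropping, which are omitted here. SSIM is more involved and is usually taken from a library such as scikit‑image (`skimage.metrics.structural_similarity`).

```python
import numpy as np


def psnr(reference: np.ndarray, restored: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = reference.astype(np.float64)   # avoid uint8 overflow in the difference
    out = restored.astype(np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")              # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


if __name__ == "__main__":
    clean = np.full((4, 4), 100, dtype=np.uint8)
    noisy = clean + 1                       # every pixel off by one -> MSE = 1
    print(f"{psnr(clean, noisy):.2f} dB")   # 48.13 dB
```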

Image Dehazing

On the SOTS‑Indoor and SOTS‑Outdoor benchmarks, DSANet improves PSNR by 0.96 dB and 0.54 dB, respectively, over SANet, the previous best method. It also achieves the highest scores on four real‑world hazy datasets: NH‑HAZE, NH‑HAZE2, O‑Haze, and Dense‑Haze.

Image Desnowing

DSANet attains the best PSNR on CSD, SRRS, and Snow100K, surpassing FocalNet by 0.91 dB on the CSD dataset.
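PSNR is logarithmic in the mean squared error, so a gain of Δ dB means the MSE is multiplied by $10^{-\Delta/10}$. The short Python sketch below converts the dehazing and desnowing gains reported above into relative MSE reductions. This is a back‑of‑the‑envelope reading that assumes the same peak value for both methods; the helper name `mse_ratio` is illustrative.

```python
def mse_ratio(psnr_gain_db: float) -> float:
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-psnr_gain_db / 10.0)


if __name__ == "__main__":
    gains = {
        "SOTS-Indoor vs. SANet": 0.96,
        "SOTS-Outdoor vs. SANet": 0.54,
        "CSD vs. FocalNet": 0.91,
    }
    for name, gain in gains.items():
        print(f"{name}: {gain:.2f} dB -> {100 * (1 - mse_ratio(gain)):.1f}% lower MSE")
```

By this reading, the three gains correspond to roughly 20%, 12%, and 19% lower MSE.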

Defocus Deblurring

On the DPDD dataset, DSANet outperforms competing approaches on most evaluation metrics, demonstrating strong handling of spatially varying blur.

Image Denoising

On BSD68 at Gaussian noise levels σ = 15, 25, and 50, DSANet consistently exceeds Restormer in both PSNR and SSIM.
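Gaussian denoising benchmarks such as BSD68 are typically built by adding zero‑mean white Gaussian noise with standard deviation σ, on the 0 to 255 intensity scale, to clean images. The NumPy sketch below shows that synthesis. Several details are assumptions: the helper name `add_gaussian_noise`, the random stand‑in image, the fixed seed, and the choice not to clip or quantize the noisy result. Evaluation protocols differ on that last point.

```python
import numpy as np


def add_gaussian_noise(clean: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Return clean + N(0, sigma^2) noise, computed in float64 on the 0-255 scale."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=sigma, size=clean.shape)
    return clean.astype(np.float64) + noise   # no clipping or rounding (assumption)


if __name__ == "__main__":
    rng = np.random.default_rng(123)
    clean = rng.integers(0, 256, size=(256, 256)).astype(np.uint8)  # stand-in image
    for sigma in (15, 25, 50):
        noisy = add_gaussian_noise(clean, sigma)
        print(sigma, round(float(np.std(noisy - clean)), 2))  # close to sigma
```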

Conclusion

The Dual‑Domain Strip Attention mechanism efficiently captures multi‑scale context in both spatial and frequency domains. By integrating SSA and FSA, DSANet delivers superior restoration quality with low computational burden and generalizes well to real‑world degraded images, making it a practical solution for diverse image‑enhancement tasks.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by AI Frontier Lectures