How MambaIRv2 Boosts Image Restoration with Attentive State‑Space Design
Introducing MambaIRv2, an image restoration backbone that replaces Mamba’s causal scanning with an attentive state‑space module. It needs only a single scan direction, uses fewer parameters and less computation, and achieves superior performance on lightweight and classic super‑resolution, JPEG artifact removal, and denoising. The work has been accepted at CVPR 2025.
Introduction
We previously introduced MambaIR, a Mamba‑based backbone for image restoration. MambaIRv2 extends this work and has been accepted at CVPR 2025. The paper and code are available at https://arxiv.org/pdf/2411.15269 and https://github.com/csguoh/MambaIR.
Motivation
MambaIRv2 addresses the causal‑scanning limitation of the original Mamba model. When an image is flattened into a 1‑D sequence, a causal scan lets the pixel at position i use information only from the pixels that precede it in the scan order. This is a poor fit for image restoration, a non‑causal task in which every pixel can benefit from access to the whole image. We therefore propose an attention‑like scanning mechanism that lets each token see the entire image.
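To make the causality problem concrete, the following minimal sketch assumes a plain row‑major (raster) flattening, which is one common choice and not necessarily the exact order used by any particular model. It lists which pixels a causal scan has already visited when it reaches a given pixel; the function names are illustrative.

```python
def raster_index(row, col, width):
    """Position of pixel (row, col) when an image is flattened row by row."""
    return row * width + col


def causal_visible(row, col, height, width):
    """Pixels a causal raster scan lets (row, col) depend on.

    Only pixels whose raster index is <= that of (row, col) have been
    visited by the time the scan reaches (row, col).
    """
    limit = raster_index(row, col, width)
    return [
        (r, c)
        for r in range(height)
        for c in range(width)
        if raster_index(r, c, width) <= limit
    ]


if __name__ == "__main__":
    H, W = 3, 3
    visible = causal_visible(1, 1, H, W)
    hidden = [(r, c) for r in range(H) for c in range(W) if (r, c) not in visible]
    print("visible:", visible)  # the whole top row, the left neighbour and the pixel itself
    print("hidden: ", hidden)   # the right neighbour and the whole bottom row
```

For the centre pixel of a 3×3 image, its right neighbour and the three pixels below it are invisible, even though they are spatially adjacent.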
Some Findings
1. Multi‑direction information redundancy
Previous visual Mamba methods scan the image in four directions to work around causality, which multiplies the computational cost. Feature‑similarity visualizations show that the features produced by the four directions are highly redundant, suggesting that a single scan direction is sufficient.
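A minimal sketch of four‑direction scanning, assuming the row‑major and column‑major orders (each forward and reversed) commonly used in VMamba‑style cross‑scan; the exact orders in a given model are an assumption here. Each order is a full pass over all H×W tokens, so four directions cost roughly four times as much as one.

```python
def four_scan_orders(height, width):
    """Return four 1-D orderings of the raster indices of an H x W image."""
    # Row-major: left to right, then top to bottom.
    row_major = [r * width + c for r in range(height) for c in range(width)]
    # Column-major: top to bottom, then left to right.
    col_major = [r * width + c for c in range(width) for r in range(height)]
    return {
        "row_forward": row_major,
        "row_backward": row_major[::-1],
        "col_forward": col_major,
        "col_backward": col_major[::-1],
    }


if __name__ == "__main__":
    orders = four_scan_orders(2, 3)
    for name, order in orders.items():
        print(f"{name:13s}", order)
    # Total tokens processed: 4 passes over H * W tokens.
    print("token visits:", sum(len(o) for o in orders.values()))
```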
2. Long‑range interaction decay
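When the image is scanned in a single direction, a pixel’s influence on later pixels decays with its distance in the scan order, so pixels that are spatially close but far apart in the sequence interact only weakly. This causes performance loss on tasks that require global context.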
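The decay can be illustrated with a deliberately simplified recurrence: a scalar, time‑invariant decay `a` (an assumption; Mamba’s actual decay is input‑dependent and per state dimension). Feeding a unit impulse through `h_t = a * h_{t-1} + x_t` shows that a token still contributes `a**d` after `d` steps. Under a row‑major scan, the pixel directly below is a full image width away.

```python
def impulse_response(decay, length):
    """Run h_t = decay * h_{t-1} + x_t on a unit impulse at t = 0.

    out[d] is how much the token at t = 0 still contributes d steps later.
    """
    h, out = 0.0, []
    for t in range(length):
        x_t = 1.0 if t == 0 else 0.0
        h = decay * h + x_t
        out.append(h)
    return out


def raster_gap(p, q, width):
    """Distance between two (row, col) pixels in a row-major scan."""
    return abs((p[0] * width + p[1]) - (q[0] * width + q[1]))


if __name__ == "__main__":
    decay, width = 0.9, 64
    resp = impulse_response(decay, 2 * width)
    right = raster_gap((10, 10), (10, 11), width)  # horizontally adjacent: 1 step
    below = raster_gap((10, 10), (11, 10), width)  # vertically adjacent: width steps
    print(f"right neighbour weight: {resp[right]:.4f}")
    print(f"lower neighbour weight: {resp[below]:.6f}")
```

With `a = 0.9` and a 64‑pixel‑wide image, the right neighbour keeps a weight of 0.9 while the neighbour directly below drops to about 0.0012, even though both are adjacent in 2‑D.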
3. Connecting SSM and attention
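Mathematically, the state‑space model (SSM) can be reformulated to resemble attention. Linear attention computes each output as a query multiplied by an accumulated sum of key‑value products; unrolling the SSM recurrence over the sequence yields the same form, with the output matrix C playing the role of the query, the input matrix B the role of the key, the input x the role of the value, and the state transition acting as a causal decay mask. This reveals a direct correspondence between the two mechanisms.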
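The correspondence can be checked numerically. This minimal pure‑Python sketch assumes a diagonal state matrix and a single scalar input channel, a simplification of Mamba’s selective SSM; the function names are illustrative and not taken from the MambaIRv2 code. Unrolling `h_t = a_t * h_{t-1} + b_t * x_t`, `y_t = c_t · h_t` gives `y_t = Σ_{s≤t} (c_t · (Π_{k=s+1..t} a_k) * b_s) x_s`, which is causal linear attention with a decay mask.

```python
import random


def ssm_recurrent(a, b, c, x):
    """Diagonal SSM with a scalar input per step.

    h_t = a_t * h_{t-1} + b_t * x_t   (elementwise over the N state dims)
    y_t = c_t . h_t
    a, b, c are lists of length-N vectors (one per step); x is a list of floats.
    """
    n = len(a[0])
    h = [0.0] * n
    ys = []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = [a_t[i] * h[i] + b_t[i] * x_t for i in range(n)]
        ys.append(sum(c_t[i] * h[i] for i in range(n)))
    return ys


def ssm_as_attention(a, b, c, x):
    """The same outputs in attention form: y_t = sum_{s<=t} score(t, s) * x_s.

    score(t, s) = sum_i c_t[i] * (prod_{k=s+1..t} a_k[i]) * b_s[i], so c acts
    as the query, b as the key, x as the value and the product of a's as a
    causal decay mask.
    """
    n = len(a[0])
    ys = []
    for t in range(len(x)):
        y_t = 0.0
        for s in range(t + 1):
            score = 0.0
            for i in range(n):
                decay = 1.0
                for k in range(s + 1, t + 1):
                    decay *= a[k][i]
                score += c[t][i] * decay * b[s][i]
            y_t += score * x[s]
        ys.append(y_t)
    return ys


def causal_linear_attention(q, k, v):
    """Unnormalized causal linear attention: y_t = sum_{s<=t} (q_t . k_s) v_s."""
    return [
        sum(sum(qi * ki for qi, ki in zip(q[t], k[s])) * v[s] for s in range(t + 1))
        for t in range(len(v))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    T, N = 6, 4
    a = [[rng.uniform(0.5, 1.0) for _ in range(N)] for _ in range(T)]
    b = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(T)]
    c = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(T)]
    x = [rng.gauss(0, 1) for _ in range(T)]
    rec, att = ssm_recurrent(a, b, c, x), ssm_as_attention(a, b, c, x)
    print(max(abs(p - q) for p, q in zip(rec, att)))  # floating-point noise only
    # With no decay (all a = 1) the SSM is exactly causal linear attention.
    ones = [[1.0] * N for _ in range(T)]
    no_decay = ssm_recurrent(ones, b, c, x)
    print(max(abs(p - q) for p, q in zip(no_decay, causal_linear_attention(c, b, x))))
```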
Method
The core of MambaIRv2 is the Attentive State Space Module (ASSM), which consists of two components:
Attentive State‑space Equation (ASE): introduces prompt learning to expand the receptive field of each pixel, giving it a global query capability similar to that of Vision Transformers.
Semantic Guided Neighboring (SGN): rearranges the sequence so that semantically similar pixels sit close together, mitigating long‑range decay.
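The general idea behind ASE, augmenting the query‑like term C of the SSM read‑out with a learned prompt, can be sketched as below. Everything specific here is a hypothetical illustration rather than the MambaIRv2 implementation: the small shared prompt pool, the dot‑product routing rule in `select_prompt`, and all sizes are assumptions, and the paper’s actual prompt generation and selection may differ.

```python
def select_prompt(feature, prompt_pool):
    """Hypothetical routing: index of the prompt with the largest dot product with the token feature."""
    scores = [sum(f * p for f, p in zip(feature, prompt)) for prompt in prompt_pool]
    return max(range(len(prompt_pool)), key=lambda k: scores[k])


def attentive_ssm(a, b, c, x, features, prompt_pool):
    """Diagonal SSM whose read-out query c_t is augmented with a routed prompt.

    h_t = a_t * h_{t-1} + b_t * x_t         (elementwise)
    y_t = (c_t + prompt_pool[k_t]) . h_t,   k_t = select_prompt(features[t], prompt_pool)
    Returns the outputs and the chosen prompt index for every token.
    """
    n = len(a[0])
    h = [0.0] * n
    ys, routes = [], []
    for a_t, b_t, c_t, x_t, f_t in zip(a, b, c, x, features):
        h = [a_t[i] * h[i] + b_t[i] * x_t for i in range(n)]
        k_t = select_prompt(f_t, prompt_pool)
        query = [c_t[i] + prompt_pool[k_t][i] for i in range(n)]
        ys.append(sum(query[i] * h[i] for i in range(n)))
        routes.append(k_t)
    return ys, routes


if __name__ == "__main__":
    # Two tokens, a 2-dimensional state and two prompts (all values illustrative).
    a = [[0.5, 0.5], [0.5, 0.5]]
    b = [[1.0, 1.0], [1.0, 1.0]]
    c = [[1.0, 0.0], [1.0, 0.0]]
    x = [1.0, 2.0]
    features = [[2.0, 0.0], [0.0, 3.0]]
    pool = [[1.0, 0.0], [0.0, 1.0]]
    print(attentive_ssm(a, b, c, x, features, pool))  # ([2.0, 5.0], [0, 1])
```

With an all‑zero prompt pool the function reduces to an ordinary diagonal SSM, which makes the prompt the only change to the read‑out in this sketch.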
Using a single‑direction scan eliminates the heavy cost of multi‑direction processing while preserving or improving performance.
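Combining SGN‑style reordering with a single scan direction can be sketched as sort, scan, unsort. This is a hypothetical illustration: the per‑token semantic labels are assumed to be given (how MambaIRv2 derives them is not modelled here), and the scan is the same simplified scalar‑decay recurrence used for illustration, not Mamba’s selective scan.

```python
def semantic_order(labels):
    """Stable permutation that groups tokens sharing the same (assumed) semantic label."""
    return sorted(range(len(labels)), key=lambda i: labels[i])


def inverse_permutation(perm):
    """inv[old_position] = new_position, so perm[inv[i]] == i."""
    inv = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inv[old_pos] = new_pos
    return inv


def single_direction_scan(values, decay=0.9):
    """Simplified causal scan: h_t = decay * h_{t-1} + v_t."""
    h, out = 0.0, []
    for v in values:
        h = decay * h + v
        out.append(h)
    return out


def sgn_scan(values, labels, decay=0.9):
    """Reorder by label, scan once in one direction, then restore the original order."""
    perm = semantic_order(labels)
    scanned = single_direction_scan([values[i] for i in perm], decay)
    inv = inverse_permutation(perm)
    return [scanned[inv[i]] for i in range(len(values))]


if __name__ == "__main__":
    labels = [0, 1, 0, 1]          # e.g. alternating pixels from two regions
    values = [1.0, 0.0, 0.0, 0.0]  # only token 0 carries a signal
    print(single_direction_scan(values, decay=0.5))  # [1.0, 0.5, 0.25, 0.125]
    print(sgn_scan(values, labels, decay=0.5))       # [1.0, 0.25, 0.5, 0.125]
```

Token 2 shares token 0’s label, so after grouping it sits one step away instead of two and receives 0.5 of the signal rather than 0.25. In this sketch the reordering costs only two permutations, so related tokens are brought together without adding extra scan directions.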
Experiments
Lightweight Super‑Resolution
Classic Super‑Resolution
JPEG Artifact Removal
Image Denoising
Further Discussion
Comparison with MambaIR‑V1
Compared with the original four‑scan MambaIR, MambaIRv2 reduces parameters by 43% and FLOPs by 50% on the ×2 Urban100 benchmark while delivering a 0.34 dB PSNR gain.
Attentive Map Visualization
The prompts in the attentive state‑space equation enable a query pixel to attend to semantically related regions across the whole image, achieving global information aggregation.
Conclusion
MambaIRv2 overcomes the causal modeling limitation of the original Mamba backbone by integrating an Attentive State‑space Equation and Semantic Guided Neighboring. These innovations allow single‑direction scanning, reduce computational overhead, and achieve state‑of‑the‑art performance across multiple image restoration tasks.
