HSR-Mamba Solves Mamba’s HSISR Issues with Dual Strategies, Beats Prior Methods
HSR-Mamba introduces a contextual spatial‑spectral state‑space model that tackles Mamba's limitations in hyperspectral image super‑resolution through a local partition mechanism and a global spectral rearrangement strategy, achieving significantly higher PSNR and SSIM and lower SAM (spectral angle) than existing approaches while using fewer parameters and FLOPs.
Abstract
Mamba, with its strong global modeling ability and linear computational complexity, performs well on visual tasks and shows great potential for hyperspectral image super‑resolution (HSISR). However, when converting hyperspectral images to 1‑D sequences, Mamba ignores the spatio‑spectral relationships between adjacent pixels and is highly sensitive to input order, which harms the recovery of spatial and spectral details. To address these issues, the authors propose HSR‑Mamba, a contextual spatial‑spectral state‑space model for HSISR that resolves both local and global problems. Specifically, a local spatio‑spectral partition mechanism establishes block‑wise causal relations among neighboring pixels in 3‑D features, mitigating local forgetting. In addition, a global spectral rearrangement strategy based on spectral similarity enhances causal connections among similar pixels across spatial and spectral dimensions. Extensive experiments demonstrate that HSR‑Mamba outperforms existing methods in both quantitative quality and visual fidelity.
1. Introduction
Hyperspectral images (HSIs) contain dozens to hundreds of tightly coupled spectral bands, providing rich spectral and spatial information that benefits applications in agriculture, medical diagnosis, and remote sensing. HSISR aims to reconstruct high‑resolution HSIs from low‑resolution inputs, but hardware and algorithmic constraints often force a trade‑off between spatial and spectral resolution.
Recent advances such as the Mamba state‑space model (SSM) offer linear‑time long‑range dependency modeling, yet directly applying Mamba to HSISR suffers from two major drawbacks: (1) loss of local spatio‑spectral structure when flattening images to 1‑D sequences, and (2) strong dependence on input order, which prevents effective modeling of highly similar pixels.
2. Related Work
2.1 Single‑hyperspectral super‑resolution (SHSR)
Hyperspectral super‑resolution methods are divided into fusion‑based approaches that rely on auxiliary multispectral or panchromatic images, and single‑image approaches (SHSR) that operate solely on the hyperspectral data. While fusion methods can achieve superior results, acquiring aligned auxiliary data is often impractical, leading to growing interest in SHSR. Deep‑learning‑based SHSR has shown significant advantages over traditional priors by learning complex non‑linear mappings.
Transformer‑based networks capture long‑range dependencies across spatial and spectral dimensions but incur quadratic computational cost, limiting scalability for high‑dimensional HSIs.
2.2 State‑space models
SSMs provide an efficient mathematical framework for modeling temporal or sequential dependencies. The Mamba model, a recent SSM variant, achieves linear‑time long‑range modeling and has outperformed Transformers in natural language processing and computer vision tasks. However, existing Mamba‑based visual models do not consider the rich spectral information and spatio‑spectral correlations inherent in HSIs, and they suffer from local pixel forgetting and order sensitivity.
3. Method
3.1 Overview of HSR‑Mamba
HSR‑Mamba consists of three main components: a shallow feature extraction module, a deep feature extraction module built from multiple Contextual Spatial‑Spectral Mamba Groups (CSMGs), and an up‑sampling module. The input low‑resolution HSI \(X_{LR}\) with dimensions \(H \times W \times C\) is first processed by shallow convolutions to obtain feature \(F_{shallow}\). The deep module stacks CSMGs, each containing a Local Spatial‑Spectral Mamba (LSSM) block and a Global Spectral Correlation Mamba (GSCM) block, to capture long‑range dependencies.
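The three-stage pipeline above can be sketched structurally in PyTorch. This is a minimal sketch, not the authors' implementation: the 31-band input size is an assumption, the LSSM/GSCM state-space blocks are stood in for by convolutions (the real Mamba blocks are beyond a short example), and up-sampling is done in a single PixelShuffle stage for brevity where the paper uses progressive up-sampling. Only the shallow-feature → stacked-CSMG → up-sampling layout follows the text.

```python
import torch
import torch.nn as nn

class CSMG(nn.Module):
    """One Contextual Spatial-Spectral Mamba Group (LSSM and GSCM
    approximated here by residual conv branches as placeholders)."""
    def __init__(self, ch):
        super().__init__()
        self.lssm = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.GELU())
        self.gscm = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.GELU())

    def forward(self, x):
        x = x + self.lssm(x)   # local spatial-spectral branch
        x = x + self.gscm(x)   # global spectral-correlation branch
        return x

class HSRMambaSketch(nn.Module):
    def __init__(self, bands=31, ch=64, n_groups=4, scale=4):
        super().__init__()
        self.shallow = nn.Conv2d(bands, ch, 3, padding=1)        # shallow feature extraction
        self.deep = nn.Sequential(*[CSMG(ch) for _ in range(n_groups)])
        self.up = nn.Sequential(                                 # PixelShuffle up-sampling
            nn.Conv2d(ch, ch * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(ch, bands, 3, padding=1),
        )

    def forward(self, x_lr):
        f = self.shallow(x_lr)
        f = f + self.deep(f)     # global residual over the deep module
        return self.up(f)

x = torch.randn(1, 31, 16, 16)
y = HSRMambaSketch(scale=4)(x)
print(y.shape)  # torch.Size([1, 31, 64, 64])
```

The residual connections inside each group and around the deep module mirror common SR-backbone practice; whether HSR-Mamba uses exactly these skips is not stated in the summary.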
3.2 Local Spatial‑Spectral Partition (LSSP) and BSSM
To overcome Mamba’s local forgetting, the authors design a local scanning mechanism that partitions the 3‑D feature map into \(N\) blocks of size \(b_h \times b_w \times b_c\). Within each block, a Bidirectional State‑Space Model (BSSM) captures long‑range spatio‑spectral dependencies, establishing causal relations among neighboring pixels.
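The partition step can be sketched in NumPy. This assumes the block sizes divide the feature dimensions exactly, and the scan order inside each block (height, then width, then spectrum) is an assumption; the BSSM that consumes each sequence is not shown.

```python
import numpy as np

def partition_blocks(feat, bh, bw, bc):
    """Split an (H, W, C) feature into non-overlapping bh x bw x bc blocks
    and flatten each block into a 1-D local scan sequence."""
    H, W, C = feat.shape
    assert H % bh == 0 and W % bw == 0 and C % bc == 0
    blocks = feat.reshape(H // bh, bh, W // bw, bw, C // bc, bc)
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)   # gather the block indices first
    return blocks.reshape(-1, bh * bw * bc)       # N blocks, each one short sequence

feat = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4)
seqs = partition_blocks(feat, 2, 2, 2)
print(seqs.shape)  # (8, 8)
```

Because each sequence now contains only spatially and spectrally adjacent pixels, a state-space scan over it cannot "forget" a neighbor that flattening the whole image would have pushed far away; a bidirectional scan would simply process each row of `seqs` forward and reversed.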
3.3 Global Spectral Rearrangement (GSRM)
The GSRM computes a spectral correlation matrix, averages correlations per band to obtain global similarity scores, and then reorders the spectral dimension so that highly correlated bands are placed adjacently. This rearrangement strengthens causal modeling of similar pixels across the entire image.
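One plausible reading of this procedure in NumPy: sort the bands by their average correlation score, so bands with similar global scores end up adjacent. The exact ordering rule and where the permutation is undone are assumptions; the summary does not specify them.

```python
import numpy as np

def spectral_rearrange(feat):
    """Reorder the bands of an (H, W, C) feature by global spectral
    similarity (a sketch of GSRM; the paper's rule may differ)."""
    H, W, C = feat.shape
    bands = feat.reshape(-1, C)              # pixels x bands
    corr = np.corrcoef(bands, rowvar=False)  # C x C spectral correlation matrix
    scores = corr.mean(axis=1)               # per-band global similarity score
    order = np.argsort(scores)               # similar scores become neighbors
    return feat[:, :, order], order

feat = np.random.rand(8, 8, 6)
out, order = spectral_rearrange(feat)
```

Returning `order` lets the caller invert the permutation after the Mamba scan, so the output bands can be restored to their physical wavelength order.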
3.4 Loss Functions
The network is optimized with a combination of three losses: L1 reconstruction loss, Spectral Angle Mapper (SAM) loss, and gradient loss in both spatial and spectral domains. The total loss is \(\mathcal{L}=\lambda_1\mathcal{L}_{L1}+\lambda_2\mathcal{L}_{SAM}+\lambda_3\mathcal{L}_{grad}\), where \(\lambda_1,\lambda_2,\lambda_3\) are balanced empirically.
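A hedged PyTorch sketch of this objective. The weight values are placeholders (the paper only says they are set empirically), and the gradient loss is written here as first-order differences along the spatial and spectral axes, which is one common formulation rather than the paper's confirmed one.

```python
import torch
import torch.nn.functional as F

def sam_loss(pred, target, eps=1e-8):
    """Mean spectral angle between the (B, C, H, W) spectra at each pixel."""
    dot = (pred * target).sum(dim=1)
    norm = pred.norm(dim=1) * target.norm(dim=1)
    cos = (dot / (norm + eps)).clamp(-1 + 1e-7, 1 - 1e-7)
    return torch.acos(cos).mean()

def grad_loss(pred, target):
    """L1 distance between first-order differences along the spectral (C)
    and spatial (H, W) axes of a (B, C, H, W) tensor."""
    loss = 0.0
    for dim in (1, 2, 3):
        loss = loss + F.l1_loss(torch.diff(pred, dim=dim),
                                torch.diff(target, dim=dim))
    return loss

def total_loss(pred, target, l1=1.0, l2=0.1, l3=0.1):
    # l1/l2/l3 stand in for lambda_1..3; the paper sets them empirically
    return (l1 * F.l1_loss(pred, target)
            + l2 * sam_loss(pred, target)
            + l3 * grad_loss(pred, target))

pred = torch.rand(2, 31, 8, 8)
loss = total_loss(pred, pred.clone())
```

With identical inputs the loss is near zero (not exactly zero, because the cosine is clamped away from 1 for a stable `acos` gradient).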
4. Experiments
4.1 Datasets and Implementation Details
Experiments are conducted on three HSI datasets: Chikusei, Houston2018, and Pavia Center. The authors use 4 non‑overlapping patches from the top region of each dataset for testing and the remaining patches for training/validation. Low‑resolution patches are generated by bicubic down‑sampling with scale factors 2, 4, and 8. The model uses 64 channels, 4 CSMGs, and 2 CSSMs per group. Initial learning rate is set to 1e‑4 and halved every 100 epochs up to 400 epochs. Adam optimizer with Xavier initialization and batch size 8 is employed. PixelShuffle is used for progressive up‑sampling. Training is performed on an NVIDIA RTX 4090 GPU using PyTorch.
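The LR-patch synthesis step can be sketched with PyTorch's bicubic interpolation. Whether the authors apply antialiasing before down-sampling is not stated, so `antialias=True` (which matches the common MATLAB-`imresize`-style protocol) is an assumption.

```python
import torch
import torch.nn.functional as F

def make_lr(hr, scale):
    """Bicubic down-sampling of a (B, C, H, W) HR patch by `scale`."""
    return F.interpolate(hr, scale_factor=1.0 / scale, mode="bicubic",
                         align_corners=False, antialias=True)

hr = torch.rand(1, 31, 64, 64)
shapes = [tuple(make_lr(hr, s).shape) for s in (2, 4, 8)]
print(shapes)  # [(1, 31, 32, 32), (1, 31, 16, 16), (1, 31, 8, 8)]
```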
4.2 Comparison with State‑of‑the‑Art Methods
The proposed HSR‑Mamba is compared against eight deep‑learning baselines, including SwinIR, MambaIR, GDRRN, SSPSR, RFSR, GELIN, AS3ITransUNet, and MSDformer. Evaluation metrics cover PSNR, SSIM, SAM, CC, RMSE, and ERGAS. Across all scale factors, HSR‑Mamba achieves the best scores, e.g., a 0.29 dB PSNR gain over SSPSR at scale ×4 on the Chikusei dataset, and consistently higher spectral fidelity.
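For reference, two of the listed metrics can be computed as follows. This is a plain NumPy sketch; published HSISR code sometimes averages PSNR per band rather than over the whole cube, so exact numbers may differ.

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB over the whole cube."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

def sam_degrees(pred, target, eps=1e-8):
    """Mean spectral angle in degrees over the pixels of (H, W, C) cubes
    (lower is better, unlike PSNR/SSIM)."""
    dot = np.sum(pred * target, axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(target, axis=-1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()

rng = np.random.default_rng(0)
gt = rng.random((16, 16, 31))
noisy = np.clip(gt + 0.01 * rng.standard_normal(gt.shape), 0.0, 1.0)
print(psnr(noisy, gt), sam_degrees(noisy, gt))
```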
4.3 Ablation Study
Removing the LSSP module reduces PSNR by 0.09 dB, confirming its effectiveness in alleviating local forgetting. Excluding GSRM leads to a 0.12 dB PSNR drop, demonstrating the importance of global spectral rearrangement. When both components are omitted, performance degrades substantially.
4.4 Parameter and Complexity Analysis
HSR‑Mamba attains superior SR results while using fewer parameters and lower FLOPs compared with the baselines, indicating an excellent trade‑off between model complexity and performance.
5. Conclusion
HSR‑Mamba introduces a contextual spatial‑spectral state‑space architecture that effectively captures long‑range dependencies in hyperspectral images. The dual mechanisms—local spatial‑spectral partition and global spectral rearrangement—address Mamba’s inherent limitations, leading to state‑of‑the‑art HSISR performance with modest computational cost.
AIWalker
