How HINT’s Hierarchical Multi‑Head Attention Boosts Image Restoration Quality
The paper introduces HINT, a Transformer‑based image restoration model. It combines Hierarchical Multi‑Head Attention (HMHA) with a Query‑Key Cache Updating (QKCU) module to reduce redundancy among attention heads, and it reports higher PSNR/SSIM scores than prior methods on low‑light enhancement, dehazing, desnowing, denoising, and deraining while keeping model complexity low.
Background
Standard Multi‑Head Attention (MHA) in vision Transformers splits the channel dimension into equal‑sized sub‑spaces, one per head, and processes every head in the same way. The heads therefore tend to become redundant, with several of them attending to the same image regions, which limits restoration quality in low‑level vision tasks.
Key Contributions
Hierarchical Multi‑Head Attention (HMHA) : The channel dimension is split into a hierarchy of sub‑spaces C₁ ≤ C₂ ≤ … ≤ C_h. Before attention, channels are reordered so that each head operates on a distinct sub‑space, encouraging diverse semantic feature learning.
Query‑Key Cache Updating (QKCU) : Introduces intra‑layer and inter‑layer cache mechanisms with gating. Cached queries and keys are exchanged among heads, reducing redundancy and increasing feature diversity.
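As a rough illustration of the hierarchical split described above, the sketch below divides a channel dimension into non‑decreasing head widths and runs ordinary scaled dot‑product attention independently inside each width. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the names `hierarchical_split` and `hmha`, the `ratios` parameter, the optional `order` permutation, and the use of token‑wise softmax attention are all hypothetical. The paper's actual channel‑reordering rule and the QKCU module are not reproduced here.

```python
import numpy as np


def hierarchical_split(channels, ratios):
    """Split `channels` into non-decreasing head widths proportional to `ratios`.

    `ratios` is a hypothetical parameter (not from the paper); it must be
    sorted in non-decreasing order so that C_1 <= C_2 <= ... <= C_h holds.
    """
    if list(ratios) != sorted(ratios):
        raise ValueError("ratios must be non-decreasing")
    total = sum(ratios)
    sizes = [channels * r // total for r in ratios]
    # Give any rounding remainder to the last (largest) head.
    sizes[-1] += channels - sum(sizes)
    return sizes


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def hmha(q, k, v, sizes, order=None):
    """Per-head attention over unequal channel sub-spaces.

    q, k, v: arrays of shape (tokens, channels).
    sizes:   head widths, e.g. from `hierarchical_split`.
    order:   optional channel permutation applied before the split
             (a stand-in for the paper's reordering step; assumption).
    Returns an array of shape (tokens, channels), in the reordered layout.
    """
    if sum(sizes) != q.shape[1]:
        raise ValueError("head widths must sum to the channel count")
    if order is not None:
        q, k, v = q[:, order], k[:, order], v[:, order]
    outputs, start = [], 0
    for size in sizes:
        sl = slice(start, start + size)
        # Each head only sees its own slice of channels.
        scores = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(size))
        outputs.append(scores @ v[:, sl])
        start += size
    return np.concatenate(outputs, axis=1)


sizes = hierarchical_split(48, (1, 2, 2, 3))
tokens = np.random.default_rng(0).standard_normal((5, 48))
out = hmha(tokens, tokens, tokens, sizes)
print(sizes, out.shape)
```

With 48 channels and ratios (1, 2, 2, 3), the heads receive 6, 12, 12, and 18 channels, so later heads work in wider sub‑spaces than earlier ones, unlike the equal split of standard MHA.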
Model Architecture (HINT)
HINT follows an encoder‑decoder layout with four hierarchical levels, the fourth of which serves as the bottleneck. Each level contains basic transformer blocks, and a refinement stage after the decoder adds four more blocks.
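To make the four‑level layout concrete, the sketch below lists the feature‑map shape at each level. It rests on assumptions the summary does not state: a base width of 48 channels, and the convention (common in encoder‑decoder restoration Transformers such as Restormer) of halving the spatial resolution and doubling the channels at each deeper level. The function name `level_shapes` and its parameters are hypothetical.

```python
def level_shapes(height, width, base_channels=48, levels=4):
    """Return (channels, height, width) per level.

    Assumes each deeper level halves the resolution and doubles the
    channel count; these factors are illustrative, not from the source.
    """
    shapes = []
    for level in range(levels):
        scale = 2 ** level
        shapes.append((base_channels * scale, height // scale, width // scale))
    return shapes


for level, shape in enumerate(level_shapes(256, 256), start=1):
    role = "bottleneck" if level == 4 else "encoder/decoder"
    print(f"level {level} ({role}): channels, height, width = {shape}")
```

Under these assumptions a 256×256 input reaches the bottleneck at 32×32 resolution, which is where attention over the widest channel dimension is cheapest spatially.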
Experimental Setup
Datasets : Twelve benchmark datasets covering five restoration tasks—low‑light enhancement, dehazing, desnowing, denoising, and deraining.
Metrics : PSNR and SSIM for paired data; MANIQA for unpaired real‑world inputs.
Training : AdamW optimizer, standard image‑restoration loss functions, and momentum α = 0.9.
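For readers unfamiliar with the main metric, PSNR is defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between the restored image and the ground truth and MAX is the largest possible pixel value. The sketch below is a minimal NumPy version of that standard definition. The function name `psnr` and the default `max_value=1.0` (images scaled to [0, 1]) are assumptions for illustration, and the paper's exact evaluation protocol (color space, cropping) is not specified in this summary.

```python
import numpy as np


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    if reference.shape != restored.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


clean = np.zeros((4, 4))
noisy = np.full((4, 4), 0.1)  # constant error of 0.1 gives MSE = 0.01
print(f"{psnr(clean, noisy):.2f} dB")
```

Higher is better, and because the scale is logarithmic, a fixed gain in dB corresponds to a fixed multiplicative drop in MSE.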
Results and Analysis
Low‑Light Enhancement : On the LOL‑v2 benchmark, HINT improves average PSNR by 0.9 dB over Retinexformer and outperforms all baselines by at least 1.74 dB.
Desnowing : On Snow100K, HINT achieves the highest PSNR/SSIM, surpassing the recent AST method by 1.64 dB PSNR.
Dehazing : On the SOTS benchmark, HINT leads all competitors, gaining a minimum of 0.35 dB PSNR.
Model Efficiency : Despite superior performance, HINT has fewer parameters than CNN‑based MIRNet and Transformer‑based IPT/Restormer, delivering the best PSNR‑per‑complexity ratio.
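Because PSNR is logarithmic, the dB margins above are easier to interpret as reductions in mean squared error. The sketch below converts a PSNR gain into the corresponding MSE ratio using the identity ratio = 10^(−gain/10), which follows directly from the PSNR definition when both methods are scored against the same peak value. The function name `mse_ratio_from_psnr_gain` is hypothetical, and the gains are the ones quoted in the results above.

```python
def mse_ratio_from_psnr_gain(gain_db):
    """MSE of the better method divided by MSE of the baseline."""
    return 10.0 ** (-gain_db / 10.0)


reported_gains = [
    ("low-light, vs. Retinexformer", 0.9),
    ("desnowing, vs. AST", 1.64),
    ("dehazing, minimum margin", 0.35),
]
for name, gain in reported_gains:
    ratio = mse_ratio_from_psnr_gain(gain)
    print(f"{name}: +{gain} dB PSNR -> MSE multiplied by {ratio:.3f}")
```

Read this way, a gain of about 0.9 dB means roughly a fifth less squared error, while a 0.35 dB margin is a much smaller change.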
Real‑World Evaluation
When tested on unpaired real‑world images, HINT consistently produces the most visually pleasing restorations, avoiding the color distortion and artifacts observed in competing methods.
Conclusion
Integrating hierarchical attention with cache‑based query‑key updates effectively mitigates MHA redundancy, achieving state‑of‑the‑art performance across diverse image restoration tasks while keeping model size modest. Future work will explore extreme low‑light scenarios using larger real‑world training collections.
Resources
Code and pretrained models: https://github.com/AIFengheshu/Plug-play-modules
