How Ada-RefSR Suppresses Hallucinations in Single-Step Diffusion Super-Resolution
This article presents Ada-RefSR, a single-step diffusion-based reference super-resolution framework built around a "Trust but Verify" paradigm. Through adaptive implicit correlation gating and a lightweight architecture, it robustly suppresses hallucinations, achieves state-of-the-art results on multiple benchmarks, and remains efficient enough for mobile deployment.
Background
Single‑image super‑resolution (SISR) based on diffusion models can generate high‑frequency details but often suffers from hallucinations—fabricated textures—especially when the low‑quality (LQ) input undergoes severe, unknown degradations. Reference‑based super‑resolution (RefSR) mitigates this by introducing a high‑quality reference image, yet matching LQ inputs to references becomes extremely difficult in real‑world scenarios.
Problem
Explicit token‑wise matching methods (e.g., ReFIR) are fragile under strong degradation, leading to erroneous texture transfer and visual artifacts. The key challenge is to exploit reference information adaptively: enhance it when similarity is high and discard it when unreliable.
Proposed Method: Ada‑RefSR (Trust but Verify)
Ada‑RefSR is built on a single‑step diffusion model and introduces two complementary pathways.
ReferenceNet Path: Uses frozen SD-Turbo weights to preserve high-quality feature extraction. A Reference Attention (RA) module aligns multi-scale features of the LQ image and the reference.
AICG Branch (Adaptive Implicit Correlation Gating): Computes a trust score between LQ and reference features and dynamically regulates the amount of detail injection.
Technical Logic
Step 1 – Feature Summarization: Instead of processing all reference tokens, a small set of M learnable summary tokens T_S compresses the essential high-frequency information via cross-attention. This drastically reduces computational load while retaining the most informative reference patterns.
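The summarization step can be sketched as cross-attention in which a small bank of learnable tokens queries the long reference feature sequence. This is a minimal PyTorch illustration, not the paper's implementation; the module name, feature dimension, and token count are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class SummaryTokens(nn.Module):
    """Hypothetical sketch of Step 1: compress a long reference feature
    sequence into M learnable summary tokens T_S via cross-attention."""
    def __init__(self, num_tokens=16, dim=64, heads=4):
        super().__init__()
        # M learnable tokens act as queries; reference features are keys/values.
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ref_feats):                        # ref_feats: (B, N, dim), N >> M
        b = ref_feats.size(0)
        q = self.tokens.unsqueeze(0).expand(b, -1, -1)   # (B, M, dim)
        summary, _ = self.attn(q, ref_feats, ref_feats)  # (B, M, dim)
        return summary

# A 4096-token reference feature map is distilled into 16 summary tokens.
feats = torch.randn(2, 4096, 64)
out = SummaryTokens()(feats)
print(out.shape)  # torch.Size([2, 16, 64])
```

Because attention cost scales with the query length, downstream matching now operates on M tokens instead of N, which is where the computational saving comes from.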
Step 2 – Implicit Correlation: Queries derived from the LQ image are matched against the summarized reference tokens, producing a correlation map that reflects the reliability of each spatial region.
Step 3 – Adaptive Gating: The correlation map is averaged over the token dimension and passed through a sigmoid to obtain adaptive weights G in the range [0, 1]. When G approaches 0, the model falls back to pure single-image super-resolution, preventing hallucination.
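Steps 2 and 3 can be sketched together: scaled dot-product correlation between LQ queries and the summarized reference tokens, averaged over the token dimension and squashed by a sigmoid into a per-position gate. This is an illustrative sketch under assumed shapes and projection layers, not the released AICG code.

```python
import torch
import torch.nn as nn

class AICG(nn.Module):
    """Hypothetical sketch of Steps 2-3: match LQ queries against M summarized
    reference tokens, then gate reference detail injection by the trust score."""
    def __init__(self, dim=64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)  # queries from LQ features
        self.to_k = nn.Linear(dim, dim)  # keys from summary tokens
        self.scale = dim ** -0.5

    def forward(self, lq_feats, summary, ref_detail):
        # lq_feats: (B, N, dim); summary: (B, M, dim); ref_detail: (B, N, dim)
        q, k = self.to_q(lq_feats), self.to_k(summary)
        corr = torch.bmm(q, k.transpose(1, 2)) * self.scale      # (B, N, M)
        # Average over the token dimension, then squash to [0, 1].
        gate = torch.sigmoid(corr.mean(dim=-1, keepdim=True))    # (B, N, 1)
        # gate ~ 0 -> fall back to single-image SR; gate ~ 1 -> full injection.
        return lq_feats + gate * ref_detail, gate

lq = torch.randn(2, 100, 64)
summary = torch.randn(2, 16, 64)
detail = torch.randn(2, 100, 64)
fused, gate = AICG()(lq, summary, detail)
print(fused.shape, gate.shape)  # torch.Size([2, 100, 64]) torch.Size([2, 100, 1])
```

The gate is produced per spatial position, so reliable regions can still borrow reference texture while mismatched regions are suppressed within the same image.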
Advantages
Artifact-Free Protection: Low gating values automatically disable reference usage in unreliable regions, eliminating mismatched artifacts.
Lightweight Design: The number of summary tokens M is far smaller than the original feature length; the AICG module adds negligible overhead.
End-to-End Self-Learning: Gating weights are learned without manual labels, optimized solely by the reconstruction objective.
Performance Evaluation
Ada‑RefSR was benchmarked on four mainstream datasets: CUFED5, WRSR (general texture), Face (portrait), and Bird (category‑specific). It consistently outperformed prior state‑of‑the‑art methods such as S3Diff, ReFIR, FaceMe, and InstantRestore.
| Dataset | Metric | Performance |
|------------------------|----------------------|-----------------------------------------------------------------------------|
| CUFED5 / WRSR (texture) | FID / LPIPS | Best among all methods; visual naturalness significantly higher than ReFIR |
| Face (portrait) | PSNR / SSIM | Surpasses FaceMe, InstantRestore and other domain‑specific baselines |
| Bird (category) | Structural stability | Superior semantic consistency and structural preservation |

Key Conclusions
Achieves leading scores on perceptual metrics (FID, LPIPS), confirming high visual fidelity.
The AICG mechanism effectively suppresses hallucinations by disabling reference injection when the trust score is low, offering robustness beyond explicit matching approaches.
Mobile Deployment Benefits
Ultra-Fast Inference: Single-step generation is tens of times faster than multi-step diffusion, enabling real-time processing on smartphones.
Computational Efficiency: Minimal additional parameters and compatibility with bf16 precision make the model memory-friendly.
Robustness in Mobile Scenarios: Adaptive gating prevents quality degradation when the reference image is unrelated, guaranteeing a reliable lower bound on output quality.
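The bf16 point can be illustrated with generic PyTorch casting: bfloat16 halves memory versus fp32 while keeping fp32's exponent range. This is a toy model standing in for the real network, not the paper's deployment pipeline.

```python
import torch
import torch.nn as nn

# Toy stand-in network; the actual Ada-RefSR model is a single-step diffusion
# pipeline, used here only to illustrate bf16 inference.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
).eval()

# Cast weights to bfloat16: 2 bytes per parameter instead of 4.
model = model.to(torch.bfloat16)

x = torch.randn(1, 3, 64, 64, dtype=torch.bfloat16)
with torch.no_grad():
    y = model(x)
print(y.dtype, y.shape)  # torch.bfloat16 torch.Size([1, 3, 64, 64])
```

On hardware without native bf16 support, `torch.autocast` offers a mixed-precision alternative that keeps numerically sensitive ops in fp32.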
Project page: https://github.com/vivoCameraResearch/AdaRefSR
vivo Internet Technology
