CVPR NTIRE 2026 UGC Short Video Restoration Challenge: Winning Solutions Revealed
The CVPR NTIRE 2026 challenge introduced the KwaiVIR benchmark for real‑world UGC short‑video degradation and ran two tracks, one subjective and one objective. Twelve teams submitted results, and RedMediaTech achieved the top scores across fidelity, perceptual quality, and temporal consistency. The leading methods are analyzed below.
Challenge Overview
The NTIRE 2026 challenge focused on restoring user‑generated content (UGC) short videos under realistic degradation conditions. It introduced the KwaiVIR benchmark, comprising 200 synthetic and 48 real training videos, 11 validation videos, and 20 test videos.
Evaluation Protocol
Two tracks were defined: a subjective track, in which human raters scored fidelity, perceptual quality, and temporal consistency, and an objective track, which used the PSNR, SSIM, LPIPS, MUSIQ, and WarpError metrics on synthetic and real videos.
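As an illustration of the simplest of the objective metrics, the sketch below computes PSNR from mean squared error. It is a minimal sketch rather than the challenge's official scoring code: the function names, the 8‑bit peak value of 255, and the per‑frame averaging in `video_psnr` are all assumptions.

```python
import math

def mse(ref, out):
    """Mean squared error between two equally sized flat pixel sequences."""
    if len(ref) != len(out) or not ref:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)

def psnr(ref, out, max_val=255.0):
    """PSNR in dB: 10 * log10(max_val^2 / MSE). Identical inputs give infinity."""
    err = mse(ref, out)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / err)

def video_psnr(ref_frames, out_frames, max_val=255.0):
    """Average of per-frame PSNR over a clip (assumed aggregation, not from the source)."""
    scores = [psnr(r, o, max_val) for r, o in zip(ref_frames, out_frames)]
    return sum(scores) / len(scores)
```

PSNR needs a ground‑truth reference frame, so it can only be computed where a clean target exists.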
Results
Out of 95 registered teams, 12 submitted valid results. RedMediaTech ranked first in both subjective and objective evaluations, achieving a subjective score of 3.8525 and objective scores of PSNR 30.7610, SSIM 0.8504, LPIPS 0.1910. The rankings highlighted the importance of combined subjective‑objective assessment.
Top Method Analyses
1. RedMediaTech (Xiaohongshu)
Core: Wan 2.1 diffusion transformer (DiT) in a single‑step diffusion framework.
Two‑stage training: Stage 1 used Wan 2.1 VAE + DiT with MSE+LPIPS loss; Stage 2 replaced the VAE with Qwen‑Image VAE to boost PSNR and SSIM.
Key features: shortcut connection between VAEs, 3D RoPE for temporal encoding, extensive data augmentation, and efficient single‑step inference.
Training: 8 × H20 GPUs for ~5 days at a learning rate of 5e‑5, followed by 1 day of fine‑tuning at 2e‑5; the training data was supplemented with 10k high‑resolution internal clips.
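To make the 3D RoPE feature concrete, here is a generic sketch of factorized rotary position embedding over the time, height, and width axes. The source does not state how Wan 2.1 splits channels across axes or pairs them, so the `dims` split, the consecutive‑pair convention, and the base of 10000 are assumptions for illustration only.

```python
import math

def axis_angles(pos, dim, base=10000.0):
    """Rotation angles for one axis: one angle per channel pair (dim must be even)."""
    return [pos / (base ** (2 * i / dim)) for i in range(dim // 2)]

def rope_3d(vec, t, h, w, dims):
    """Rotate a feature vector according to its (t, h, w) position.

    dims = (dt, dh, dw): even channel counts per axis that sum to len(vec).
    The first dt channels encode time, the next dh height, the last dw width.
    """
    assert sum(dims) == len(vec) and all(d % 2 == 0 for d in dims)
    angles = []
    for pos, d in zip((t, h, w), dims):
        angles.extend(axis_angles(pos, d))
    out = []
    for k, theta in enumerate(angles):
        x, y = vec[2 * k], vec[2 * k + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x * c - y * s, x * s + y * c])  # 2D rotation of each pair
    return out

def dot(a, b):
    """Plain dot product, used to compare rotated query/key vectors."""
    return sum(x * y for x, y in zip(a, b))
```

Because each channel pair is only rotated, the dot product between a rotated query and a rotated key depends on the offset between their positions rather than on the absolute positions. That property is what lets attention reason about relative temporal and spatial distance.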
2. TaoMC2 (Alibaba Taobao Group + Beihang University)
Core: Text‑to‑video diffusion model with a two‑stage generative repair pipeline.
Stage 1: Dual‑branch repair—general real‑world restoration and a pre‑cleaning branch (supplemented by open‑source DOVE for the objective track).
Stage 2: RRDB‑based fusion network merging degraded input with intermediate outputs, using an anchor‑fusion strategy to balance artifact removal and detail preservation.
Training: CogVideoX1.5 backbone, 200 official synthetic videos + 500k YouTube/Pexels video‑text pairs, Qwen2.5‑VL generated captions, 64 × NVIDIA H20 GPUs, 49‑frame clips at 1024×1024 resolution.
3. STCVSR (Nanjing University of Science & Technology + Hunan University + OPPO Research)
Core: Pre‑trained STCDiT and ODTSR models forming a full‑video restoration pipeline.
ODTSR: Sparse anchor frames (1 per 25) provide structural guidance.
STCDiT: Single‑step diffusion in latent space captures cross‑segment motion cues.
Dynamic strategies: Skip anchor enhancement for dense‑texture videos, adjust segment boundaries for severely degraded frames.
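The sparse anchor schedule can be sketched as follows. The source only gives the rate of one anchor per 25 frames, so treating the first frame of each 25‑frame segment as the anchor, and the function names themselves, are assumptions rather than the team's actual implementation.

```python
def anchor_indices(num_frames, stride=25):
    """Indices of anchor frames, one per `stride` frames (first frame of each segment)."""
    return list(range(0, num_frames, stride))

def segments(num_frames, stride=25):
    """Half-open (start, end) frame ranges, each beginning at its anchor frame."""
    return [(s, min(s + stride, num_frames)) for s in range(0, num_frames, stride)]
```

For a 60‑frame clip this yields anchors at frames 0, 25, and 50, with a shorter final segment covering frames 50 to 59. The dynamic boundary adjustment described above would then move these fixed boundaries for severely degraded frames, which this sketch does not attempt.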
4. Lucky One (Beihang University + Tsinghua University)
Core: Fine‑tuned CogVideoX with a single‑step diffusion video‑restoration approach.
Technique: Pixel‑level supervision in latent space (pixel‑wise loss) and latent‑pixel training, achieving up to a 28× speed‑up over multi‑step diffusion.
Temporal consistency maintained via frame‑wise latent training.
Conclusions
The competition demonstrates rapid progress in generative‑model‑based video restoration, with single‑step diffusion pipelines offering a strong trade‑off between quality and efficiency. Combining subjective and objective metrics proved essential for comprehensive evaluation.
Acknowledgments
The challenge was co‑organized by the University of Science and Technology of China and Kuaishou, with support from national research funds and the Humboldt Foundation.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
