CVPR NTIRE 2026 UGC Short Video Restoration Challenge: Winning Solutions Revealed

The CVPR NTIRE 2026 challenge introduced the KwaiVIR benchmark for real‑world UGC short‑video degradation and ran two tracks, one subjective and one objective. Twelve teams submitted valid results, and RedMediaTech ranked first in both tracks. This article summarizes the benchmark, the evaluation protocol, the results, and the leading methods.

Kuaishou Tech

Challenge Overview

The NTIRE 2026 challenge focused on restoring user‑generated content (UGC) short videos under realistic degradation conditions. It introduced the KwaiVIR benchmark, comprising 200 synthetic and 48 real training videos, 11 validation videos, and 20 test videos.

Evaluation Protocol

Two tracks were defined. The subjective track was scored by human raters on fidelity, perceptual quality, and temporal consistency. The objective track used PSNR, SSIM, LPIPS, MUSIQ, and WarpError, computed on synthetic and real videos.
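As a concrete reference for one of the objective metrics, here is a minimal sketch of PSNR. The function names and the per-frame averaging convention are assumptions made for illustration; the article does not state how the challenge aggregated PSNR over frames or which color space it used.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized frames.

    Frames are passed as flat sequences of pixel intensities.
    """
    if len(reference) != len(restored):
        raise ValueError("frames must have the same number of pixels")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - x) ** 2 for r, x in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


def video_psnr(ref_frames, out_frames, max_value=255.0):
    """Average of per-frame PSNR values (an assumed aggregation convention)."""
    scores = [psnr(r, o, max_value) for r, o in zip(ref_frames, out_frames)]
    return sum(scores) / len(scores)
```

Higher PSNR means the restored frame is closer to the reference, so the metric only applies where a clean reference exists, such as the synthetic videos.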

Results

Out of 95 registered teams, 12 submitted valid results. RedMediaTech ranked first in both subjective and objective evaluations, achieving a subjective score of 3.8525 and objective scores of PSNR 30.7610, SSIM 0.8504, LPIPS 0.1910. The rankings highlighted the importance of combined subjective‑objective assessment.

Leading Method Analyses

1. RedMediaTech (Xiaohongshu)

Core: Wan 2.1 diffusion transformer (DiT) in a single‑step diffusion framework.

Two‑stage training: Stage 1 used Wan 2.1 VAE + DiT with MSE+LPIPS loss; Stage 2 replaced the VAE with Qwen‑Image VAE to boost PSNR and SSIM.

Key features: shortcut connection between VAEs, 3D RoPE for temporal encoding, extensive data augmentation, and efficient single‑step inference.

Training: 8 × H20 GPUs for ~5 days at a learning rate of 5e‑5, followed by 1 day of fine‑tuning at 2e‑5; the training data was supplemented with 10k high‑resolution internal clips.
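The Stage 1 objective described above combines a pixel term (MSE) with a perceptual term (LPIPS). The sketch below shows only the shape of such a weighted sum. The names `stage1_loss`, `perceptual_fn`, and `perceptual_weight` are hypothetical, the weight value is not given in the article, and `gradient_l1` is a toy stand-in for a perceptual distance, not LPIPS.

```python
def mse(pred, target):
    """Mean squared error between two equally sized 1-D signals."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def gradient_l1(pred, target):
    """Toy stand-in for a perceptual term (NOT LPIPS).

    Mean absolute difference between the first differences of the signals.
    """
    dp = [b - a for a, b in zip(pred, pred[1:])]
    dt = [b - a for a, b in zip(target, target[1:])]
    return sum(abs(x - y) for x, y in zip(dp, dt)) / len(dp)


def stage1_loss(pred, target, perceptual_fn, perceptual_weight=1.0):
    """Pixel MSE plus a weighted perceptual distance (weight is an assumption)."""
    return mse(pred, target) + perceptual_weight * perceptual_fn(pred, target)
```

In a real training setup the perceptual term would come from a learned feature network, and the balance between the two terms decides how strongly fidelity is traded against perceptual sharpness.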

2. TaoMC2 (Alibaba Taobao Group + Beihang University)

Core: Text‑to‑video diffusion model with a two‑stage generative repair pipeline.

Stage 1: Dual‑branch repair—general real‑world restoration and a pre‑cleaning branch (supplemented by open‑source DOVE for the objective track).

Stage 2: RRDB‑based fusion network merging degraded input with intermediate outputs, using an anchor‑fusion strategy to balance artifact removal and detail preservation.

Training: CogVideoX1.5 backbone, 200 official synthetic videos + 500k YouTube/Pexels video‑text pairs, Qwen2.5‑VL generated captions, 64 × NVIDIA H20 GPUs, 49‑frame clips at 1024×1024 resolution.

3. STCVSR (Nanjing University of Science & Technology + Hunan University + OPPO Research)

Core: Pre‑trained STCDiT and ODTSR models forming a full‑video restoration pipeline.

ODTSR: Sparse anchor frames (1 per 25) provide structural guidance.

STCDiT: Single‑step diffusion in latent space captures cross‑segment motion cues.

Dynamic strategies: Skip anchor enhancement for dense‑texture videos, adjust segment boundaries for severely degraded frames.
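The sparse anchor scheme (1 anchor per 25 frames) can be sketched as a simple index computation. The helper names are hypothetical, and placing the anchor at the first frame of each segment is an assumption; the article does not say where in each 25-frame window the anchor sits or how the dynamic boundary adjustment is implemented.

```python
def anchor_indices(num_frames, stride=25):
    """Indices of sparse anchor frames, one per `stride` frames.

    Assumption: the anchor is the first frame of each segment.
    """
    if num_frames < 0 or stride <= 0:
        raise ValueError("num_frames must be >= 0 and stride must be > 0")
    return list(range(0, num_frames, stride))


def segments(num_frames, stride=25):
    """(start, end) pairs, end exclusive, for the segment each anchor guides."""
    return [(start, min(start + stride, num_frames))
            for start in anchor_indices(num_frames, stride)]
```

For a 60-frame clip, `anchor_indices(60)` gives frames 0, 25, and 50, with the last segment covering only the remaining 10 frames.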

4. Lucky One (Beihang University + Tsinghua University)

Core: Fine‑tuned CogVideoX with a single‑step diffusion video‑restoration approach.

Technique: Pixel‑level supervision in latent space (pixel‑wise loss) and latent‑pixel training, achieving up to 28× speed‑up over multi‑step diffusion.

Temporal consistency is maintained via frame‑wise latent training.

Conclusions

The competition demonstrates rapid progress in generative‑model‑based video restoration, with single‑step diffusion pipelines offering a strong trade‑off between quality and efficiency. Combining subjective and objective metrics proved essential for comprehensive evaluation.

Acknowledgments

The challenge was co‑organized by the University of Science and Technology of China and Kuaishou, with support from national research funds and the Humboldt Foundation.
