Winning the NTIRE 2025 UGC Video Enhancement Challenge: A Progressive AI Framework

Tencent’s TEG team secured first place in the NTIRE 2025 UGC Video Enhancement competition with a progressive, three‑stage AI framework that decomposes enhancement into expert models for color correction, denoising, and temporal stability. The solution combines tailored loss functions, extensive hardware‑level optimizations, and INT8 quantization, and outlines future diffusion‑based generative enhancement.

Tencent Technical Engineering

Competition Overview

The CVPR NTIRE 2025 UGC Video Enhancement competition attracted top teams from Tencent, ByteDance, Alibaba, and others. Tencent’s TEG (Technology Engineering Group) team won the championship with a solution that significantly improves video quality for user‑generated content.

NTIRE 2025 competition logo

Algorithm Framework

The proposed method follows a progressive training strategy divided into three stages:

Stage 1 – Color Enhancement: A CLUT‑based network with a MobileNetV3 backbone predicts a 64×64×64 3D lookup table, providing adaptive color correction and exposure/white‑balance adjustment.
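To make the LUT idea concrete, here is a minimal numpy sketch of applying a predicted 64×64×64 3D LUT to an RGB image with trilinear interpolation. The function name and the identity LUT are illustrative; the actual network-predicted table and CUDA implementation are not shown in the article.

```python
import numpy as np

def apply_3d_lut(image, lut):
    """Map RGB pixels in [0, 1] through a LUT of shape (N, N, N, 3)
    using trilinear interpolation (a sketch of CLUT-style color correction)."""
    n = lut.shape[0]
    coords = image * (n - 1)              # scale pixels to LUT grid coordinates
    lo = np.floor(coords).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = coords - lo
    out = np.zeros_like(image)
    # Accumulate the 8 corner contributions of the surrounding LUT cell.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                r = np.where(dr, hi[..., 0], lo[..., 0])
                g = np.where(dg, hi[..., 1], lo[..., 1])
                b = np.where(db, hi[..., 2], lo[..., 2])
                w = (np.where(dr, frac[..., 0], 1 - frac[..., 0])
                     * np.where(dg, frac[..., 1], 1 - frac[..., 1])
                     * np.where(db, frac[..., 2], 1 - frac[..., 2]))
                out += w[..., None] * lut[r, g, b]
    return out

# Sanity check with an identity LUT: output should equal input.
n = 64
grid = np.linspace(0, 1, n)
lut = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)
img = np.random.rand(4, 4, 3)
restored = apply_3d_lut(img, lut)
```

In the real pipeline the LUT entries are the learned output of the MobileNetV3 predictor rather than the identity grid used here for verification.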

Stage 2 – Denoising: A lightweight U‑Net built from RepVGG blocks removes sensor noise and compression artifacts while sustaining inference speeds above 300 FPS on an NVIDIA Titan RTX.
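The speed of RepVGG blocks comes from structural re‑parameterization: a multi‑branch block (3×3 conv + 1×1 conv + identity) used in training is folded into a single 3×3 convolution for inference. A simplified single‑channel sketch, omitting the batch‑norm folding that full RepVGG also performs:

```python
import numpy as np

def conv2d(x, k):
    """Minimal single-channel 'same' 2D cross-correlation."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def fuse_repvgg(k3, k1):
    """Fold the 1x1 branch and the identity branch into the 3x3 kernel."""
    fused = k3.copy()
    fused[1, 1] += k1[0, 0]   # a 1x1 kernel sits at the 3x3 center
    fused[1, 1] += 1.0        # the identity branch adds the input itself
    return fused

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal((1, 1))
train_out = conv2d(x, k3) + conv2d(x, k1) + x       # multi-branch (training)
deploy_out = conv2d(x, fuse_repvgg(k3, k1))         # single conv (inference)
assert np.allclose(train_out, deploy_out)
```

The fused network computes exactly the same function with one kernel launch per layer, which is what makes the 300+ FPS figure attainable.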

Stage 3 – Temporal Stability: A RepVGG‑based module and RAFT optical flow align consecutive frames, followed by a SwinIR‑style Residual Swin Transformer Block (RSTB) for temporal consistency and detail restoration.
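Flow‑based alignment means backward‑warping one frame onto another using a per‑pixel motion field. A toy nearest‑neighbor sketch (the real system samples RAFT flow bilinearly on the GPU; the constant flow here is a hypothetical stand‑in):

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp a frame with a per-pixel flow field (nearest-neighbor
    sampling; production pipelines use bilinear sampling of RAFT flow)."""
    h, w = frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sample the source frame at (x + u, y + v), clamped to the border.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

# A frame shifted 2 px right; a flow of +2 in x re-aligns it with the original.
prev = np.arange(36, dtype=float).reshape(6, 6)
curr = np.roll(prev, 2, axis=1)
flow = np.zeros((6, 6, 2))
flow[..., 0] = 2.0   # hypothetical constant flow field
aligned = warp(curr, flow)
```

Away from the clamped border, `aligned` matches `prev` exactly, which is the alignment property the RSTB stage relies on.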

A joint loss combines an AI‑encoder bitrate constraint (R) with a CoherenceLoss that enforces multi‑frame temporal consistency.
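The article does not give the exact loss formula, so the following is only a plausible sketch: a CoherenceLoss penalizing the difference between the enhanced frame and the flow‑aligned previous frame, plus a hinge‑style bitrate penalty R. The weight `lam` and target bitrate are illustrative assumptions.

```python
import numpy as np

def coherence_loss(enhanced, aligned_prev):
    """Sketch of a multi-frame CoherenceLoss: penalize per-pixel change
    between the enhanced frame and the flow-aligned previous frame."""
    return np.mean(np.abs(enhanced - aligned_prev))

def joint_loss(enhanced, aligned_prev, bits_per_pixel,
               lam=0.1, target_bpp=0.08):
    """Combined objective: temporal coherence plus a bitrate penalty R.
    All weights here are illustrative, not the paper's values."""
    rate_penalty = max(0.0, bits_per_pixel - target_bpp)  # hinge on the bitrate
    return coherence_loss(enhanced, aligned_prev) + lam * rate_penalty

frame = np.ones((4, 4))
loss_ok = joint_loss(frame, frame, bits_per_pixel=0.08)   # within budget
loss_over = joint_loss(frame, frame, bits_per_pixel=0.18) # over budget
```

The key design point survives the simplification: temporally stable output that also stays cheap for the downstream AI encoder.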

Results

The solution achieved the highest subjective scores on both public and private test sets, with an 81% win rate against the raw input and a 92% pairwise win rate against all other submissions. The final ranking table and pairwise win‑rate matrices show a clear margin over the competing approaches.

Final ranking table

Hardware Optimizations

To meet real‑time constraints, extensive low‑level optimizations were applied:

Assembly‑level kernel redesign reduced memory‑bandwidth usage by ~50 %.

Custom implicit GEMM kernels with REG BN Expand and Space Fusion increased compute efficiency.

Just‑In‑Time (JIT) compilation allowed per‑hardware tuning of block sizes, pipeline depth, and fusion strategies, yielding 2.5–3.3× speedup over CUTLASS and cuDNN.

Overall inference time dropped from 5.37 ms to 4.47 ms, a 19.8 % acceleration.
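The JIT idea above boils down to benchmarking candidate kernel configurations on the target device and keeping the fastest. The CUDA specifics are out of scope here, but the selection loop itself can be sketched in Python with two toy "kernels" (both names and strategies are illustrative):

```python
import time
import numpy as np

def benchmark(fn, *args, repeats=5):
    """Time a kernel candidate and return its best wall-clock run."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

def autotune(candidates, *args):
    """Pick the fastest configuration on this hardware, mimicking the
    per-device JIT tuning of block size and fusion strategy described above."""
    return min(candidates, key=lambda name: benchmark(candidates[name], *args))

# Two toy strategies that compute the same matmul result differently.
a = np.random.rand(256, 256)
b = np.random.rand(256, 256)
kernels = {
    "fused":  lambda a, b: a @ b,
    "looped": lambda a, b: np.stack([a @ b[:, j] for j in range(b.shape[1])], 1),
}
best = autotune(kernels, a, b)
```

In the real system the candidates are generated CUDA kernels varying block size, pipeline depth, and fusion strategy, and the measured winner is cached per GPU model.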

Operator performance comparison

Quantization and Future Directions

INT8 quantization was achieved with less than 1 % accuracy loss by combining full‑precision teacher distillation, Local Adaptive Distillation (LAD) that emphasizes high‑frequency regions, and Hierarchical Feature Distillation (HFD) across multiple layers. An INT8 data‑flow alignment strategy ensures full operator fusion.
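A minimal sketch of the two ingredients, under stated assumptions: symmetric per‑tensor INT8 quantization, and a distillation loss that up‑weights high‑frequency regions. The Laplacian‑based weight map is my guess at what "emphasizes high‑frequency regions" means for LAD; the article does not give the formula.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization (sketch)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.rint(x / scale), -128, 127).astype(np.int8)
    return q, scale

def lad_loss(student, teacher):
    """Hypothetical Local Adaptive Distillation sketch: weight the
    student-teacher error by a high-frequency map (Laplacian magnitude),
    so edges and textures dominate the distillation signal."""
    lap = np.abs(4 * teacher
                 - np.roll(teacher, 1, 0) - np.roll(teacher, -1, 0)
                 - np.roll(teacher, 1, 1) - np.roll(teacher, -1, 1))
    w = 1.0 + lap / (lap.max() + 1e-8)   # emphasize high-frequency regions
    return np.mean(w * np.abs(student - teacher))

x = np.random.randn(16, 16).astype(np.float32)
q, s = quantize_int8(x)
dequant = q.astype(np.float32) * s      # what the INT8 student "sees"
loss = lad_loss(dequant, x)             # teacher = full-precision activations
```

Hierarchical Feature Distillation would apply a loss of this shape at several intermediate layers rather than only at the output.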

Future work will explore diffusion‑model based generative enhancement, integrating ControlNet and LoRA for richer low‑resolution priors, and leveraging large‑scale data to improve stability and fidelity.

Diffusion model enhancement example
Tags: AI, Quantization, Diffusion Models, Video Enhancement, Hardware Optimization
Written by Tencent Technical Engineering

Official account of Tencent Technology. A platform for publishing and analyzing Tencent's technological innovations and cutting-edge developments.