
Xiaohongshu Audio-Video Architecture Team Wins Top Awards in CVPR NTIRE 2024 Challenges

Xiaohongshu's audio-video architecture team placed second in the RAIM challenge and first in the S-UGC VQA challenge at CVPR NTIRE 2024. For generative image restoration, the team combined SUPIR, DeSRA and a Fusion model. For video quality assessment, it enhanced its model with LIQE, Q-Align and FAST-VQA features and achieved high PLCC/SROCC scores. The methods have since been deployed for live-stream denoising, intelligent transcoding and cloud-based super-resolution, with bandwidth savings of up to 33%.

Xiaohongshu Tech REDtech

Xiaohongshu's audio-video architecture image algorithm team achieved remarkable results in the CVPR NTIRE 2024 challenges, securing second place in the Restore Any Image Model in the Wild (RAIM) challenge and first place in the Short-form UGC Video Quality Assessment (S-UGC VQA) challenge.

For RAIM, the team used the state-of-the-art generative restoration model SUPIR as its base, located the regions where SUPIR's output is distorted using the DeSRA method, and refined those regions with a Fusion model. The Fusion model takes the original image, the SUPIR output, and a binary mask as inputs, improving fidelity while preserving the generative prior.
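To make the three inputs concrete, the sketch below blends an original image and a generative output under a binary mask. It is not the team's Fusion model, whose design is not detailed here; the function name `naive_mask_fusion`, the grayscale list-of-rows image format, and the convention that a mask value of 1 marks a distorted pixel are all assumptions made for illustration.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of intensities in [0, 1].
Image = List[List[float]]
Mask = List[List[int]]  # assumption: 1 = distorted region, 0 = trusted region


def naive_mask_fusion(original: Image, generated: Image, mask: Mask) -> Image:
    """Keep the generated pixel where mask == 0; fall back to the original where mask == 1.

    A hand-written stand-in for illustration only, not a learned fusion network.
    """
    fused: Image = []
    for orig_row, gen_row, mask_row in zip(original, generated, mask):
        fused.append([o if m == 1 else g
                      for o, g, m in zip(orig_row, gen_row, mask_row)])
    return fused


if __name__ == "__main__":
    original = [[0.1, 0.2], [0.3, 0.4]]
    generated = [[0.9, 0.8], [0.7, 0.6]]
    mask = [[0, 1], [1, 0]]
    print(naive_mask_fusion(original, generated, mask))
```

A hard per-pixel switch like this would leave visible seams at mask boundaries, which is one plausible reason to use a model rather than a fixed rule for the fusion step.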

In the S-UGC VQA challenge, they built upon the SimpleVQA baseline, integrating LIQE, Q-Align, and FAST-VQA features to enhance spatial and temporal quality-aware representations, resulting in PLCC and SROCC scores above 0.9 and a clear margin over baseline methods.
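For readers unfamiliar with the two metrics, the following is a minimal pure-Python sketch of how PLCC (Pearson linear correlation) and SROCC (Spearman rank-order correlation) can be computed between predicted scores and subjective scores. The function names and sample values are illustrative assumptions, and the nonlinear score mapping that some benchmarks apply before PLCC is omitted.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def plcc(pred: Sequence[float], mos: Sequence[float]) -> float:
    return pearson(pred, mos)


def srocc(pred: Sequence[float], mos: Sequence[float]) -> float:
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(pred), average_ranks(mos))


if __name__ == "__main__":
    predicted = [1.0, 2.0, 3.0, 4.0]    # hypothetical model outputs
    subjective = [1.0, 4.0, 9.0, 16.0]  # hypothetical opinion scores
    print(plcc(predicted, subjective), srocc(predicted, subjective))
```

In this toy data the ordering is identical, so SROCC is 1, while PLCC is slightly lower because the relationship is monotonic but not linear.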

These algorithms have been deployed in Xiaohongshu's video pipeline in three places. Live-stream denoising runs right after camera capture to improve upstream quality and encoding efficiency. Cloud-side intelligent transcoding uses quality-aware bitrate prediction and adaptive sharpening/denoising to balance quality against bandwidth. A device-cloud super-resolution scheme leverages cloud-enhanced inputs and bitrate-quality prediction to deliver higher perceived resolution while saving up to 33% of downstream bandwidth.
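One way to picture quality-aware bitrate selection is as a search over a bitrate ladder for the cheapest rung whose predicted quality meets a target. The sketch below is a hypothetical illustration, not the production logic: `pick_bitrate`, `bandwidth_saving`, the toy predictor, and every number in it are assumptions.

```python
from typing import Callable, Sequence


def pick_bitrate(ladder_kbps: Sequence[int],
                 predict_quality: Callable[[int], float],
                 target_quality: float) -> int:
    """Return the lowest bitrate whose predicted quality reaches the target.

    Falls back to the highest rung if no bitrate meets the target.
    """
    for bitrate in sorted(ladder_kbps):
        if predict_quality(bitrate) >= target_quality:
            return bitrate
    return max(ladder_kbps)


def bandwidth_saving(baseline_kbps: int, chosen_kbps: int) -> float:
    """Fraction of bandwidth saved relative to a fixed baseline bitrate."""
    return 1.0 - chosen_kbps / baseline_kbps


def toy_predictor(bitrate_kbps: int) -> float:
    # Toy stand-in for a learned quality model: linear in bitrate, capped at 100.
    return min(100.0, bitrate_kbps / 25.0)


if __name__ == "__main__":
    ladder = [1000, 2000, 4000]  # made-up ladder, in kbps
    chosen = pick_bitrate(ladder, toy_predictor, target_quality=75.0)
    print(chosen, bandwidth_saving(4000, chosen))
```

In practice the predictor would depend on the content of each video, which is what lets an easy-to-encode clip be served at a lower bitrate than a complex one at the same perceived quality.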

Overall, the work demonstrates how cutting‑edge image and video processing research can be engineered for large‑scale social‑media platforms, delivering measurable QoE/QoS improvements and cost savings.

Tags: AI, Deep Learning, video quality assessment, Super-Resolution, Xiaohongshu, CVPR NTIRE 2024, denoising, image restoration