Douyin’s BDVQAGroup Secures Global Runner‑Up in DXOMARK Image Quality Challenge at CVPR 2024

At CVPR 2024 NTIRE, Douyin’s BDVQAGroup achieved second place worldwide in the DXOMARK portrait quality track using their SampleIQA model, which combines data‑re‑sampling, a Swin‑Transformer backbone, twin‑network ranking loss and content‑aware cropping to outperform existing IQA state‑of‑the‑art methods.

AIWalker

Feb 9, 2025

Competition Overview

The NTIRE challenge at CVPR 2024, organized by ETH Zurich’s Computer Vision Laboratory, featured several low‑level vision tracks, including Portrait Quality Assessment (DXOMARK) and Short‑form UGC Video Quality Assessment (Kuaishou). Hundreds of teams from industry and academia participated, including Huawei, Meituan, Zhejiang University, Tsinghua University, and Peking University.

Dataset Details

The portrait track used the PIQ23 dataset: 5,116 portrait images captured by over 100 smartphone models for training, with a test set drawn from the same distribution. The video track employed the KVQ dataset: 600 raw UGC uploads, degraded into seven quality levels, yield 3,600 degraded clips, for 4,200 videos in total. MOS scores (1–5, step 0.5) were provided by 15 professional annotators, and both same‑source and cross‑source video pairs were annotated for ranking.

Research Background

UGC images and videos often suffer from inconsistent subjective quality due to non‑professional capture conditions and limited processing pipelines. Traditional IQA methods focus on overall fidelity (sharpness, color, noise) while portrait and video quality assessment must also consider facial detail, skin tone, expression naturalness, background contrast, and aesthetic factors. These tasks remain challenging in terms of model size, generalization, and accuracy.

Solution Overview

The BDVQAGroup designed SampleIQA, an IQA model built on data re‑sampling and a Vision Transformer. To address the imbalanced score distribution in PIQ23 (many medium scores, few extremes), they formulated a mixed‑integer linear programming (MILP) scheme that reshapes the training data to follow a target distribution (e.g., Gaussian), ensuring balanced sampling across score ranges.
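The article does not give the MILP formulation, so the sketch below swaps in a much simpler stand‑in: weighted sampling with replacement, so that score‑bin frequencies follow a chosen target distribution. It illustrates only the goal (reshaping the score histogram), not the team's actual method. The function names, the assumption that scores are normalized to [0, 1], the bin count, and the Gaussian parameters are all hypothetical.

```python
import math
import random
from collections import Counter


def gaussian_bin_weights(num_bins, mean, std):
    """Target mass per bin: a Gaussian evaluated at bin centres in [0, 1], normalized to sum to 1."""
    centres = [(i + 0.5) / num_bins for i in range(num_bins)]
    density = [math.exp(-0.5 * ((c - mean) / std) ** 2) for c in centres]
    total = sum(density)
    return [d / total for d in density]


def resample_to_target(scores, num_bins, target, size, seed=0):
    """Draw `size` sample indices (with replacement) whose bin frequencies follow `target`.

    Assumes scores lie in [0, 1]. This is NOT the MILP from the article; it is a
    simple weighted-sampling stand-in. Target mass assigned to empty bins is lost.
    """
    rng = random.Random(seed)
    bins = [min(int(s * num_bins), num_bins - 1) for s in scores]
    counts = Counter(bins)
    # Each sample shares its bin's target mass equally with the other samples in that bin.
    weights = [target[b] / counts[b] for b in bins]
    return rng.choices(range(len(scores)), weights=weights, k=size)
```

With a skewed set such as 90 mid‑range scores and 5 scores at each extreme, calling resample_to_target with a uniform target draws the rare extremes far more often than their raw frequency, which is the balancing effect the paragraph describes.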

SampleIQA employs a Swin‑Transformer backbone and a twin‑network architecture: two images are fed simultaneously, enabling pairwise ranking supervision. The loss combines mean‑squared error (MSE) for MOS regression and a rank loss to enforce monotonicity. During training, a content‑aware random cropping strategy extracts 448×448 patches while guaranteeing full facial coverage; if a crop misses the face, the process retries.
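The two training ingredients described above can be sketched in a few lines. The article does not specify the exact form of the rank loss, how the two terms are weighted, or how the face region is located, so everything here is an assumption: the hinge‑style rank term, the rank_weight and margin parameters, the face_box input (presumed to come from some face detector), and the function names are all hypothetical.

```python
import random


def pair_loss(pred_a, pred_b, mos_a, mos_b, rank_weight=1.0, margin=0.0):
    """MSE on both predictions plus a hinge-style rank term for one image pair.

    The hinge form and the weighting are assumptions, not the team's published loss.
    """
    mse = 0.5 * ((pred_a - mos_a) ** 2 + (pred_b - mos_b) ** 2)
    # sign is +1 if image A has the higher MOS, -1 if lower, 0 if tied.
    sign = (mos_a > mos_b) - (mos_a < mos_b)
    # Penalize the pair when the predicted ordering disagrees with the MOS ordering.
    rank = max(0.0, margin - sign * (pred_a - pred_b))
    return mse + rank_weight * rank


def face_aware_crop(img_w, img_h, face_box, crop=448, max_tries=100, rng=None):
    """Return (left, top) of a random crop x crop window that fully contains face_box.

    face_box is (x0, y0, x1, y1) in pixels. Retries random positions and returns
    None if the face cannot fit or no valid position is found within max_tries.
    """
    if rng is None:
        rng = random.Random()
    x0, y0, x1, y1 = face_box
    if img_w < crop or img_h < crop or x1 - x0 > crop or y1 - y0 > crop:
        return None
    for _ in range(max_tries):
        left = rng.randint(0, img_w - crop)
        top = rng.randint(0, img_h - crop)
        if left <= x0 and top <= y0 and x1 <= left + crop and y1 <= top + crop:
            return left, top
    return None
```

In this sketch, a pair predicted in the wrong order pays both the regression error and the rank penalty, while a correctly ordered, accurate pair costs nothing. The retry loop mirrors the description in the text; sampling the crop origin directly from the range of valid positions would avoid retries altogether.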

Experimental Results

On the PIQ23 test set, SampleIQA surpassed existing IQA state‑of‑the‑art methods. It achieved the highest PLCC score and second‑best SROCC and KRCC scores, securing the overall runner‑up position in the DXOMARK portrait track. In the KVQ video track, the team placed second in several sub‑metrics and ranked within the top five overall.

Key metric table (excerpt), DXOMARK portrait track:

PLCC: 1st place
SROCC: 2nd place
KRCC: 2nd place

Conclusion

The results suggest that combining principled data re‑sampling, Swin‑Transformer feature extraction, and a pairwise ranking loss improves portrait quality assessment, with competitive results on short‑form video as well. Placing second in a field that included Huawei and leading universities supports the effectiveness of SampleIQA and reflects Douyin’s focus on user‑experience quality.

References

[1] Competition website: https://codalab.lisn.upsaclay.fr/competitions/17311

[2] UGC challenge report (PDF)

[3] DXOMARK challenge report (PDF)
