Douyin’s BDVQAGroup Secures Global Runner‑Up in DXOMARK Image Quality Challenge at CVPR 2024
In the NTIRE challenge at CVPR 2024, Douyin's BDVQAGroup placed second worldwide in the DXOMARK portrait quality track with its SampleIQA model. The model combines data re-sampling, a Swin-Transformer backbone, a twin-network ranking loss, and content-aware cropping, and it outperformed existing state-of-the-art IQA methods.
Competition Overview
The CVPR 2024 NTIRE competition, organized by ETH Zurich's Computer Vision Laboratory, featured several low-level vision tracks, including Portrait Quality Assessment (DXOMARK) and Short-form UGC Video Quality Assessment (Kuaishou). Hundreds of teams from industry and academia participated, among them Huawei, Meituan, Zhejiang University, Tsinghua University, and Peking University.
Dataset Details
The portrait track used the PIQ23 dataset: 5,116 portrait images captured by over 100 smartphone models for training, with a test set drawn from the same distribution. The video track used the KVQ dataset of 4,200 UGC videos: 600 raw uploads and 3,600 degraded clips derived from them, spanning seven quality levels. Fifteen professional annotators provided MOS scores on a 1-5 scale in steps of 0.5, and both same-source and cross-source video pairs were annotated for ranking.
Research Background
UGC images and videos often suffer from inconsistent subjective quality due to non‑professional capture conditions and limited processing pipelines. Traditional IQA methods focus on overall fidelity (sharpness, color, noise) while portrait and video quality assessment must also consider facial detail, skin tone, expression naturalness, background contrast, and aesthetic factors. These tasks remain challenging in terms of model size, generalization, and accuracy.
Solution Overview
The BDVQAGroup designed SampleIQA, an IQA model built on data re-sampling and a Vision Transformer. To address the imbalanced score distribution in PIQ23 (many medium scores, few extremes), the team formulated a mixed-integer linear programming (MILP) scheme that reshapes the training data to follow a target distribution (e.g., Gaussian), so that every score range is sampled in a controlled, balanced way.
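The article does not publish the MILP formulation, so the sketch below swaps in a simpler histogram-based scheme to illustrate the goal of re-sampling toward a target distribution. It is not the team's method. The function names, the bin layout, and the Gaussian parameters are all hypothetical.

```python
import math
import random
from collections import defaultdict


def gaussian_bin_targets(bin_centers, mean, std, total):
    """Integer per-bin sample counts proportional to a Gaussian density."""
    weights = [math.exp(-0.5 * ((c - mean) / std) ** 2) for c in bin_centers]
    scale = total / sum(weights)
    raw = [w * scale for w in weights]
    counts = [int(r) for r in raw]  # floor each share
    # Give the leftover samples to the bins with the largest fractional parts.
    leftover = total - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def resample_to_target(scores, num_bins, lo, hi, mean, std, total, seed=0):
    """Return indices into `scores` whose histogram approximates the target."""
    rng = random.Random(seed)
    width = (hi - lo) / num_bins
    bins = defaultdict(list)
    for idx, s in enumerate(scores):
        b = min(int((s - lo) / width), num_bins - 1)
        bins[b].append(idx)
    centers = [lo + (b + 0.5) * width for b in range(num_bins)]
    targets = gaussian_bin_targets(centers, mean, std, total)
    chosen = []
    for b, want in enumerate(targets):
        pool = bins.get(b, [])
        if not pool:
            continue  # an empty bin cannot be filled
        if want <= len(pool):
            chosen.extend(rng.sample(pool, want))      # undersample
        else:
            chosen.extend(rng.choices(pool, k=want))   # oversample with replacement
    return chosen
```

A MILP formulation would instead choose integer repetition counts per image under explicit constraints, which gives finer control than this per-bin draw; a solver such as `scipy.optimize.milp` could be used for that, though the source does not say which solver the team used.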
SampleIQA employs a Swin‑Transformer backbone and a twin‑network architecture: two images are fed simultaneously, enabling pairwise ranking supervision. The loss combines mean‑squared error (MSE) for MOS regression and a rank loss to enforce monotonicity. During training, a content‑aware random cropping strategy extracts 448×448 patches while guaranteeing full facial coverage; if a crop misses the face, the process retries.
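A minimal sketch of the two training ingredients described above, under stated assumptions. The article gives neither the exact rank-loss form nor the loss weighting, so the hinge-style rank term, `rank_weight`, and `margin` are assumptions. The face box is assumed to come from some external detector, and all function names are hypothetical.

```python
import random


def mse(pred, target):
    """Squared error for a single MOS prediction."""
    return (pred - target) ** 2


def pairwise_rank_loss(pred_a, pred_b, mos_a, mos_b, margin=0.0):
    """Hinge penalty when the predicted order disagrees with the MOS order."""
    if mos_a == mos_b:
        return 0.0  # no ordering to enforce
    sign = 1.0 if mos_a > mos_b else -1.0
    return max(0.0, margin - sign * (pred_a - pred_b))


def twin_loss(pred_a, pred_b, mos_a, mos_b, rank_weight=1.0, margin=0.0):
    """MSE regression on both branches plus a weighted pairwise rank term."""
    regression = 0.5 * (mse(pred_a, mos_a) + mse(pred_b, mos_b))
    rank = pairwise_rank_loss(pred_a, pred_b, mos_a, mos_b, margin)
    return regression + rank_weight * rank


def face_aware_crop(img_w, img_h, face_box, size=448, max_tries=100, rng=random):
    """Sample a size x size crop origin, retrying until the face box fits inside.

    face_box is (x0, y0, x1, y1) in pixels. Returns (left, top), or None if
    no valid crop is found within max_tries (e.g. the face exceeds the crop).
    """
    if img_w < size or img_h < size:
        raise ValueError("image is smaller than the crop size")
    x0, y0, x1, y1 = face_box
    for _ in range(max_tries):
        left = rng.randint(0, img_w - size)
        top = rng.randint(0, img_h - size)
        if left <= x0 and top <= y0 and x1 <= left + size and y1 <= top + size:
            return left, top
    return None
```

In a real training loop the two predictions would come from the shared-weight Swin-Transformer branches, and the loss would be computed on tensors rather than floats. How the team handles faces larger than the crop is not stated in the source; returning `None` here is only a placeholder.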
Experimental Results
On the PIQ23 test set, SampleIQA surpassed existing IQA state-of-the-art methods. It achieved the highest PLCC (Pearson linear correlation coefficient) and the second-best SROCC (Spearman rank-order correlation coefficient) and KRCC (Kendall rank correlation coefficient), securing the overall runner-up position in the DXOMARK portrait track. In the KVQ video track, the team placed second in several sub-metrics and ranked within the top five overall.
Key metric table (excerpt):
PLCC: 1st place
SROCC: 2nd place
KRCC: 2nd place
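For readers unfamiliar with the three metrics, the sketch below computes them on predicted versus ground-truth scores in plain Python. It uses average ranks for ties and the tau-b variant of Kendall's coefficient; whether the challenge used exactly these variants is an assumption, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """PLCC: linear correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """SROCC: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """KRCC (tau-b): concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both lists: excluded from both factors
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt((untied + ties_x) * (untied + ties_y))
```

PLCC rewards predictions that track MOS linearly, while SROCC and KRCC depend only on ordering, which is why a model can lead on one metric and trail slightly on the others.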
Conclusion
The results demonstrate that a combination of principled data re‑sampling, Swin‑Transformer feature extraction, and pairwise ranking loss can significantly improve portrait and short‑form video quality assessment. The BDVQAGroup’s success against strong competitors such as Huawei and leading universities validates the effectiveness of SampleIQA and highlights Douyin’s commitment to advancing user‑experience quality.
AIWalker
Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.
