
MD-VQA: Multi-Dimensional No-Reference Video Quality Assessment for CVPR NTIRE 2023

Alibaba’s Taobao audio‑video team (TB‑VQA) won the CVPR NTIRE 2023 Quality Assessment of Video Enhancement Challenge with MD‑VQA, a multi‑dimensional no‑reference video quality model. MD‑VQA combines a Swin‑Transformer‑V2 spatial backbone, a pre‑trained SlowFast motion encoder, and a convolutional fusion module. It is pre‑trained on LSVQ, fine‑tuned on the NTIRE data with spatio‑temporal augmentation, and reports higher SROCC and PLCC than prior state‑of‑the‑art methods on KoNViD‑1k and LIVE‑VQC. The model now powers quality monitoring on Alibaba’s live‑streaming and short‑video services.

DaTaobao Tech

Alibaba's Taobao audio‑video team (TB‑VQA) won the Quality Assessment of Video Enhancement Challenge at CVPR NTIRE 2023, taking first place in the challenge's only track.

The challenge focuses on no‑reference video quality assessment (VQA) for 1,211 real‑world videos that have undergone various enhancement operations.

To address the task, the team proposed MD‑VQA, a multi‑dimensional VQA model that extracts spatial semantics with a Swin‑Transformer‑V2 backbone, captures motion information with a pre‑trained SlowFast network, and fuses spatial and temporal features through a convolutional fusion module before regressing a quality score.
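The data flow described above (per-clip spatial features and motion features, fused by convolution, then regressed to one score) can be illustrated with a dependency-free toy. This is only a sketch under loud assumptions: the feature vectors are hand-written stand-ins for Swin‑Transformer‑V2 and SlowFast outputs, the fusion is a single temporal convolution whose taps are shared across channels, and the function names `concat_features`, `temporal_conv`, and `regress_quality` are hypothetical. The actual MD‑VQA fusion module and regression head are not specified in this article.

```python
def concat_features(spatial, motion):
    """Concatenate per-clip spatial and motion vectors along the channel axis."""
    if len(spatial) != len(motion):
        raise ValueError("need one spatial and one motion vector per clip")
    return [s + m for s, m in zip(spatial, motion)]


def temporal_conv(frames, taps):
    """Zero-padded 1D convolution over time (cross-correlation, as in DL
    frameworks). The same taps are applied to every channel, which is a
    simplification of a learned convolutional layer."""
    if len(taps) % 2 == 0:
        raise ValueError("use an odd number of taps so the output stays aligned")
    half = len(taps) // 2
    n, channels = len(frames), len(frames[0])
    out = []
    for t in range(n):
        row = [0.0] * channels
        for k, w in enumerate(taps):
            src = t + k - half
            if 0 <= src < n:  # positions outside the video count as zeros
                for c in range(channels):
                    row[c] += w * frames[src][c]
        out.append(row)
    return out


def regress_quality(fused, weights, bias=0.0):
    """Average-pool over time, then apply a linear head to get one score."""
    n = len(fused)
    pooled = [sum(row[c] for row in fused) / n for c in range(len(fused[0]))]
    return sum(w * x for w, x in zip(weights, pooled)) + bias


# Stand-in features for three clips: 2 spatial channels and 1 motion channel.
spatial = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
motion = [[0.5], [0.5], [0.5]]

fused = temporal_conv(concat_features(spatial, motion), taps=[0.25, 0.5, 0.25])
score = regress_quality(fused, weights=[0.1, 0.1, 0.1])
```

In a real model the taps and head weights would be learned, and the stand-in lists would be replaced by backbone outputs of much higher dimension.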

Data augmentation is performed in both spatial and temporal dimensions, and the model is first pre‑trained on the large LSVQ dataset (38,811 videos) and then fine‑tuned on the NTIRE training set.
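The article does not say which augmentations were used, so the following is a minimal sketch of one plausible scheme, assuming a random contiguous temporal crop followed by a random spatial crop that is shared by all frames. The helper names (`temporal_crop`, `spatial_crop`, `augment`) and the crop sizes are hypothetical. Cropping is chosen for the illustration because, unlike resizing, it leaves the pixels (and therefore the distortions being judged) unchanged.

```python
import random


def temporal_crop(frames, length, rng):
    """Pick a random contiguous run of `length` frames."""
    if length > len(frames):
        raise ValueError("clip is shorter than the requested length")
    start = rng.randint(0, len(frames) - length)
    return frames[start:start + length]


def spatial_crop(frames, height, width, rng):
    """Cut the same random window out of every frame so motion stays aligned."""
    full_h, full_w = len(frames[0]), len(frames[0][0])
    if height > full_h or width > full_w:
        raise ValueError("crop window is larger than the frame")
    top = rng.randint(0, full_h - height)
    left = rng.randint(0, full_w - width)
    return [
        [row[left:left + width] for row in frame[top:top + height]]
        for frame in frames
    ]


def augment(frames, length, height, width, seed=None):
    """Apply the temporal crop first, then the spatial crop."""
    rng = random.Random(seed)
    return spatial_crop(temporal_crop(frames, length, rng), height, width, rng)


# A synthetic 10-frame, 6x8 "video"; each pixel encodes (frame, row, column).
video = [
    [[t * 100 + r * 10 + c for c in range(8)] for r in range(6)]
    for t in range(10)
]
clip = augment(video, length=4, height=3, width=5, seed=0)
```

Each call with a different seed yields a different sub-clip of the same video, which is how a small fine-tuning set can be stretched across more training samples.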

Experimental results on KoNViD‑1k and LIVE‑VQC show that MD‑VQA achieves higher SROCC and PLCC than existing state‑of‑the‑art methods. Ablation studies confirm the contributions of the Swin backbone, feature‑fusion design, spatio‑temporal augmentation, and large‑scale pre‑training.
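For readers unfamiliar with the two metrics: SROCC (Spearman rank-order correlation coefficient) measures how well predicted scores preserve the ranking of the subjective scores, while PLCC (Pearson linear correlation coefficient) measures how linearly the two agree. The sketch below computes both with the standard library only. The sample scores are made up for illustration and are not results from the article, and the nonlinear mapping that many VQA evaluations fit before computing PLCC is omitted here.

```python
import math


def pearson(x, y):
    """PLCC: covariance divided by the product of the standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """SROCC: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: predictions rank the videos perfectly but not linearly.
predicted = [1.0, 2.0, 3.0, 4.0]
subjective = [1.0, 4.0, 9.0, 16.0]
srocc = spearman(predicted, subjective)
plcc = pearson(predicted, subjective)
```

In this example the ranking is identical, so SROCC is at its maximum, while PLCC falls slightly short because the relationship is quadratic rather than linear.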

The model is deployed on Alibaba’s live‑streaming and short‑video platforms, including Taobao Live and Douyin‑style short‑video feeds, to monitor and improve video quality in real time. It is also used for live streams in other Alibaba products such as DingTalk and Alipay.

computer vision, deep learning, multimedia, No-Reference, Swin Transformer, video quality assessment