BILIVQA: Bilibili's No-Reference Video Quality Assessment System
BILIVQA is Bilibili’s deep‑learning, no‑reference video quality assessment system that trains on a proprietary 5,000‑video UGC dataset, extracts spatial and temporal features via MobileNet‑V2 and X3D, uses mixed‑dataset regression for strong generalization, and deploys a GPU‑optimized TensorRT pipeline with percentile‑based scoring for reliable quality monitoring and downstream applications.
Video quality assessment (VQA) is crucial for ensuring good user experience on video platforms, as it evaluates perceptual quality of videos across production, transcoding, and consumption stages.
The article distinguishes full‑reference VQA metrics (e.g., PSNR, SSIM, VMAF), which require a pristine reference video, from no‑reference VQA, which is essential when the original is unavailable. Traditional no‑reference methods rely on handcrafted features and SVM regressors, while deep‑learning approaches learn features directly from data.
Bilibili developed BILIVQA, a deep‑learning no‑reference VQA model tailored to its diverse user‑generated content (UGC), which spans many genres, distortion types, and formats. Public datasets such as LIVE‑VQC, KoNViD‑1k, and LSVQ suffer from distribution mismatch with Bilibili's content, prompting Bilibili to build its own UGC dataset of about 5,000 videos annotated with mean opinion scores (MOS).
The model samples each video into clips: one spatial key frame per clip and 32 consecutive frames for temporal information. Spatial features are extracted by a MobileNet‑V2 pretrained on ImageNet, temporal features by an X3D network pretrained on Kinetics‑400. The two feature vectors are concatenated and fed into a prediction network with pooling and regression layers to output a clip score; the final video score is the average of the clip scores.
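The per-clip scoring and averaging described above can be sketched as follows. This is a minimal illustration with NumPy stand-ins, not Bilibili's implementation: the extractor and regressor callables, feature sizes, and the `score_video` helper are all hypothetical.

```python
import numpy as np

def score_video(clips, spatial_extractor, temporal_extractor, regressor):
    """Score a video as the mean of per-clip scores (hypothetical API).

    Each clip contributes one key frame (spatial branch) and a stack of
    32 consecutive frames (temporal branch); the two feature vectors are
    concatenated before regression, as described in the article.
    """
    clip_scores = []
    for key_frame, frame_stack in clips:
        spatial = spatial_extractor(key_frame)      # stand-in for MobileNet-V2
        temporal = temporal_extractor(frame_stack)  # stand-in for X3D
        features = np.concatenate([spatial, temporal])
        clip_scores.append(regressor(features))
    return float(np.mean(clip_scores))

# Toy stand-ins: random "frames", random "features", a linear regressor.
rng = np.random.default_rng(0)
clips = [(rng.random((224, 224, 3)), rng.random((32, 224, 224, 3)))
         for _ in range(3)]
w = rng.random(1280 + 2048)  # assumed feature widths, illustrative only
score = score_video(
    clips,
    spatial_extractor=lambda f: rng.random(1280),
    temporal_extractor=lambda f: rng.random(2048),
    regressor=lambda x: float(x @ w / x.size),
)
```

The averaging step is what makes the model length-agnostic: any number of clips reduces to a single scalar score.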
Training uses a mixed‑dataset strategy: batches contain equal numbers of LSVQ and Bilibili videos, sharing feature extractors but employing separate regression heads. This improves generalization on Bilibili’s test set.
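The mixed-dataset batching idea can be sketched as below. The article only states that batches mix LSVQ and Bilibili samples in equal proportion and that each dataset gets its own regression head; the `mixed_batch` helper and head names are assumptions for illustration.

```python
import random

def mixed_batch(lsvq_items, bili_items, batch_size=8, seed=0):
    """Build a batch with equal numbers of LSVQ and Bilibili samples
    (hypothetical helper sketching the mixed-dataset strategy)."""
    rng = random.Random(seed)
    half = batch_size // 2
    lsvq = rng.sample(lsvq_items, half)
    bili = rng.sample(bili_items, half)
    # Tag each sample so the trainer can route it to the matching
    # dataset-specific regression head while sharing the feature
    # extractors across both datasets.
    return ([(x, "lsvq_head") for x in lsvq] +
            [(x, "bili_head") for x in bili])

batch = mixed_batch(list(range(100)), list(range(100, 200)))
```

Separate heads let each dataset keep its own score scale while the shared backbone learns distortion features common to both, which is where the generalization gain comes from.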
Performance is measured with PLCC (linear correlation) and SROCC (rank correlation). BILIVQA shows high accuracy on Bilibili’s own dataset and strong generalization on public benchmarks.
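Both metrics are standard and easy to compute. A self-contained sketch (no tie handling in the rank step, which standard Spearman implementations do account for):

```python
import numpy as np

def plcc(pred, mos):
    """Pearson linear correlation between predictions and MOS."""
    p, m = np.asarray(pred, float), np.asarray(mos, float)
    return float(np.corrcoef(p, m)[0, 1])

def srocc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the
    ranks. (Simple argsort ranking; ties are not averaged here.)"""
    rank = lambda a: np.argsort(np.argsort(a)).astype(float)
    return plcc(rank(np.asarray(pred)), rank(np.asarray(mos)))

pred = [2.1, 3.4, 1.2, 4.8, 3.9]
mos  = [2.0, 3.5, 1.0, 5.0, 4.0]
print(round(plcc(pred, mos), 3), round(srocc(pred, mos), 3))
```

PLCC rewards predictions that track MOS linearly; SROCC only cares about ordering, so a model can score perfectly on SROCC while being miscalibrated in absolute terms (as in the toy data above, where the ranking is exact).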
For efficient deployment, Bilibili implemented a fully GPU‑resident pipeline: hardware video decoding, frame extraction, CUDA‑based resizing, and inference with a TensorRT‑optimized model, substantially raising GPU utilization by avoiding CPU round trips.
To enable stable long‑term monitoring, Bilibili introduced the "BILIVQA quality scale" (质量量纲) mapping mechanism. A large unbiased benchmark set of 150,000 popular Bilibili videos is used to convert raw model scores into percentile‑based stable scores, ensuring that monitoring panels reflect true quality trends even across model updates.
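The percentile mapping can be illustrated with a few lines of NumPy. This is a hypothetical implementation of the idea: a raw score becomes the percentage of benchmark videos it outranks, so the stable score's meaning survives model retraining as long as the benchmark set is re-scored with each new model.

```python
import numpy as np

def stable_score(raw, benchmark_scores):
    """Map a raw model score to its percentile (0-100) within a fixed
    benchmark distribution (hypothetical sketch of the mapping)."""
    bench = np.sort(np.asarray(benchmark_scores, float))
    # Fraction of benchmark videos scoring at or below `raw`.
    return float(np.searchsorted(bench, raw, side="right")) / len(bench) * 100

# Stand-in for the 150,000-video benchmark distribution.
bench = np.linspace(0, 5, 1000)
print(stable_score(2.5, bench))
```

Because the output depends only on rank within the benchmark, a model update that shifts every raw score by a constant leaves the stable scores, and hence the monitoring panels, unchanged.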
Subjective experiments linked score intervals to perceived quality (poor, fair, excellent), providing actionable thresholds for low‑quality alerts, recommendation weighting, and quality‑guided processing.
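Downstream consumers then only need a thresholding step. The cutoffs below are illustrative placeholders: the article reports calibrated intervals from subjective experiments but does not publish the exact values.

```python
def quality_label(stable_score, thresholds=(40.0, 75.0)):
    """Map a percentile-based stable score to a coarse quality label.
    The thresholds are assumed for illustration, not Bilibili's actual
    calibrated cutoffs."""
    low, high = thresholds
    if stable_score < low:
        return "poor"       # e.g. candidate for a low-quality alert
    if stable_score < high:
        return "fair"
    return "excellent"      # e.g. eligible for recommendation boosting

labels = [quality_label(s) for s in (10.0, 50.0, 90.0)]
```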
The work concludes with plans to enlarge the UGC dataset, refine sampling strategies, and explore VQA applications in recommendation, encoding control, and video processing pipelines.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.