How FineVQ Sets New Standards for Fine‑Grained UGC Video Quality Assessment

The article introduces FineVD, the first large‑scale multi‑dimensional UGC video quality dataset, and presents FineVQ, a unified model that predicts quality scores, attributes, and distortion types across six dimensions, achieving state‑of‑the‑art performance on multiple benchmarks and cross‑dataset evaluations.

Bilibili Tech

Background

With the explosion of user‑generated content (UGC) videos on social platforms, videos suffer from diverse degradations such as noise, blur, and jitter, making fine‑grained quality assessment a long‑standing challenge. Existing VQA models provide only overall scores, which are insufficient for detailed video processing and recommendation scenarios.

FineVD Dataset

In collaboration with Bilibili, researchers from Shanghai Jiao Tong University built FineVD, the world’s first large‑scale multi‑dimensional UGC video quality database. It contains 6,104 videos covering seven major categories (e.g., knowledge, music, daily life, animation, fashion, animals, sports) and six live‑streaming scenarios, with more than 800,000 human‑annotated quality scores and expert‑level distortion labels across twelve typical distortion types (compression artifacts, motion blur, focus issues, over‑sharpening, mosaic, stutter, etc.).

FineVQ Model

FineVQ is a unified framework that simultaneously outputs quality grades, numeric scores, and textual descriptions for six quality dimensions (color, noise, artifacts, blur, temporal, overall). The method consists of three steps:

Feature extraction: Extract spatial (image‑content) features and temporal motion features from the video, while the user’s textual prompt supplies the query context.

Feature alignment and fusion: Project visual features into the textual space and concatenate them as input to a large language model.

Instruction‑tuned LLM: Fine‑tune the language model with LoRA to generate multi‑dimensional quality outputs.

This single‑run pipeline can evaluate any number of dimensions without re‑training separate models.
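The alignment‑and‑fusion step above can be sketched in a few lines. This is a minimal illustration, not FineVQ's actual implementation: the dimensions, the random projector, and the variable names are all hypothetical stand‑ins, since the article does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the real FineVQ feature dimensions are not given in the article.
VISUAL_DIM, TEXT_DIM, N_FRAMES, N_PROMPT_TOKENS = 512, 768, 8, 16

# Step 1 stand-ins: per-frame visual features and prompt-token embeddings.
frame_feats = rng.standard_normal((N_FRAMES, VISUAL_DIM))
prompt_embeds = rng.standard_normal((N_PROMPT_TOKENS, TEXT_DIM))

# Step 2: a learned linear projector maps visual features into the text
# embedding space; a random matrix is used here purely for illustration.
W_proj = rng.standard_normal((VISUAL_DIM, TEXT_DIM)) / np.sqrt(VISUAL_DIM)
visual_tokens = frame_feats @ W_proj            # shape (N_FRAMES, TEXT_DIM)

# Step 3: concatenate projected visual tokens with the prompt tokens to form
# the input sequence for the LoRA-tuned LLM, which decodes quality outputs.
llm_input = np.concatenate([visual_tokens, prompt_embeds], axis=0)
print(llm_input.shape)                          # (24, 768)
```

In the real model the projector is trained jointly with the LoRA adapters, so the visual tokens land in a region of the embedding space the language model can interpret.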

Experimental Results

FineVQ was evaluated on FineVD and eleven other UGC‑VQA benchmarks using SRCC, KRCC, and PLCC as core metrics. Across all datasets and all three metrics, FineVQ consistently outperformed traditional IQA/VQA methods and recent deep‑learning approaches, achieving the highest scores in every dimension.
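For readers unfamiliar with the three metrics: SRCC and KRCC measure how well the model's ranking of videos matches the human ranking, while PLCC measures linear agreement with the raw scores. A toy computation with SciPy (the score values below are illustrative, not from the paper):

```python
from scipy.stats import spearmanr, kendalltau, pearsonr

# Toy predicted scores vs. mean opinion scores (MOS); illustrative only.
pred = [3.1, 2.4, 4.8, 1.9, 4.1]
mos  = [3.0, 2.6, 4.9, 1.5, 4.2]

srcc, _ = spearmanr(pred, mos)    # rank correlation (monotonicity)
krcc, _ = kendalltau(pred, mos)   # pairwise rank agreement
plcc, _ = pearsonr(pred, mos)     # linear correlation (accuracy)

# Here the predicted ranking matches the MOS ranking exactly,
# so SRCC and KRCC are both 1.0 and PLCC is close to 1.
print(round(srcc, 3), round(krcc, 3), round(plcc, 3))
```

Higher is better for all three; a perfect VQA model would score 1.0 on each.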

[Figure: Performance comparison of FineVQ and other methods on quality score prediction]

Distortion Type Prediction

On FineVD, FineVQ also excelled at predicting distortion types, providing both binary “yes/no” judgments and specific “which” distortion labels, demonstrating superior diagnostic capability compared with the latest LMM baselines.
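Because the model answers distortion queries in text, its raw output must be mapped back to labels for evaluation. The sketch below shows one plausible post‑processing scheme; the answer format, the `parse_*` helpers, and the label list are assumptions for illustration, as the article does not specify FineVQ's exact output format.

```python
# Hypothetical distortion vocabulary (a subset of FineVD's twelve types).
DISTORTION_TYPES = ["compression artifacts", "motion blur", "focus issues",
                    "over-sharpening", "mosaic", "stutter"]

def parse_yes_no(answer: str) -> bool:
    """Binary 'does this distortion occur?' judgment from a text answer."""
    return answer.strip().lower().startswith("yes")

def parse_which(answer: str) -> list[str]:
    """'Which distortions occur?' -> list of recognized type labels."""
    text = answer.lower()
    return [d for d in DISTORTION_TYPES if d in text]

print(parse_yes_no("Yes, the video is blurry."))          # True
print(parse_which("It shows motion blur and stutter."))   # ['motion blur', 'stutter']
```

The parsed labels can then be scored against the expert annotations in FineVD with standard classification metrics.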

[Figure: Distortion type prediction performance]

Evaluation on Other UGC‑VQA Datasets

FineVQ was further tested on six additional benchmarks (LIVE‑YT‑Gaming, KoNViD‑1k, YouTube‑UGC, LIVE‑VQC, LSVQ‑test, LSVQ‑1080p). It achieved the best performance on all of them, improving over the previous state‑of‑the‑art KSVQE by 3.8% on LSVQ‑1080p and confirming its effectiveness for high‑resolution video quality assessment.

[Figure: FineVQ results on other UGC‑VQA datasets]

Cross‑Dataset Generalization

Two cross‑dataset experiments were conducted: (1) train on other datasets, test on FineVD; (2) train on FineVD, test on other datasets. FineVQ outperformed two leading VQA models in both settings, demonstrating strong generalization. Notably, models trained on FineVD transferred well to external datasets, whereas models trained on external datasets transferred poorly to FineVD, highlighting FineVD’s diversity and broad content distribution.

[Figure: Cross‑dataset evaluation results]

Resources

Paper: FineVQ: Fine‑Grained User Generated Content Video Quality Assessment – https://arxiv.org/pdf/2412.19238

Project page: https://duanhuiyu.github.io/FineVQ-project-page/

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: computer vision, deep learning, multimodal, video quality assessment, UGC, dataset, FineVQ
Written by

Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.
