Comprehensive Video Quality Evaluation Practices at Bilibili: From Subjective and Objective Metrics to HDR Assessment
Bilibili's comprehensive video-quality framework merges ITU-R BT.500 subjective MOS testing with objective metrics such as PSNR, SSIM, VMAF and NIQE, including HDR10 and HLG assessment, through a full capture-to-encoding workflow. It addresses frame-alignment and content-specific challenges and delivers measurable QoE and encoding-speed gains across Bilibili's products.
Background: With the rapid growth of short‑video platforms, user‑generated content (UGC) has become ubiquitous, raising new demands for video creation tools and quality evaluation. Traditional manual assessment cannot keep up with the volume, so a more efficient quality assessment system is needed.
Evaluation approach: Bilibili built a quality assessment framework that combines subjective (QoE) and objective (QoS) metrics. Subjective testing follows ITU‑R BT.500 standards, using Mean Opinion Score (MOS) collected from a controlled environment. Objective metrics include full‑reference (PSNR, SSIM, VMAF), partial‑reference, and no‑reference (NIQE) methods.
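To make the subjective side concrete, BT.500-style reporting pairs each MOS with a 95% confidence interval over the raw opinion scores. The sketch below is illustrative only; `mos_with_ci` is a hypothetical helper name, not part of any Bilibili tooling described in the article:

```python
import math

def mos_with_ci(scores):
    """Mean Opinion Score with a 95% confidence interval,
    in the style recommended by ITU-R BT.500."""
    n = len(scores)
    mean = sum(scores) / n
    # Sample variance (n - 1 in the denominator)
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    # 95% CI half-width under a normal approximation
    delta = 1.96 * math.sqrt(var / n)
    return mean, delta
```

For example, five raters scoring a clip `[4, 5, 4, 3, 5]` yield a MOS of 4.2 with a CI half-width of about 0.73, so the clip would be reported as 4.2 ± 0.73.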
Key objective metrics:
• PSNR (Peak Signal‑to‑Noise Ratio) measures pixel‑wise error; higher dB indicates better quality but ignores perceptual factors.
• SSIM (Structural Similarity Index) evaluates luminance, contrast and structure similarity, ranging from 0 to 1.
• VMAF (Video Multi‑Method Assessment Fusion) is a machine‑learning based metric from Netflix that combines spatial and temporal features.
• NIQE (Natural Image Quality Evaluator) is a blind, no‑reference metric based on natural scene statistics; lower scores indicate better quality.
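Of these, PSNR is the simplest to reproduce. A minimal NumPy sketch of the full-reference computation on a single frame pair (assuming 8-bit frames as arrays; the function name is illustrative):

```python
import numpy as np

def psnr(ref: np.ndarray, dist: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio (dB) between a reference and a distorted frame."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical frames: no error
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A uniform error of 10 gray levels on 8-bit frames, for instance, gives roughly 28.1 dB; as the text notes, this says nothing about how visible the error actually is.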
Practical workflow: The process starts with scenario definition (camera type, lighting, motion, device configuration), followed by video capture, preprocessing, encoding, and finally both subjective and objective evaluation. The overall pipeline is illustrated in Figure 5.
Code example for video frame processing (used in the objective pipeline):
import cv2
from timeit import default_timer as timer

def process_video(filename=0, func=None, output='result.mp4', verbose=0):
    """Process a video frame by frame.
    :param filename: video source; defaults to the camera (0)
    :param func: function applied to each frame
    :param output: output file name
    :param verbose: visualization: 0 = none, 1 = show the processed result,
                    2 = show a before/after comparison
    """
    cap = cv2.VideoCapture(filename)  # open the source (file or camera)
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # codec for .mp4 output
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(output, fourcc, fps, (width, height))
    if verbose > 0 or filename == 0:
        print('Press q (with an English input method active) to stop')
    count = cap.get(cv2.CAP_PROP_FRAME_COUNT)  # total frames (<= 0 for a camera)
    accum_time = 0   # accumulated time
    curr_fps = 0     # frames processed in the current second
    prev_time = timer()
    current = 0      # current frame index (also valid for camera input)
    while cap.isOpened():
        current = int(cap.get(cv2.CAP_PROP_POS_FRAMES) + 0.5)
        if count > 0:
            curr_time = timer()
            exec_time = curr_time - prev_time  # time spent on the last frame
            prev_time = curr_time
            accum_time += exec_time
            curr_fps += 1
            if accum_time >= 1:  # report progress roughly once per second
                accum_time -= 1
                print('Progress: {:.2f}%\tFPS: {}'.format(current / count * 100, curr_fps))
                curr_fps = 0  # reset the per-second frame counter
        ret, frame = cap.read()
        if ret:
            result = func(frame, current) if func else frame
            out.write(result)  # write the processed frame
            if verbose > 0 or filename == 0:
                cv2.imshow('after', result)
                if verbose == 2:
                    cv2.imshow('before', frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
                    break
        else:
            break
    cap.release()
    out.release()
    cv2.destroyAllWindows()

Issues encountered and solutions:
Subjective testing variance was reduced by controlling participant numbers (4‑40), mixing roles (operations, product, R&D, testing) and standardizing viewing conditions per ITU‑R BT.1788.
Sample diversity was increased (different lighting, motion, device configurations) and confidence weighting was applied using ITU‑R BT.500‑13.
Objective metrics struggled with filtered or effect‑rich content; NIQE was adapted to evaluate key frames for such cases.
Frame alignment problems were addressed by preprocessing videos into image sequences, normalizing resolution, filling missing frames, and then applying VMAF/PSNR on aligned frames.
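The fill-missing-frames step described above can be sketched as follows, assuming both videos have already been decoded into lists of same-resolution frames. This is a simplified illustration (`align_sequences` is a hypothetical name); a real pipeline would also handle resolution normalization and possibly a temporal offset search:

```python
import numpy as np

def align_sequences(ref_frames, dist_frames):
    """Pad the shorter sequence by repeating its last frame so both
    sequences have equal length, then pair frames index-by-index for
    full-reference metrics such as VMAF or PSNR."""
    n = max(len(ref_frames), len(dist_frames))
    def pad(frames):
        return list(frames) + [frames[-1]] * (n - len(frames))
    return list(zip(pad(ref_frames), pad(dist_frames)))
```

Repeating the last frame is a simple policy; interpolation or timestamp-based matching are alternatives when frame drops occur mid-sequence rather than at the end.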
HDR assessment: HDR10 (PQ) and HLG transfer functions were evaluated. The workflow includes color‑space conversion, BT.2100 gamma curves, and both subjective viewing on HDR‑capable displays and objective scoring using VMAF/NIQE.
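For reference, the PQ transfer characteristic used by HDR10 is defined in BT.2100. A minimal sketch of the PQ inverse EOTF (display luminance to the nonlinear signal), using the standard's published constants:

```python
def pq_inverse_eotf(fd: float) -> float:
    """BT.2100 PQ inverse EOTF: display luminance in cd/m² -> signal in [0, 1]."""
    m1, m2 = 2610 / 16384, 2523 / 4096 * 128        # PQ exponents
    c1, c2, c3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32
    y = max(fd, 0.0) / 10000.0                       # normalize to the 10000 cd/m² PQ peak
    yp = y ** m1
    return ((c1 + c2 * yp) / (1 + c3 * yp)) ** m2
```

The 10000 cd/m² peak maps to 1.0 exactly, and SDR-reference 100 cd/m² lands near 0.51, which shows how PQ allocates much of its code range to the darker luminance levels.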
Overall results: The combined subjective‑objective framework improved perceived video quality and encoding speed across multiple Bilibili products (e.g., Bilibili Capture, Bilibili Editing). Quantitative gains are shown in Figures 18‑19.
Conclusion and outlook: The current framework provides a practical QoE‑driven evaluation pipeline, but further work is needed to extend testing to live streaming scenarios, enrich test scenarios, and integrate network‑level metrics.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.