How Douyin Boosts Video Quality End‑to‑End: From Capture to Playback

This article explains Douyin's comprehensive video‑quality pipeline, covering the impact of resolution, bit depth, frame rate, color gamut and brightness, the end‑to‑end processing chain from sensor to server to client, and the mix of subjective and objective evaluation methods used to continuously improve visual experience.

ByteDance SE Lab

Background

Douyin, a short‑video platform with over 600 million daily active users and billions of video searches, faces constant challenges in balancing multimedia processing cost and user experience, prompting continuous research on visual quality.

Key Quality Dimensions

Resolution

Resolution is the pixel count of an image, usually given as width × height; higher resolution (e.g., 4K UHD = 3840×2160) provides finer detail than Full HD (1920×1080), with four times as many pixels.

Bit Depth

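Bit depth determines how many distinct colors each pixel can take; with three color channels, 8 bits per channel gives about 16.7 million colors (2²⁴), while 10 bits per channel gives about 1.07 billion (2³⁰), enabling smoother gradients with less visible banding.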
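
As a quick check on these figures, the sketch below assumes three independent color channels (R, G, B) at the stated bit depth per channel. The function name `color_count` is illustrative only.

```python
def color_count(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors, assuming independent channels of equal depth."""
    return (2 ** bits_per_channel) ** channels

for bits in (8, 10):
    print(f"{bits}-bit: {color_count(bits):,} colors")
# 8-bit:  16,777,216 colors
# 10-bit: 1,073,741,824 colors
```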

Frame Rate

Frame rate is the number of images shown per second; cinema typically uses 24 fps, TV uses 30 fps or 60i (60 interlaced fields per second), while 8K broadcast standards support up to 120 fps for near‑real‑world motion smoothness.
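
One practical consequence of frame rate is the time budget it leaves for any per-frame processing. The sketch below only converts the rates mentioned above into frame intervals; `frame_interval_ms` is an illustrative name, not part of any Douyin tool.

```python
def frame_interval_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps

for fps in (24, 30, 60, 120):
    print(f"{fps:>3} fps -> {frame_interval_ms(fps):.2f} ms per frame")
```

At 120 fps the interval is roughly 8.3 ms, a quarter of the 33.3 ms available at 30 fps.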

Color Gamut

Color gamut defines the range of reproducible colors; BT.2020 (used in 4K/8K) covers a wider range than BT.709 (HD).
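
To give a rough sense of how much wider BT.2020 is, the sketch below compares the areas of the two gamut triangles in CIE 1931 xy space, using the primary chromaticities published in the two standards. Area in xy is not perceptually uniform, so the ratio is only an approximate indication; `triangle_area` is an illustrative helper.

```python
# CIE 1931 xy chromaticities of the R, G, B primaries in each standard.
BT709 = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
BT2020 = [(0.708, 0.292), (0.170, 0.797), (0.131, 0.046)]

def triangle_area(primaries):
    """Area of the gamut triangle in xy space (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = primaries
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

ratio = triangle_area(BT2020) / triangle_area(BT709)
print(f"BT.2020 / BT.709 area ratio in xy: {ratio:.2f}")
```

By this measure the BT.2020 triangle is a little under twice the size of the BT.709 one.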

Brightness

Brightness (dynamic range) spans the darkest to the brightest intensities a system can reproduce; HDR expands this range from roughly 10³:1 to 10⁵:1, approaching the range of human vision.
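
Dynamic range is often quoted in photographic stops, where each stop is a doubling of light. The sketch below reads the 10³ and 10⁵ figures as brightest-to-darkest contrast ratios, which is an assumption about the original wording, and converts them with a base-2 logarithm.

```python
import math

def stops(contrast_ratio: float) -> float:
    """Dynamic range in stops: log2 of the brightest-to-darkest ratio."""
    return math.log2(contrast_ratio)

for label, ratio in (("SDR", 1e3), ("HDR", 1e5)):
    print(f"{label}: about {stops(ratio):.1f} stops")
# SDR: about 10.0 stops
# HDR: about 16.6 stops
```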

Douyin End‑to‑End Pipeline

The pipeline is complex, and every stage, from capture through production and server‑side processing to playback, can improve or degrade quality.

Capture and Sensor

Light is captured by the sensor and converted to electrical signals, which the ISP (image signal processor) then processes, applying multi‑frame HDR, super‑resolution, denoising, and similar steps; the Android Camera1/Camera2 APIs and vendor SDKs provide access to these capabilities.

Production‑Side Processing

Special‑effect SDKs, enhancement algorithms, editing SDKs, and upload SDKs apply beautification, denoising, and other enhancements before sending to the server.

Server‑Side Processing

The server analyses video metadata, applies enhancement pre‑processing, and transcodes into multiple bitrate tiers for CDN distribution.
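
A minimal sketch of the multi-tier transcoding decision follows. The tier names, heights, and bitrates are hypothetical (the article gives no actual ladder), and the rule of not upscaling beyond the source is an assumed policy, not a documented one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    height: int        # output frame height in pixels
    bitrate_kbps: int  # target video bitrate

# Hypothetical ladder; real tiers and bitrates are not given in the article.
LADDER = [
    Tier("360p", 360, 600),
    Tier("540p", 540, 1200),
    Tier("720p", 720, 2400),
    Tier("1080p", 1080, 4500),
]

def plan_transcodes(source_height: int) -> list[Tier]:
    """Keep tiers that do not upscale the source; fall back to the lowest tier."""
    tiers = [t for t in LADDER if t.height <= source_height]
    return tiers or [LADDER[0]]

print([t.name for t in plan_transcodes(720)])  # ['360p', '540p', '720p']
```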

Client‑Side Playback

The player performs demuxing, decoding, and further enhancement (e.g., HDR, super‑resolution) before rendering.

Evaluation Methods

The industry faces a gap between objective metrics and subjective perception; PSNR, while popular, may not reflect user experience, especially for localized artifacts.
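
The weakness with localized artifacts can be shown with a toy example. PSNR depends only on the mean squared error, so a faint offset spread over a whole frame and a strong error concentrated in one small block can score identically. The frame size and pixel values below are made up for illustration.

```python
import math

def psnr(reference, distorted, peak=255):
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

width = height = 64
reference = [128] * (width * height)   # flat mid-gray frame

# Case A: a faint offset of 4 levels on every pixel.
spread = [p + 4 for p in reference]

# Case B: one 8x8 block shifted by 32 levels, everything else untouched.
blocky = list(reference)
for y in range(8):
    for x in range(8):
        blocky[y * width + x] += 32

print(f"spread: {psnr(reference, spread):.2f} dB")
print(f"blocky: {psnr(reference, blocky):.2f} dB")
```

Both cases have an MSE of 16 and therefore the same PSNR (about 36.1 dB), although a viewer would likely notice the block far more readily than the uniform offset.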

Subjective quality is measured by human observers, but is time‑consuming; objective quality uses mathematical models to approximate perception.

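Objective metrics are classified as full‑reference (e.g., PSNR), reduced‑reference (also called partial‑reference), and no‑reference, depending on how much of the original signal is available for comparison.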

Common Objective Metrics

MTF: Modulation Transfer Function; evaluates the resolving power of a lens or imaging system.
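
In its simplest form, MTF at one spatial frequency is the ratio of the contrast (modulation) in the captured image to the contrast of the test pattern. The sketch below uses that bar-pattern definition with made-up sample values. Chart-based tools typically derive the curve from an edge profile instead, which is not shown here.

```python
def modulation(samples):
    """Michelson contrast: (max - min) / (max + min)."""
    hi, lo = max(samples), min(samples)
    return (hi - lo) / (hi + lo)

def mtf(target, captured):
    """Contrast retained by the system at the pattern's spatial frequency."""
    return modulation(captured) / modulation(target)

# Hypothetical intensity samples across a bar pattern.
target = [10, 240, 10, 240]
captured = [70, 180, 70, 180]
print(f"MTF at this frequency: {mtf(target, captured):.2f}")
```

A value near 1 means the pattern's contrast survives almost intact; values falling toward 0 at higher frequencies indicate lost fine detail.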

SNR: Signal‑to‑Noise Ratio (in dB); quantifies signal relative to noise, but can be fooled by aggressive denoising.
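
One common way to estimate SNR is to measure a nominally uniform patch and compare its mean level to its standard deviation. The sketch below assumes that definition, with made-up pixel values, and shows why heavy denoising inflates the score.

```python
import math
import statistics

def snr_db(patch):
    """SNR of a nominally uniform patch: mean over noise std-dev, in dB."""
    mean = statistics.fmean(patch)
    noise = statistics.pstdev(patch)
    return math.inf if noise == 0 else 20 * math.log10(mean / noise)

noisy = [118, 122, 120, 125, 115, 121, 119, 120]
smoothed = [120, 120, 120, 121, 119, 120, 120, 120]  # hypothetical heavy denoising

print(f"noisy:    {snr_db(noisy):.1f} dB")
print(f"smoothed: {snr_db(smoothed):.1f} dB")
```

The smoothed patch scores much higher, yet the number says nothing about whether real texture was erased along with the noise, which is why SNR is usually read alongside a resolution measure.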

CIE Lab ΔE: Color difference computed as sqrt(ΔL²+Δa²+Δb²) (the CIE76 formula); CIEDE2000 refines this for better perceptual alignment.

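% CIE76 color difference (MATLAB element-wise operators)
delta_Eab = sqrt(delta_L.^2 + delta_a.^2 + delta_b.^2);
% Chroma-plane difference: the same distance with lightness omitted
delta_Cab = sqrt(delta_a.^2 + delta_b.^2);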
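
The same two quantities can be sketched in Python for a single pair of Lab values. The reference and measured values are made up, and the function names are illustrative; the CIEDE2000 refinement is considerably more involved and is not shown.

```python
import math

def delta_e_ab(lab1, lab2):
    """CIE76 color difference: Euclidean distance in L*a*b*."""
    dL, da, db = (p - q for p, q in zip(lab1, lab2))
    return math.sqrt(dL ** 2 + da ** 2 + db ** 2)

def delta_c_ab(lab1, lab2):
    """Distance in the a*b* plane only, ignoring lightness."""
    _, da, db = (p - q for p, q in zip(lab1, lab2))
    return math.hypot(da, db)

reference = (50.0, 20.0, -10.0)  # hypothetical chart patch (L*, a*, b*)
measured = (52.0, 23.0, -4.0)    # hypothetical camera result

print(f"dE*ab = {delta_e_ab(reference, measured):.2f}")  # 7.00
print(f"dC*ab = {delta_c_ab(reference, measured):.2f}")  # 6.71
```

Comparing the two helps separate exposure errors (lightness) from hue and saturation errors.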

Laboratory & Tools

Douyin’s objective lab uses standard test charts (24‑color, SFRplus, ISO 12233) and controlled illuminants (D65, D50, A) to conduct repeatable tests.

Custom tools include 24‑color chart analysis, SFRplus algorithms, shake detection, video parameter analysis, and stutter analysis.
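
The article does not describe how the stutter tool works. As one plausible approach, the sketch below flags any gap between consecutive frame presentation timestamps that exceeds a multiple of the nominal frame interval. The function name, the threshold factor of 2, and the sample timestamps are all assumptions.

```python
def find_stutters(timestamps_ms, fps=30.0, factor=2.0):
    """Return (frame_index, gap_ms) for gaps longer than factor x nominal interval."""
    nominal = 1000.0 / fps
    pairs = zip(timestamps_ms, timestamps_ms[1:])
    return [
        (i, later - earlier)
        for i, (earlier, later) in enumerate(pairs, start=1)
        if later - earlier > factor * nominal
    ]

# Hypothetical presentation timestamps with one 100 ms hitch.
timestamps = [0, 33, 67, 100, 200, 233, 267]
print(find_stutters(timestamps))  # [(4, 100)]
```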

Quality Improvement Across the Chain

Production Side

Software ISP pipelines from camera and chipset vendors (Canon, Sony, Qualcomm, Apple, etc.) are enhanced with hybrid AI algorithms; post‑processing modules (e.g., Almalence, Visidon, Megvii) add super‑resolution, denoising, and sharpening.

The Prism system provides SDKs for scene strategy, intelligent analysis, and enhancement (photo, video, temporal).

Server Side

The server performs video analysis, pre‑processing (super‑resolution, frame interpolation, low‑quality enhancement), and encoding; custom encoders have won multiple international codec competitions.

Client Side

Client playback integrates super‑resolution, HDR rendering, and adaptive bitrate selection to balance quality and performance.
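
Adaptive bitrate selection can be sketched as a simple throughput rule: pick the highest tier whose bitrate fits within a safety margin of the measured bandwidth. The tiers, the 0.8 safety factor, and the function name are hypothetical; a production player would typically also weigh buffer level and device capability.

```python
# Hypothetical (name, bitrate_kbps) tiers, lowest first.
TIERS = [("360p", 600), ("540p", 1200), ("720p", 2400), ("1080p", 4500)]

def pick_tier(throughput_kbps: float, safety: float = 0.8):
    """Highest tier whose bitrate fits within safety x measured throughput."""
    budget = throughput_kbps * safety
    affordable = [t for t in TIERS if t[1] <= budget]
    # Fall back to the lowest tier when nothing fits the budget.
    return max(affordable, key=lambda t: t[1]) if affordable else TIERS[0]

print(pick_tier(3500)[0])  # 720p
print(pick_tier(500)[0])   # 360p
```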

Future Outlook

Refine no‑reference scoring for finer‑grained daily testing.

Develop more accurate subjective quantification methods.

Explore quality assessment for AR/VR/MR scenarios.

Establish comprehensive HDR testing frameworks.

Investigate evaluation for ultra‑high resolution and high‑frame‑rate content (8K, 120 fps).

Standardize beauty‑filter quality using combined subjective and objective metrics.

Recruitment

The Douyin Multimedia Evaluation Lab seeks image‑testing engineers to help improve visual quality for billions of users.
