How Douyin Boosts Video Quality End‑to‑End: From Capture to Playback
This article explains Douyin's comprehensive video‑quality pipeline, covering the impact of resolution, bit depth, frame rate, color gamut and brightness, the end‑to‑end processing chain from sensor to server to client, and the mix of subjective and objective evaluation methods used to continuously improve visual experience.
Background
Douyin is a short-video platform with more than 600 million daily active users and billions of video searches. At that scale it must constantly balance multimedia processing cost against user experience, which drives ongoing research into visual quality.
Key Quality Dimensions
Resolution
Resolution is the number of pixels in an image, usually given as width × height. Higher resolutions such as 4K UHD (3840×2160) resolve finer detail than HD formats such as 1920×1080.
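Here is a quick arithmetic sketch of what each step up in resolution means per frame. The Full HD (1920×1080) and 8K (7680×4320) entries are added here for comparison. Each step quadruples the pixel count, and with it the raw data every later stage of the pipeline has to process.

```python
# Frame sizes (width, height) for common resolutions
RESOLUTIONS = {
    "Full HD": (1920, 1080),
    "4K UHD": (3840, 2160),
    "8K UHD": (7680, 4320),
}

def pixel_count(width: int, height: int) -> int:
    """Total number of pixels in one frame."""
    return width * height

base = pixel_count(*RESOLUTIONS["Full HD"])
for name, (w, h) in RESOLUTIONS.items():
    n = pixel_count(w, h)
    print(f"{name}: {n:,} pixels, {n // base}x Full HD")
```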
Bit Depth
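Bit depth is the number of bits used for each color channel of a pixel. With three channels, 8-bit video can represent about 16.7 million colors (256³), while 10-bit video can represent about 1.07 billion (1024³), which produces smoother gradients with less banding.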
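The color counts follow directly from the number of code values per channel, raised to the power of the number of channels. This is a minimal sketch that assumes three-channel (RGB or YCbCr) video:

```python
def levels(bits_per_channel: int) -> int:
    """Distinct code values per channel, e.g. 256 for 8-bit."""
    return 2 ** bits_per_channel

def representable_colors(bits_per_channel: int, channels: int = 3) -> int:
    """Distinct triplets: levels per channel raised to the number of channels."""
    return levels(bits_per_channel) ** channels

for bits in (8, 10):
    print(f"{bits}-bit: {levels(bits)} levels per channel, "
          f"{representable_colors(bits):,} colors")
```

With 10-bit video there are four times as many steps per channel. Each step in a smooth gradient such as a sky, skin tones, or a fade to black is therefore a quarter of the size, which is why banding becomes less visible.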
Frame Rate
Frame rate is the number of images displayed per second. Cinema typically uses 24 fps and television 30 fps or 60i (60 interlaced fields per second). 8K broadcast standards support up to 120 fps, which makes motion look close to real life.
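Frame rate sets how long each frame stays on screen. That interval is also the time budget that per-frame work such as decoding, enhancement, and rendering must fit into on the client. Note that 60i carries 60 half-height fields per second, which amounts to 30 full interlaced frames.

```python
def frame_interval_ms(fps: float) -> float:
    """How long each frame stays on screen, in milliseconds."""
    return 1000.0 / fps

for fps in (24, 30, 60, 120):
    print(f"{fps:>3} fps: {frame_interval_ms(fps):.2f} ms per frame")
```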
Color Gamut
Color gamut defines the range of reproducible colors; BT.2020 (used in 4K/8K) covers a wider range than BT.709 (HD).
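One rough way to compare gamuts is to measure the triangle that the three primaries span in the CIE 1931 xy chromaticity diagram. The primaries below come from ITU-R BT.709 and BT.2020. Because xy is not perceptually uniform, treat the resulting ratio as indicative only.

```python
from typing import Dict, Tuple

Primaries = Dict[str, Tuple[float, float]]

# CIE 1931 xy chromaticities of the red, green, and blue primaries
BT709: Primaries = {"R": (0.640, 0.330), "G": (0.300, 0.600), "B": (0.150, 0.060)}
BT2020: Primaries = {"R": (0.708, 0.292), "G": (0.170, 0.797), "B": (0.131, 0.046)}

def triangle_area(p: Primaries) -> float:
    """Shoelace formula for the gamut triangle in the xy plane."""
    (x1, y1), (x2, y2), (x3, y3) = p["R"], p["G"], p["B"]
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

ratio = triangle_area(BT2020) / triangle_area(BT709)
print(f"BT.2020 / BT.709 gamut area in xy: {ratio:.2f}")
```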
Brightness
Brightness, expressed as dynamic range, spans the darkest to the brightest intensities that can be perceived or reproduced. HDR widens this range from roughly 10³:1 in SDR to about 10⁵:1, approaching the range of human vision.
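Photographers and display engineers often express dynamic range in stops, where each stop is a doubling of luminance. This sketch reads the ~10³ and ~10⁵ figures as contrast ratios (an interpretation) and converts them to stops:

```python
import math

def stops(contrast_ratio: float) -> float:
    """Dynamic range in stops: the number of luminance doublings in the ratio."""
    return math.log2(contrast_ratio)

for label, ratio in (("SDR ~10^3:1", 1e3), ("HDR ~10^5:1", 1e5)):
    print(f"{label}: {stops(ratio):.1f} stops")
```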
Douyin End‑to‑End Pipeline
The pipeline is long, and every stage can affect the final quality.
Capture and Sensor
Light reaching the sensor is converted to electrical signals, which the ISP (image signal processor) then processes with multi-frame HDR, super-resolution, denoising, and similar algorithms. On Android, apps access these capabilities through the Camera1 and Camera2 APIs and through vendor SDKs.
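Multi-frame processing relies on a simple statistical fact: averaging N aligned frames of a static scene reduces random noise by about √N. The sketch below simulates this with a flat gray patch and Gaussian sensor noise. Real ISPs also have to align frames and reject moving content, which this toy version skips, and it is not a description of any vendor's ISP.

```python
import random
import statistics

def capture_burst(true_value: float, sigma: float, n_frames: int, n_pixels: int, seed: int = 0):
    """Simulate a burst of frames of a flat patch with Gaussian noise."""
    rng = random.Random(seed)
    return [[rng.gauss(true_value, sigma) for _ in range(n_pixels)] for _ in range(n_frames)]

def temporal_average(frames):
    """Per-pixel mean across frames (frames are assumed to be already aligned)."""
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]

frames = capture_burst(true_value=128.0, sigma=8.0, n_frames=16, n_pixels=10_000)
single_noise = statistics.pstdev(frames[0])
merged_noise = statistics.pstdev(temporal_average(frames))
print(f"single frame noise: {single_noise:.2f}, 16-frame merge: {merged_noise:.2f}")
```

With 16 frames, the noise drops from about 8 to about 2, which matches the expected factor of √16 = 4.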
Production‑Side Processing
On the creator's device, special-effects SDKs, enhancement algorithms, editing SDKs, and upload SDKs apply beautification, denoising, and other enhancements before the video is sent to the server.
Server‑Side Processing
The server analyses video metadata, applies enhancement pre‑processing, and transcodes into multiple bitrate tiers for CDN distribution.
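The article does not publish Douyin's bitrate tiers. This sketch uses a hypothetical ladder to show one common rule when planning renditions: never upscale a source just to fill a higher tier, because the extra bits add no real detail.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Rung:
    height: int        # output frame height in pixels
    bitrate_kbps: int  # target average bitrate

# Illustrative values only, NOT Douyin's production ladder
HYPOTHETICAL_LADDER: List[Rung] = [
    Rung(360, 400),
    Rung(540, 900),
    Rung(720, 1500),
    Rung(1080, 3000),
]

def plan_renditions(source_height: int, ladder: List[Rung] = HYPOTHETICAL_LADDER) -> List[Rung]:
    """Select ladder rungs for one upload without upscaling the source."""
    rungs = [r for r in ladder if r.height <= source_height]
    if not rungs:
        # Source is smaller than every rung: encode once at native size
        rungs = [Rung(source_height, ladder[0].bitrate_kbps)]
    return rungs

for rung in plan_renditions(720):
    print(rung)
```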
Client‑Side Playback
The player performs demuxing, decoding, and further enhancement (e.g., HDR, super‑resolution) before rendering.
Evaluation Methods
The industry faces a gap between objective metrics and subjective perception. PSNR is widely used, but it does not always reflect user experience, especially when artifacts are localized.
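The localized-artifact problem can be demonstrated directly. PSNR depends only on the mean squared error, so a faint ripple spread over the whole frame and a conspicuous blotch confined to a small area can score exactly the same. The 1-D "image" below is a simplification for brevity.

```python
import math
from typing import Sequence

def mse(ref: Sequence[float], dist: Sequence[float]) -> float:
    """Mean squared error between a reference and a distorted signal."""
    return sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)

def psnr(ref: Sequence[float], dist: Sequence[float], max_value: float = 255.0) -> float:
    """Full-reference PSNR in dB; max_value is the peak sample value (255 for 8-bit)."""
    err = mse(ref, dist)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)

n = 1024
ref = [128] * n  # a flat 8-bit "image", flattened to 1-D
# Distortion A: faint +/-2 ripple spread over every sample
spread = [v + (2 if i % 2 else -2) for i, v in enumerate(ref)]
# Distortion B: a +16 blotch confined to the first 16 samples
local = [v + (16 if i < 16 else 0) for i, v in enumerate(ref)]

print(f"spread: {psnr(ref, spread):.2f} dB, local: {psnr(ref, local):.2f} dB")
```

Both distortions have an MSE of 4 and therefore identical PSNR, even though a viewer would notice the blotch far more readily.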
Subjective quality is measured by human observers, which is reliable but time-consuming. Objective quality uses mathematical models to approximate human perception.
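Subjective tests are usually summarized as a mean opinion score (MOS) with a confidence interval. This sketch uses a normal approximation for the interval and hypothetical 5-point absolute category rating (ACR) scores from ten viewers. It is not Douyin's scoring procedure.

```python
import math
import statistics

def mos_with_ci(scores, z: float = 1.96):
    """Mean opinion score and an approximate 95% confidence interval."""
    n = len(scores)
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width

# Hypothetical 5-point ratings for one clip
ratings = [4, 5, 4, 3, 4, 5, 4, 4, 3, 4]
mos, low, high = mos_with_ci(ratings)
print(f"MOS {mos:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The interval width shrinks only with the square root of the number of viewers. This is one reason subjective testing is slow and expensive.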
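Objective metrics fall into three classes, depending on how much of the original video they need: full-reference metrics use the entire original (e.g., PSNR), reduced-reference (partial-reference) metrics use only features extracted from it, and no-reference metrics judge the distorted video alone.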
Common Objective Metrics
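MTF: the Modulation Transfer Function measures how much contrast a lens or imaging system preserves at each spatial frequency, and it is a standard way to evaluate resolution.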
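The article does not show how its SFRplus algorithms compute MTF. Slanted-edge tools such as SFRplus and ISO 12233 analysis derive the full curve from an edge profile, and this sketch does not attempt that. It only illustrates the definition (output contrast divided by input contrast at one frequency) and MTF50, the frequency at which the curve falls to 0.5, which is a commonly reported summary. All measurement values are hypothetical.

```python
from typing import Optional, Sequence

def modulation(i_max: float, i_min: float) -> float:
    """Michelson contrast of a bar or sine pattern: (max - min) / (max + min)."""
    return (i_max - i_min) / (i_max + i_min)

def mtf_at(chart_max: float, chart_min: float, image_max: float, image_min: float) -> float:
    """MTF at one spatial frequency: captured contrast / chart contrast."""
    return modulation(image_max, image_min) / modulation(chart_max, chart_min)

def mtf50(freqs: Sequence[float], mtf: Sequence[float]) -> Optional[float]:
    """Frequency where the MTF curve first falls to 0.5, by linear interpolation."""
    for i in range(1, len(freqs)):
        m0, m1 = mtf[i - 1], mtf[i]
        if m0 >= 0.5 >= m1:
            if m0 == m1:
                return freqs[i - 1]
            return freqs[i - 1] + (m0 - 0.5) * (freqs[i] - freqs[i - 1]) / (m0 - m1)
    return None  # curve never drops to 0.5 in the measured range

# Hypothetical: chart bars at 0.9/0.1 reflectance, captured at 0.7/0.3
print(mtf_at(0.9, 0.1, 0.7, 0.3))
print(mtf50([0.0, 0.1, 0.2, 0.3], [1.0, 0.8, 0.4, 0.2]))  # cycles/pixel
```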
SNR: the signal-to-noise ratio (in dB) quantifies signal strength relative to noise. Aggressive denoising can inflate it while also erasing real detail.
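This sketch shows how a smoothing filter can improve SNR without improving quality. SNR is computed on a nominally uniform patch as 20·log10(mean / standard deviation). If the alternating ±1 pattern below is real fine texture rather than noise, a simple box blur destroys it, yet the SNR rises. The blur here is a naive stand-in for a denoiser, not any production algorithm.

```python
import math
import statistics

def snr_db(patch) -> float:
    """SNR of a nominally uniform patch: 20*log10(mean / standard deviation)."""
    return 20 * math.log10(statistics.mean(patch) / statistics.pstdev(patch))

def box_blur(signal, radius: int = 1):
    """Naive moving-average filter; edge samples use the neighbours available."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

patch = [99, 101] * 50
print(snr_db(patch))            # 40.0
print(snr_db(box_blur(patch)))  # higher, although the texture is gone
```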
CIE Lab ΔE: the CIE 1976 color difference ΔE*ab = sqrt(ΔL*² + Δa*² + Δb*²) is the Euclidean distance in CIELAB. CIEDE2000 refines it with lightness, chroma, and hue weighting for better perceptual alignment.
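CIEDE2000 is considerably more involved than the Euclidean formula. The Python sketch below is our own transcription of the published CIEDE2000 equations with kL = kC = kH = 1. It is not code from Douyin's tools. The `delta_e76` helper is included for comparison. The first test pair from Sharma, Wu and Dalal's CIEDE2000 test data (expected ΔE00 = 2.0425) serves as a check.

```python
import math

def delta_e76(lab1, lab2) -> float:
    """CIE 1976 color difference: Euclidean distance in CIELAB."""
    return math.dist(lab1, lab2)

def delta_e2000(lab1, lab2, kL=1.0, kC=1.0, kH=1.0) -> float:
    """CIEDE2000 color difference between two (L*, a*, b*) colors."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # 1. Rescale a* to correct the hue nonlinearity near neutral colors
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    g = 0.5 * (1 - math.sqrt(c_bar**7 / (c_bar**7 + 25**7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    h1p = math.degrees(math.atan2(b1, a1p)) % 360 if c1p else 0.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360 if c2p else 0.0
    # 2. Lightness, chroma, and hue differences
    dLp, dCp, dhp = L2 - L1, c2p - c1p, 0.0
    if c1p * c2p:
        dhp = h2p - h1p
        if dhp > 180:
            dhp -= 360
        elif dhp < -180:
            dhp += 360
    dHp = 2 * math.sqrt(c1p * c2p) * math.sin(math.radians(dhp / 2))
    # 3. Mean values and weighting functions
    Lbp, Cbp = (L1 + L2) / 2, (c1p + c2p) / 2
    if c1p * c2p == 0:
        hbp = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        hbp = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        hbp = (h1p + h2p + 360) / 2
    else:
        hbp = (h1p + h2p - 360) / 2
    t = (1 - 0.17 * math.cos(math.radians(hbp - 30))
         + 0.24 * math.cos(math.radians(2 * hbp))
         + 0.32 * math.cos(math.radians(3 * hbp + 6))
         - 0.20 * math.cos(math.radians(4 * hbp - 63)))
    d_theta = 30 * math.exp(-(((hbp - 275) / 25) ** 2))
    r_c = 2 * math.sqrt(Cbp**7 / (Cbp**7 + 25**7))
    s_l = 1 + 0.015 * (Lbp - 50) ** 2 / math.sqrt(20 + (Lbp - 50) ** 2)
    s_c = 1 + 0.045 * Cbp
    s_h = 1 + 0.015 * Cbp * t
    r_t = -math.sin(math.radians(2 * d_theta)) * r_c
    # 4. Combine, including the chroma-hue rotation term R_T
    l_term = dLp / (kL * s_l)
    c_term = dCp / (kC * s_c)
    h_term = dHp / (kH * s_h)
    return math.sqrt(l_term**2 + c_term**2 + h_term**2 + r_t * c_term * h_term)

pair = ((50.0, 2.6772, -79.7751), (50.0, 0.0, -82.7485))
print(f"dE76 = {delta_e76(*pair):.4f}, dE00 = {delta_e2000(*pair):.4f}")
```

For this blue pair, ΔE76 reports roughly twice the difference that CIEDE2000 does. This reflects CIEDE2000's lower weighting of chroma differences in saturated colors.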
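```matlab
% CIE 1976 color difference; delta_L, delta_a, delta_b are element-wise differences
delta_Eab = sqrt(delta_L.^2 + delta_a.^2 + delta_b.^2);
% Difference in the a*b* plane only, ignoring lightness
delta_Cab = sqrt(delta_a.^2 + delta_b.^2);
```
Laboratory & Tools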
Douyin’s objective lab uses standard test charts (the 24-patch color chart plus the SFRplus and ISO 12233 resolution charts) under controlled illuminants (D65, D50, and A) to run repeatable tests.
Custom tools include 24-patch color chart analysis, SFRplus algorithms, shake detection, video parameter analysis, and stutter analysis.
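The article does not describe how the stutter-analysis tool works. One simple, hypothetical approach is to scan frame presentation timestamps and flag any gap much longer than the nominal frame interval. The 1.5× tolerance and the timestamps below are assumptions for illustration.

```python
from typing import List

def find_stutters(timestamps_ms: List[float], fps: float, tolerance: float = 1.5) -> List[int]:
    """Indices of frames that arrive more than tolerance x the nominal interval late."""
    nominal = 1000.0 / fps
    return [
        i for i in range(1, len(timestamps_ms))
        if timestamps_ms[i] - timestamps_ms[i - 1] > tolerance * nominal
    ]

# Hypothetical presentation timestamps for a 30 fps clip with one hitch
ts = [0.0, 33.3, 66.7, 100.0, 166.7, 200.0]
print(find_stutters(ts, fps=30))  # [4]
```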
Quality Improvement Across the Chain
Production Side
Software ISP pipelines from camera and chip vendors (Canon, Sony, Qualcomm, Apple, etc.) are being enhanced with hybrid AI algorithms, while post-processing modules from vendors such as Almalence, Visidon, and Megvii add super-resolution, denoising, and sharpening.
The Prism system provides SDKs for scene strategy, intelligent analysis, and enhancement (photo, video, temporal).
Server Side
The server performs video analysis, pre-processing (super-resolution, frame interpolation, and enhancement of low-quality sources), and encoding. Its custom encoders have won multiple international codec competitions.
Client Side
Client playback integrates super‑resolution, HDR rendering, and adaptive bitrate selection to balance quality and performance.
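Adaptive bitrate selection can take many forms, and the article does not say which one Douyin's player uses. This sketch shows one well-known throughput-based heuristic. It estimates bandwidth as the harmonic mean of recent download throughputs, which damps short spikes, and then picks the highest rung that fits within a safety margin. The ladder, the 0.8 safety factor, and the throughput samples are all hypothetical.

```python
from typing import List, Sequence

def harmonic_mean(samples: Sequence[float]) -> float:
    """Harmonic mean of positive throughput samples; low samples dominate."""
    return len(samples) / sum(1.0 / s for s in samples)

def choose_bitrate(ladder_kbps: List[int], recent_throughput_kbps: Sequence[float],
                   safety: float = 0.8) -> int:
    """Pick the highest rung within safety x estimated throughput, else the lowest rung."""
    budget = safety * harmonic_mean(recent_throughput_kbps)
    affordable = [b for b in sorted(ladder_kbps) if b <= budget]
    return affordable[-1] if affordable else min(ladder_kbps)

ladder = [400, 900, 1500, 3000]
print(choose_bitrate(ladder, [2000, 2500, 1800]))  # 1500
```

The safety margin trades a little peak quality for fewer rebuffering events. This is the same quality-versus-performance balance the client-side pipeline has to strike.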
Future Outlook
Refine no‑reference scoring for finer‑grained daily testing.
Develop more accurate subjective quantification methods.
Explore quality assessment for AR/VR/MR scenarios.
Establish comprehensive HDR testing frameworks.
Investigate evaluation for ultra-high-definition formats (8K resolution, 120 fps).
Standardize beauty‑filter quality using combined subjective and objective metrics.
Recruitment
The Douyin Multimedia Evaluation Lab seeks image‑testing engineers to help improve visual quality for billions of users.
ByteDance SE Lab
Official account of ByteDance SE Lab, sharing research and practical experience in software engineering. Our lab unites researchers and engineers from various domains to accelerate the fusion of software engineering and AI, driving technological progress in every phase of software development.
