Douyin's Metric‑Driven Optimization: Boosting Creation Experience and Performance
This article details Douyin's systematic approach to improving creation experience by defining measurable goals, building a comprehensive metric system, performing relevance analysis, and implementing concrete Android and iOS performance optimizations—including album loading, component architecture, and small‑screen video quality—while outlining monitoring, tooling, and internal platform support that together deliver significant user‑facing gains.
Definition of Goals and Evaluation Methods
The business goal is to make Douyin's creation experience industry‑leading and increase submission rates. To evaluate "creation experience" and "industry‑leading" we need quantifiable data metrics and a reliable assessment plan.
We built a complete data‑metric system covering performance and experience indicators such as first‑frame capture, stutter, frame rate, panel loading times for music, stickers, etc., and constructed a contribution chain from creation experience → submission rate → DAU/retention. The selection and weighting of these technical metrics are explained in Section 2.
Evaluation combines offline comparative testing with automated pipelines, sampling device distribution across high/mid/low tiers, weighting the metrics to produce an overall experience score, and using standard practices such as stable environment, outlier removal, and weight adjustment.
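The weighting step can be sketched as follows. This is a minimal illustration, not the team's actual scoring code: the class name, the metric names, the 0-100 score scale, and the idea of blending tiers by device share are all assumptions made for the example.

```java
import java.util.Map;

/** Sketch: combine normalized metric scores into one overall experience score. */
final class ExperienceScore {
    /** Weighted average of per-metric scores (assumed 0-100); weights need not sum to 1. */
    static double weightedScore(Map<String, Double> scores, Map<String, Double> weights) {
        double total = 0.0;
        double weightSum = 0.0;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            Double w = weights.get(e.getKey());
            if (w == null) continue; // metrics without a weight are ignored
            total += w * e.getValue();
            weightSum += w;
        }
        if (weightSum == 0.0) throw new IllegalArgumentException("no weighted metrics");
        return total / weightSum;
    }

    /** Blend per-tier scores (high/mid/low) by each tier's share of sampled devices. */
    static double blendTiers(double[] tierScores, double[] tierShares) {
        if (tierScores.length != tierShares.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double total = 0.0;
        double shareSum = 0.0;
        for (int i = 0; i < tierScores.length; i++) {
            total += tierScores[i] * tierShares[i];
            shareSum += tierShares[i];
        }
        if (shareSum == 0.0) throw new IllegalArgumentException("shares sum to zero");
        return total / shareSum;
    }
}
```

Adjusting a weight in the map changes how strongly that metric moves the overall score, which is where the weight adjustment mentioned above would plug in.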
Goal Decomposition and Correlation Analysis
This section explains how the many metrics introduced in Section 1 are selected and linked to business outcomes.
Metrics originate from brainstorming, user feedback, and experience from other businesses (e.g., first‑frame impact on e‑commerce conversion, frame rate impact on video‑app retention).
For submission‑rate improvement we decompose it similarly to e‑commerce order decomposition:
Submission Rate Increase = ∏(conversion‑rate improvements of each step) = permission‑grant conversion × page‑1 conversion × … × publish‑success conversion
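A small worked sketch of this product decomposition, assuming a three-step funnel with made-up conversion rates (the class and method names are hypothetical): because the overall rate is a product, a relative gain in one step carries through as the same relative gain overall.

```java
/** Sketch: submission rate as a product of per-step conversion rates. */
final class FunnelSketch {
    /** Overall rate = product of the conversion rate of every step in the funnel. */
    static double overallRate(double[] stepRates) {
        double rate = 1.0;
        for (double step : stepRates) {
            rate *= step;
        }
        return rate;
    }

    /** Relative change in the overall rate between two sets of step rates. */
    static double relativeUplift(double[] before, double[] after) {
        return overallRate(after) / overallRate(before) - 1.0;
    }
}
```

For example, with illustrative step rates 0.9, 0.5, and 0.8, raising only the middle step from 0.5 to 0.55 (a 10% relative gain) raises the overall rate from 0.36 to 0.396, also a 10% relative gain.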
Bottom-up decomposition yields a contribution chain such as:
First‑frame / frame‑rate → specific page conversion → submission rate
These metrics appear in A/B experiments; some experiments may not significantly boost submission rate but still improve intermediate conversions, reflecting business benefit and supporting full rollout.
We then perform correlation analysis to confirm each metric's impact on submission rate and prioritize them. Correlation indicates closeness, not causation.
The basic stepwise correlation chart shows that as first‑frame duration increases, page conversion drops, indicating correlation. Different intervals show varying correlation strength; for example, the 400‑1000 ms range has high correlation and high DAU share, making optimization there highly beneficial.
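The stepwise view can be approximated by bucketing sessions on first-frame duration and computing page conversion per bucket. The sketch below assumes per-session samples and fixed bucket bounds; the class name and data layout are illustrative only.

```java
/** Sketch: page conversion rate per first-frame duration bucket. */
final class BucketConversion {
    /**
     * durationsMs[i] is one session's first-frame duration and converted[i] says whether
     * that session went on to the next page. upperBoundsMs holds ascending, exclusive
     * bucket upper bounds; a final extra bucket collects everything at or above the last bound.
     * Returns the conversion rate per bucket, or NaN for an empty bucket.
     */
    static double[] conversionByBucket(long[] durationsMs, boolean[] converted, long[] upperBoundsMs) {
        int bucketCount = upperBoundsMs.length + 1;
        int[] total = new int[bucketCount];
        int[] hits = new int[bucketCount];
        for (int i = 0; i < durationsMs.length; i++) {
            int b = 0;
            while (b < upperBoundsMs.length && durationsMs[i] >= upperBoundsMs[b]) {
                b++;
            }
            total[b]++;
            if (converted[i]) hits[b]++;
        }
        double[] rates = new double[bucketCount];
        for (int b = 0; b < bucketCount; b++) {
            rates[b] = total[b] == 0 ? Double.NaN : (double) hits[b] / total[b];
        }
        return rates;
    }
}
```

Comparing adjacent buckets gives the per-interval slope, and the sample count per bucket indicates how much traffic sits in each interval.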
Online business benefit is estimated as:
Online Benefit = discount‑factor × Σ(correlation‑slope × (optimized‑segment – pre‑optimization‑segment))
Note: in practice, the actual online benefit usually comes in below the estimate, with a loss of 0.5‑0.7 that the discount factor accounts for; the loss varies per metric.
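The estimate above translates directly into code. In this sketch the per-segment arrays are an assumed representation: slopes[i] is the correlation slope of segment i, and before[i] and after[i] are that segment's value before and after optimization. The discount is passed in as-is and the numbers in any call are placeholders.

```java
/** Sketch: discount * sum(slope * (optimized - preOptimization)) over segments. */
final class BenefitEstimate {
    static double estimate(double discount, double[] slopes, double[] before, double[] after) {
        if (slopes.length != before.length || slopes.length != after.length) {
            throw new IllegalArgumentException("all arrays must have one entry per segment");
        }
        double sum = 0.0;
        for (int i = 0; i < slopes.length; i++) {
            sum += slopes[i] * (after[i] - before[i]);
        }
        return discount * sum;
    }
}
```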
Quantile‑based correlation analysis uses percentile values on the X‑axis instead of fixed time buckets, making benefit calculation more straightforward.
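Percentile values such as PCT50 can be computed from raw samples in several ways; the sketch below uses the nearest-rank definition, which is an assumption here since the article does not say which definition is used.

```java
import java.util.Arrays;

/** Sketch: nearest-rank percentile over raw duration samples. */
final class Percentiles {
    /** Returns the smallest sample such that at least p percent of samples are at or below it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }
}
```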
Since correlation does not imply causation, for high‑impact metrics that require heavy resources we first conduct degradation experiments to confirm causality before investing in optimization.
Based on correlation strength, estimated benefit, and ROI, we rank metrics to guide project scheduling and assign different weights in competitor comparisons.
Optimization in Practice
3.1 Android Album Experience Optimization
Problem Discovery: The album is the first screen of the upload path, accounting for a large share of submissions, but suffers from long loading and slow cover rendering.
Optimization Plan
Original loading logic:
Optimizations per environment:
Replace Activity with Scene (ByteDance open‑source) to reduce empty‑Activity load time (>50 ms on mid‑tier devices).
Pre‑load and dynamically refresh data.
We pre‑load media data by reusing the camera page’s album icon query, reducing one data request.
When querying media data we use XXXColumns to fetch only necessary fields, as extra fields directly affect load speed.
```java
public interface MediaColumns extends BaseColumns {
    // ...
    /** The MIME type of the file */
    public static final String MIME_TYPE = "mime_type";
    /** The height of the image/video in pixels. */
    public static final String HEIGHT = "height";
}
```
After entering the album page we first show cached album info, then refresh via DiffUtil on a background thread.
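The cache-then-refresh idea can be illustrated in plain Java. This is a simplified stand-in, not Android's DiffUtil (which computes a fuller set of update operations): it only works out which media IDs were added or removed between the cached list and the freshly queried one. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: compare the cached album list with a fresh query result. */
final class AlbumRefresh {
    /** IDs present in fresh but not in cached, in fresh order. */
    static List<String> added(List<String> cached, List<String> fresh) {
        Set<String> known = new HashSet<>(cached);
        List<String> result = new ArrayList<>();
        for (String id : fresh) {
            if (!known.contains(id)) result.add(id);
        }
        return result;
    }

    /** IDs present in cached but missing from fresh, in cached order. */
    static List<String> removed(List<String> cached, List<String> fresh) {
        return added(fresh, cached);
    }
}
```

The page can render the cached list immediately and apply only these differences once the background query returns, instead of rebinding every item.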
Lazy‑load tabs and cover images.
ViewPager loads adjacent pages by default; we override PagerAdapter so that off‑screen tabs load nothing until displayed.
```java
public void setOffscreenPageLimit(int limit) {
    // DEFAULT_OFFSCREEN_PAGES = 1
    // Values below the default are clamped, so at least one adjacent page is always loaded.
    if (limit < DEFAULT_OFFSCREEN_PAGES) {
        Log.w(TAG, "Requested offscreen page limit " + limit + " too small; defaulting to " + DEFAULT_OFFSCREEN_PAGES);
        limit = DEFAULT_OFFSCREEN_PAGES;
    }
    if (limit != mOffscreenPageLimit) {
        mOffscreenPageLimit = limit;
        populate();
    }
}
```
Fresco handles cover loading. When a thumbnail exists we use it; otherwise we apply several optimizations:
Cache resized images for high‑resolution photos to speed up cover loading on low‑end devices.
Replace MediaMetadataRetriever frame extraction with a custom library and cache video thumbnails.
```java
public class LocalVideoThumbnailProducer implements Producer<CloseableReference<CloseableImage>> {
    // Abridged excerpt: the enclosing produceResults() wrapper is omitted.
    @Override
    protected CloseableReference<CloseableImage> getResult() throws Exception {
        String path = getLocalFilePath(imageRequest);
        // Decode one frame of the local video as a bitmap thumbnail.
        Bitmap thumbnailBitmap = ThumbnailUtils.createVideoThumbnail(path, calculateKind(imageRequest));
        // Wrap the bitmap so the image pipeline can reference-count and release it.
        return CloseableReference.<CloseableImage>of(new CloseableStaticBitmap(
                thumbnailBitmap,
                SimpleBitmapReleaser.getInstance(),
                ImmutableQualityInfo.FULL_QUALITY, 0));
    }
}
```
Additional optimizations include product‑level tweaks (e.g., limiting album to 9 images), vendor‑specific cache APIs, RecycledViewPool reuse across tabs, proactive Fresco request release on low‑end devices, and Android Q adaptation.
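Caching generated thumbnails means each video is decoded at most once while its entry stays in the cache. The sketch below is a generic in-memory LRU cache built on LinkedHashMap; it is an assumption for illustration, not Fresco's cache or the custom library mentioned above, and the capacity is a placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: bounded LRU cache keyed by file path (hypothetical helper). */
final class ThumbnailCache<V> extends LinkedHashMap<String, V> {
    private final int maxEntries;

    ThumbnailCache(int maxEntries) {
        super(16, 0.75f, true); // access order: a get() marks the entry as most recently used
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Called after each insertion; evict the least recently used entry when over capacity.
        return size() > maxEntries;
    }
}
```

On a cache miss the caller would generate the thumbnail and put it into the cache; on low-end devices a smaller capacity keeps memory pressure down.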
Effect & Benefit
Offline test (vivo NEX, 8k images, 665 videos), before vs. after optimization: first‑frame PCT50 reduced by 43%.
Business metrics after optimization: significant increase in submission rate, start‑shoot rate, and average uploads per user.
3.2 iOS Component Performance Architecture Optimization
Problem Discovery: Component loading is performed serially on the main thread during page ready, blocking first‑frame rendering. As more components are added, the issue worsens.
Page Ready: all component loading completed, page is interactable. Component loading typically includes UI load, cache read, network request, Rx binding, etc.
Solution 1: Delayed Component Loading
We split component loading into UI‑first‑frame and time‑consuming operations. This cuts first‑frame time dramatically, but delayed component loading also postpones event binding, slightly extending overall Page Ready time.
Solution 2: Parallel Component and First‑Frame Loading
By leveraging the run‑loop, we schedule component loading during idle periods, allowing the first‑frame task to pre‑empt when needed.
```objc
- (void)registerTransactionMainRunloopObserver {
    AssertMainThread();
    if (self.runLoopObserver) return; // register only once
    __auto_type runLoopCallback = ^(CFRunLoopObserverRef observer, CFRunLoopActivity activity) {
        AssertMainThread();
        [self loopTransaction]; // run pending component-loading work
    };
    CFRunLoopRef runLoop = CFRunLoopGetCurrent();
    // Fire when the run loop is about to sleep or exit, i.e. when the main thread is idle.
    CFOptionFlags activities = (kCFRunLoopBeforeWaiting | kCFRunLoopExit);
    // repeats = YES; order = INT_MAX so this observer runs after other observers.
    self.runLoopObserver = CFRunLoopObserverCreateWithHandler(NULL, activities, YES, INT_MAX, runLoopCallback);
    CFRunLoopAddObserver(runLoop, self.runLoopObserver, kCFRunLoopCommonModes);
}
```
Benefits
First‑frame duration reduced by 30‑50% across device tiers and quantiles.
Strict first‑frame control prevents performance regression from new features.
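The scheduling idea behind Solution 2 can be modeled independently of iOS. The Java sketch below is a toy model, not the run-loop implementation shown above: component tasks run one at a time when nothing else is pending, and any first-frame work that arrives is served before the next component task. Class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch: first-frame work always runs before queued component loading. */
final class IdleScheduler {
    private final Deque<Runnable> firstFrame = new ArrayDeque<>();
    private final Deque<Runnable> components = new ArrayDeque<>();

    void postFirstFrame(Runnable task) { firstFrame.add(task); }

    void postComponent(Runnable task) { components.add(task); }

    /**
     * One loop iteration. Pending first-frame tasks are drained first; only when there
     * are none does a single component task run. Returns false when nothing was pending.
     */
    boolean runOnce() {
        if (!firstFrame.isEmpty()) {
            while (!firstFrame.isEmpty()) {
                firstFrame.poll().run();
            }
            return true;
        }
        Runnable next = components.poll();
        if (next == null) return false;
        next.run();
        return true;
    }
}
```

Running only one component task per iteration keeps each slice short, so first-frame work posted in between waits for at most one component task.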
3.3 Extreme Visual Quality on Small Screens
Background: Small screens, limited compute, unreliable networks, and thermal constraints leave mobile visual quality far behind PC. Improving upload‑side video quality is essential.
Solution Overview
Hardware capabilities: camera stabilization, auto‑exposure, night mode; H264/bytevc1 hardware encode/decode; HDR.
Quality algorithms: super‑resolution, enhancement, denoising, frame interpolation, color grading.
Product strategies: full‑screen shooting, high‑resolution/bitrate, tiered strategy distribution, video pass‑through, quality evaluation, version‑based monitoring.
Algorithmic Enhancements: super‑resolution, denoising, frame interpolation, color correction.
Product Strategies
Full‑screen shooting: camera view fills the screen.
High‑resolution & bitrate increase.
Tiered strategy distribution based on device capability.
Video pass‑through: upload original high‑quality files without client transcoding.
Quality evaluation across scenarios to achieve optimal subjective results.
Version‑based monitoring of quality changes.
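Tiered strategy distribution can be pictured as a lookup from device tier to quality settings. Everything numeric in this sketch is a placeholder: the benchmark thresholds and the resolutions are invented for illustration and are not Douyin's actual values.

```java
/** Sketch: choose upload settings by device tier (all values hypothetical). */
final class TierStrategy {
    enum Tier { LOW, MID, HIGH }

    /** Classify a device from a benchmark score; thresholds are placeholders. */
    static Tier classify(int benchmarkScore) {
        if (benchmarkScore >= 80) return Tier.HIGH;
        if (benchmarkScore >= 40) return Tier.MID;
        return Tier.LOW;
    }

    /** Short-side upload resolution in pixels for a tier; values are placeholders. */
    static int uploadShortSide(Tier tier) {
        switch (tier) {
            case HIGH: return 1080;
            case MID: return 720;
            default: return 540;
        }
    }
}
```

Keeping the mapping in one place makes it straightforward to adjust a single tier's settings without touching the others.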
Benefit: Increased submission and consumption quality; high‑influence users (>10k followers) show significant submission‑rate uplift, while overall consumption time also rises.
Monitoring and Regression Prevention
Daily data collection and online alerts are essential for detecting performance or business anomalies. Proper threshold settings and multi‑dimensional classification enable rapid issue localization.
For example, a CDN failure in a region may increase prop‑tool download latency, leading to a drop in submission rate. By tracing relevant business and performance metrics we can pinpoint the faulty component or region.
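Localizing such an anomaly amounts to slicing a metric by one dimension and checking each slice against a threshold. The sketch below groups download latency samples by region; the class name, the region labels, and the threshold are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: find regions whose average download latency exceeds a threshold. */
final class RegionAlert {
    /** regions[i] labels latenciesMs[i]; returns offending regions sorted by name. */
    static List<String> regionsOverThreshold(String[] regions, long[] latenciesMs, double thresholdMs) {
        Map<String, long[]> aggregate = new TreeMap<>(); // value = {sum, count}
        for (int i = 0; i < regions.length; i++) {
            long[] acc = aggregate.computeIfAbsent(regions[i], key -> new long[2]);
            acc[0] += latenciesMs[i];
            acc[1]++;
        }
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, long[]> e : aggregate.entrySet()) {
            double average = (double) e.getValue()[0] / e.getValue()[1];
            if (average > thresholdMs) offenders.add(e.getKey());
        }
        return offenders;
    }
}
```

The same grouping can be repeated along other dimensions, such as device tier or app version, to narrow down the faulty component.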
Regression prevention is as important as optimization. New code can introduce performance degradation; a systematic anti‑regression system includes P0 MR checks, P1 daily checks, and version‑level comparisons.
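A check at any of these levels reduces to comparing a metric against a baseline with some tolerance. This is a minimal sketch assuming a lower-is-better metric such as first-frame duration; the class name and the tolerance value used in a call are hypothetical.

```java
/** Sketch: flag a lower-is-better metric that got worse than baseline beyond a tolerance. */
final class RegressionGate {
    /** tolerance is a fraction, e.g. 0.05 allows the metric to be up to 5% slower. */
    static boolean regressed(double baselineMs, double currentMs, double tolerance) {
        return currentMs > baselineMs * (1.0 + tolerance);
    }
}
```

For instance, with a 400 ms baseline and a 5% tolerance, 430 ms would be flagged while 410 ms would pass.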
Common External Tools
General (memory/CPU/network/latency): TraceView, Android Profiler, DTrace, Instruments, SignPost, Hook.
Page rendering/frame rate: Profile GPU Rendering, systrace.
Layout hierarchy: Hierarchy Viewer.
Memory/CPU analysis: Memory Analyzer Tool, LeakCanary, System Trace.
Powerful Internal Platforms
ByteDance provides numerous efficient platforms that greatly simplify work and turn seemingly impossible ideas into reality.
Automation testing and AI recognition enable weekly competitor comparisons instead of bi‑monthly manual QA.
ByteBench supplies device performance and compatibility data for tiered strategies.
Fast AB experiment platform offers hour‑level online data observation and flexible strategy adjustments.
Quality and simulation labs allow subjective evaluation of submission quality across scenarios.
Overseas rack platforms enable remote research within compliance, adapting strategies to regional network conditions.
Join Us to Challenge the World
Since mid‑2020 we have pursued an industry‑leading creation experience roadmap, growing from a handful of engineers to a large, cross‑functional team that achieved the set goals.
We believe that to make Douyin a world‑class mobile internet product, we must deliver industry‑leading technical performance and become world‑class engineers.
If you share this belief, join us to challenge the world.
Send your résumé to [email protected] (dual‑end performance optimization / video editing / automation, Beijing/Hangzhou) or [email protected] (audio‑video engine, multiple cities).
Positions: Dual‑end performance optimization / video editing / automation; Audio‑video engine.
ByteDance SE Lab
Official account of ByteDance SE Lab, sharing research and practical experience in software engineering. Our lab unites researchers and engineers from various domains to accelerate the fusion of software engineering and AI, driving technological progress in every phase of software development.