Design and Implementation of Bilibili Video Template System
The Bilibili video template system combines theme‑based editing, a layered architecture, and a cross‑platform Protobuf protocol to enable PGC and UGC creators to produce, review, distribute, and consume richly‑featured videos with standardized media formats, modular plugins, AI integration, and robust quality assurance.
This document presents a comprehensive technical overview of Bilibili's video template system, covering its creation principles, evolution, architecture, protocol design, media handling, production and consumption workflows, componentization, and operational assurance.
1. Essence of Creation
From the perspective of editing tools, video creation can be broken down into three elements: theme, material, and editing. Themes define the category and style (e.g., games, movies, lifestyle). Materials are the video, audio, and image assets provided by users. Editing involves time, space, and effect adjustments such as trimming, copying, filters, transitions, and effects. Video templates encapsulate these three elements, fixing the theme and editing methods while allowing users to supply custom materials, thereby lowering the creation barrier.
2. Evolution of Bilibili Video Templates
Bilibili has iteratively enhanced its template capabilities beyond basic subtitles, effects, and stickers to include AI drawing, 3D camera movements, and other high‑level functions, supporting more vertical use cases than competitors.
2.1 PGC vs. UGC
Templates are classified as PGC (produced by internal designers or vendors using custom tools) or UGC (created by any user via the Must‑Cut app). The original article's comparison table (omitted here) contrasts the two on production capacity, cost, atomic capabilities, and overall effect quality.
2.2 Transition from PGC to UGC
From 2021 to 2023, the template protocol was refined and gradually migrated from a PGC‑centric production model to a UGC‑centric model.
3. Lifecycle & Workflow
The template lifecycle consists of five stages, in order: Production, Review, Distribution, Consumption, and Submission. The original article describes each stage in detail and illustrates the flow with a workflow diagram (omitted here).
4. Architecture Design
The system is organized into three layers: Business Layer (specific features such as template configuration pages), Common Business Layer (shared capabilities like the main editor engine, upload component, and template protocol), and Base Component Layer (platform‑agnostic atomic capabilities such as clipping SDK, inference engine, and ASR services).
5. Protocol Specification
The template protocol defines the theme, materials, and editing methods of a template. It must be cross‑platform, extensible, and efficient to transmit. A comparison between JSON and Protobuf shows that Protobuf offers language neutrality, a compact binary encoding, fast encoding/decoding, and better maintainability across platforms, because every platform generates its model code from the same schema.
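To make the size argument concrete, the sketch below hand-encodes two integer fields the way the Protobuf wire format does (a tag varint followed by a value varint) and compares the result with compact JSON. It is an illustration only: the field numbers match the VideoClip example in this section, but the values and the microsecond unit are assumptions, and a real project would use generated Protobuf code rather than these hypothetical helper functions.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint-typed field: tag (field number + wire type 0), then value."""
    tag = (field_number << 3) | 0  # wire type 0 means varint
    return encode_varint(tag) + encode_varint(value)


# Assumed values: in/out points in microseconds (the unit is not stated in the source).
in_point, out_point = 1_000_000, 5_000_000

# Fields 2 and 3 are inPoint and outPoint in the VideoClip message.
protobuf_bytes = encode_varint_field(2, in_point) + encode_varint_field(3, out_point)
json_bytes = json.dumps(
    {"inPoint": in_point, "outPoint": out_point}, separators=(",", ":")
).encode("utf-8")

print(len(protobuf_bytes), "bytes as Protobuf vs", len(json_bytes), "bytes as JSON")
```

The saving comes from replacing field names with small numeric tags and decimal digits with a variable-length binary integer; no general-purpose compression is involved.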
Example Protobuf definitions:
    message TimeLine {
      string id = 1;
      // Timeline configuration: resolution, frame rate, bitrate
      TimeLineConfig config = 2;
      // Video tracks
      repeated VideoTrack videoTracks = 3;
      // Audio tracks
      repeated AudioTrack audioTracks = 4;
    }

    message VideoTrack {
      string id = 1;
      // Transition clips
      repeated VideoTransition transitions = 2;
      // Video clips
      repeated VideoClip clips = 3;
    }

    message VideoClip {
      string id = 1;
      // In‑point & out‑point of the clip
      int64 inPoint = 2;
      int64 outPoint = 3;
      // Material ID
      LocalPath materialId = 4;
    }

The protocol forms a tree structure: one timeline contains multiple video/audio/subtitle tracks; each track contains multiple clips; each clip stores core information such as in/out points and material IDs.
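The timeline, track, and clip tree can be mirrored in application code and walked top-down. The following is a minimal sketch using plain Python dataclasses instead of generated Protobuf classes; it covers video tracks only, simplifies materialId to a string (the schema uses a LocalPath message), and the helper names iter_clips and referenced_materials are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class VideoClip:
    id: str
    in_point: int       # start of the clip (unit not stated in the source)
    out_point: int      # end of the clip
    material_id: str    # simplified: the schema uses a LocalPath message


@dataclass
class VideoTrack:
    id: str
    clips: List[VideoClip] = field(default_factory=list)


@dataclass
class TimeLine:
    id: str
    video_tracks: List[VideoTrack] = field(default_factory=list)


def iter_clips(timeline: TimeLine) -> Iterator[Tuple[str, VideoClip]]:
    """Walk the tree: timeline -> tracks -> clips, yielding (track id, clip)."""
    for track in timeline.video_tracks:
        for clip in track.clips:
            yield track.id, clip


def referenced_materials(timeline: TimeLine) -> Set[str]:
    """Collect every material ID used anywhere in the timeline."""
    return {clip.material_id for _, clip in iter_clips(timeline)}


timeline = TimeLine(
    id="tl-1",
    video_tracks=[
        VideoTrack(id="v0", clips=[
            VideoClip("c0", 0, 2_000, "intro.mp4"),
            VideoClip("c1", 2_000, 5_000, "user-slot-1"),
        ]),
        VideoTrack(id="v1", clips=[VideoClip("c2", 0, 5_000, "intro.mp4")]),
    ],
)
print(sorted(referenced_materials(timeline)))
```

A traversal like this is the basis for tasks that need every clip, such as collecting the materials to pack or finding the slots a user must fill.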
6. Media Format Regulations
The supported media types are listed together with their container and codec specifications (e.g., video: MP4/MOV/WMV containers with H264/H265 codecs; audio: MP3/FLAC/AAC containers with MP3/AAC/PCM codecs; images: JPG/PNG/GIF). Local media must be transcoded before packaging to reduce bandwidth and storage costs. The local transcoding specification is H264 with VBR at unchanged resolution and frame rate for video, and AAC at 128 kbps for audio.
Cloud transcoding further standardizes formats (e.g., video: H264, CRF 25; audio: AAC, 128 kbps; images: JPG with quality factor 0.8).
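As an illustration of how the cloud video and audio targets above could be expressed, the sketch below builds an ffmpeg command line from a small spec object. The use of ffmpeg, the libx264 encoder, and the names TranscodeSpec and build_transcode_command are assumptions; the source does not say which transcoder Bilibili uses. The function only constructs the argument list and does not run anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TranscodeSpec:
    video_encoder: str       # ffmpeg encoder name (assumption: libx264 for H264)
    crf: int                 # constant rate factor for the video stream
    audio_encoder: str       # ffmpeg encoder name for AAC
    audio_bitrate_kbps: int  # target audio bitrate


# Values taken from the cloud transcoding spec above: H264 at CRF 25, AAC at 128 kbps.
CLOUD_SPEC = TranscodeSpec(
    video_encoder="libx264", crf=25, audio_encoder="aac", audio_bitrate_kbps=128
)


def build_transcode_command(src: str, dst: str, spec: TranscodeSpec) -> List[str]:
    """Return an ffmpeg argument list; no scaling or frame-rate options are added."""
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", spec.video_encoder,
        "-crf", str(spec.crf),
        "-c:a", spec.audio_encoder,
        "-b:a", f"{spec.audio_bitrate_kbps}k",
        dst,
    ]


print(" ".join(build_transcode_command("input.mov", "output.mp4", CLOUD_SPEC)))
```

Keeping the spec as data rather than hard-coding flags makes it straightforward to add a second spec for the local pre-packaging pass.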
7. Template ZIP Package Design
The ZIP package includes the draft.pb protocol file and all local assets (video, image, audio). A directory diagram (omitted) illustrates the structure.
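A package of this shape can be sketched with Python's standard zipfile module. Only the draft.pb name comes from the source; the asset subdirectory names and the functions build_template_package and list_package are hypothetical, since the actual directory layout is not reproduced here.

```python
import io
import zipfile
from typing import Dict, List


def build_template_package(draft_pb: bytes, assets: Dict[str, bytes]) -> bytes:
    """Pack the protocol file and local assets into an in-memory ZIP archive.

    ZIP_STORED (no compression) is used because transcoded media is already
    compressed and would gain little from deflate.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr("draft.pb", draft_pb)
        for relative_path, data in assets.items():
            archive.writestr(relative_path, data)
    return buffer.getvalue()


def list_package(package: bytes) -> List[str]:
    """Return the entry names of a package, in the order they were written."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return archive.namelist()


package = build_template_package(
    draft_pb=b"\x0a\x04tl-1",  # placeholder bytes, not a real draft
    assets={
        "video/intro.mp4": b"fake-video",   # hypothetical layout
        "audio/bgm.m4a": b"fake-audio",
        "image/cover.jpg": b"fake-image",
    },
)
print(list_package(package))
```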
8. Production & Consumption Subsystems
Version compatibility is handled by recording each template's atomic capabilities and matching them against the capabilities reported by the requesting app. This enables selective distribution of templates based on app version support.
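The capability matching described above reduces to a subset check: a template is distributable to an app only if every atomic capability it uses is among those the app reports. A minimal sketch, with hypothetical capability names and a hypothetical compatible_templates function:

```python
from typing import Dict, FrozenSet, List, Set


def compatible_templates(
    required_by_template: Dict[str, FrozenSet[str]],
    app_capabilities: Set[str],
) -> List[str]:
    """Return the IDs of templates whose required capabilities the app supports."""
    return [
        template_id
        for template_id, required in required_by_template.items()
        if required <= app_capabilities  # subset test
    ]


# Hypothetical capability names recorded at production time.
catalog = {
    "t-basic": frozenset({"subtitle", "sticker"}),
    "t-ai": frozenset({"subtitle", "ai_drawing"}),
    "t-3d": frozenset({"camera_3d"}),
}

old_app = {"subtitle", "sticker"}
new_app = {"subtitle", "sticker", "ai_drawing", "camera_3d"}

print(compatible_templates(catalog, old_app))
print(compatible_templates(catalog, new_app))
```

Matching on capabilities rather than on raw version numbers means a template stays available to any client that implements what it needs, regardless of platform or release cadence.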
A material‑template relationship graph ensures that when a material is withdrawn from the asset library, affected templates can be identified and handled appropriately.
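One simple way to answer "which templates use this material?" is a reverse index from material ID to template IDs. The sketch below assumes an in-memory structure; the class name MaterialTemplateGraph and its methods are hypothetical, and a production system would presumably persist this relation in a database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class MaterialTemplateGraph:
    """Reverse index: material ID -> IDs of templates that reference it."""

    def __init__(self) -> None:
        self._templates_by_material: Dict[str, Set[str]] = defaultdict(set)

    def link(self, template_id: str, material_ids: Iterable[str]) -> None:
        """Record that a template references each of the given materials."""
        for material_id in material_ids:
            self._templates_by_material[material_id].add(template_id)

    def affected_templates(self, material_id: str) -> List[str]:
        """Templates to review or take down when the material is withdrawn."""
        return sorted(self._templates_by_material.get(material_id, set()))


graph = MaterialTemplateGraph()
graph.link("t-1", ["bgm-42", "sticker-7"])
graph.link("t-2", ["bgm-42"])
graph.link("t-3", ["filter-9"])

print(graph.affected_templates("bgm-42"))
```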
Platform differences (e.g., transition types, coordinate systems, SDK implementations) are mitigated by enforcing a unified protocol, providing compatibility layers, and using glue code to bridge gaps.
AI‑generated content (AIGC) integration is achieved via a pre‑processing stage: the PB file is transformed into a TimeLine model, processed (e.g., AI drawing), and then fed into the editor engine. This allows asynchronous services such as AI drawing, 3D camera moves, and highlight detection to be incorporated into templates.
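The pre-processing stage can be pictured as a chain of asynchronous steps that each take the parsed timeline and return an updated one before the editor engine sees it. The sketch below is an assumption about the shape of such a pipeline, not the actual implementation: the Clip model is reduced to two fields, fake_ai_drawing stands in for a remote AI service, and all names are hypothetical.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Awaitable, Callable, List


@dataclass(frozen=True)
class Clip:
    id: str
    material_id: str
    needs_ai_drawing: bool = False


Stage = Callable[[List[Clip]], Awaitable[List[Clip]]]


async def fake_ai_drawing(material_id: str) -> str:
    """Stand-in for an asynchronous AI drawing service call."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return material_id + ".ai"


async def ai_drawing_stage(clips: List[Clip]) -> List[Clip]:
    """Replace the material of every flagged clip with its AI-generated version."""

    async def process(clip: Clip) -> Clip:
        if not clip.needs_ai_drawing:
            return clip
        generated = await fake_ai_drawing(clip.material_id)
        return replace(clip, material_id=generated, needs_ai_drawing=False)

    # gather runs the calls concurrently and preserves input order.
    return list(await asyncio.gather(*(process(c) for c in clips)))


async def preprocess(clips: List[Clip], stages: List[Stage]) -> List[Clip]:
    """Run each stage in order; the result is what the editor engine would load."""
    for stage in stages:
        clips = await stage(clips)
    return clips


clips = [Clip("c0", "photo-1", needs_ai_drawing=True), Clip("c1", "bgm-42")]
processed = asyncio.run(preprocess(clips, [ai_drawing_stage]))
print(processed)
```

Because each stage has the same signature, further services such as highlight detection could be added to the list without changing the editor engine.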
9. Componentization
The template business is modularized using flavor‑based plugins, allowing selective integration into different hosts (e.g., main for all platforms, bcut for Must‑Cut app, bilibili for Bilibili app). Build configurations and plugin diagrams (omitted) are provided.
10. Assurance
A two‑stage review system combines automated machine inspection (using PSNR and CV algorithms) with manual checks to ensure functional, performance, and safety standards. The automated stage filters out obviously defective templates, while the manual stage validates visual quality.
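PSNR (peak signal-to-noise ratio) measures how closely one image matches another: PSNR = 10 * log10(MAX^2 / MSE), where MSE is the mean squared pixel difference and MAX is the largest possible pixel value. The sketch below assumes the machine inspection compares a frame rendered from the template against a reference frame; the 30 dB threshold and the function names are hypothetical, and frames are modeled as flat lists of 8-bit pixel values.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], rendered: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


def passes_machine_check(
    reference: Sequence[int], rendered: Sequence[int], threshold_db: float = 30.0
) -> bool:
    """Flag a render as acceptable when it is close enough to the reference.

    The threshold is an illustrative assumption, not a value from the source.
    """
    return psnr(reference, rendered) >= threshold_db


reference_frame = [100, 100, 100, 100]
slightly_off = [101, 99, 100, 100]
badly_off = [0, 255, 0, 255]

print(passes_machine_check(reference_frame, slightly_off))
print(passes_machine_check(reference_frame, badly_off))
```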
A comprehensive monitoring system tracks key metrics across the template lifecycle, including production success rates and latency, distribution latency, and consumption success rates. Data is stored in offline tables (Hive) and visualized via dashboards (images omitted).
11. Development Outlook
Vertical derivatives such as game highlights, Genshin Impact events, and daily mood cards leverage the template protocol for customized experiences. Future directions include deeper cloud‑draft integration, expanded AI services, and continued optimization of the template ecosystem.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.