
Design and Implementation of Bilibili Video Template System

The Bilibili video template system combines theme‑based editing, a layered architecture, and a cross‑platform Protobuf protocol. It supports the production, review, distribution, and consumption of feature‑rich PGC and UGC templates, backed by standardized media formats, modular plugins, AI integration, and quality assurance.

Bilibili Tech

This document presents a comprehensive technical overview of Bilibili's video template system, covering its creation principles, evolution, architecture, protocol design, media handling, production and consumption workflows, componentization, and operational assurance.

1. Essence of Creation

From the perspective of editing tools, video creation can be broken down into three elements: theme, material, and editing. Themes define the category and style (e.g., games, movies, lifestyle). Materials are the video, audio, and image assets provided by users. Editing involves time, space, and effect adjustments such as trimming, copying, filters, transitions, and effects. Video templates encapsulate these three elements, fixing the theme and editing methods while allowing users to supply custom materials, thereby lowering the creation barrier.

2. Evolution of Bilibili Video Templates

Bilibili has iteratively enhanced its template capabilities beyond basic subtitles, effects, and stickers to include AI drawing, 3D camera movements, and other high‑level functions, supporting more vertical use cases than competitors.

2.1 PGC vs. UGC

Templates are classified as PGC (professionally generated content, produced by internal designers or vendors using custom tools) or UGC (user‑generated content, created by any user via the Must‑Cut app). A comparison table (omitted) highlights differences in production capacity, cost, atomic capabilities, and overall effect quality.

2.2 Transition from PGC to UGC

From 2021 to 2023, the template protocol was refined and gradually migrated from a PGC‑centric production model to a UGC‑centric model.

3. Lifecycle & Workflow

The template lifecycle consists of five stages: Production, Review, Distribution, Consumption, and Submission. A workflow diagram (omitted) illustrates the process.

4. Architecture Design

The system is organized into three layers:

Business Layer: specific features such as template configuration pages.
Common Business Layer: shared capabilities such as the main editor engine, the upload component, and the template protocol.
Base Component Layer: platform‑agnostic atomic capabilities such as the clipping SDK, the inference engine, and automatic speech recognition (ASR) services.

5. Protocol Specification

The template protocol defines theme, material, and editing methods. It must be cross‑platform, extensible, and efficient to transmit. A comparison between JSON and Protobuf shows that Protobuf offers language neutrality, a compact binary encoding (smaller payloads than the equivalent JSON), fast encoding/decoding, and better maintainability across platforms.
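To make the size argument concrete, the following is a minimal sketch that hand‑encodes one clip using the Protobuf wire format (varints and field tags) and compares it with compact JSON. The sample values are hypothetical, and a real implementation would use generated Protobuf classes rather than hand‑rolled encoding.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field tag is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2: length-delimited payload."""
    data = text.encode("utf-8")
    return encode_tag(field_number, 2) + encode_varint(len(data)) + data


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0: varint payload."""
    return encode_tag(field_number, 0) + encode_varint(value)


# Hypothetical clip values, for illustration only.
clip = {"id": "clip-1", "inPoint": 1_000_000, "outPoint": 5_000_000}

pb_bytes = (
    encode_string_field(1, clip["id"])
    + encode_int_field(2, clip["inPoint"])
    + encode_int_field(3, clip["outPoint"])
)
json_bytes = json.dumps(clip, separators=(",", ":")).encode("utf-8")

print(f"protobuf: {len(pb_bytes)} bytes, json: {len(json_bytes)} bytes")
```

The saving comes mostly from replacing field names with one‑byte tags and from storing integers as varints instead of decimal text.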

Example Protobuf definitions:

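syntax = "proto3";

// Excerpt only: TimeLineConfig, AudioTrack, VideoTransition, and LocalPath
// are defined elsewhere in the protocol and are not shown here.
message TimeLine {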
    string id = 1;
    // Timeline configuration: resolution, frame rate, bitrate
    TimeLineConfig config = 2;
    // Video tracks
    repeated VideoTrack videoTracks = 3;
    // Audio tracks
    repeated AudioTrack audioTracks = 4;
}

message VideoTrack {
    string id = 1;
    // Transition clips
    repeated VideoTransition transitions = 2;
    // Video clips
    repeated VideoClip clips = 3;
}

message VideoClip {
    string id = 1;
    // In‑point & out‑point of the clip
    int64 inPoint = 2;
    int64 outPoint = 3;
    // Material ID
    LocalPath materialId = 4;
}

The protocol forms a tree structure: one timeline contains multiple video/audio/subtitle tracks; each track contains multiple clips; each clip stores core information such as in/out points and material IDs.
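The tree can be mirrored in a few plain data classes. This is a minimal sketch, assuming a simplified model: only video tracks are modeled, `material_id` is a plain string (the protocol uses a `LocalPath` message), and the helper names `iter_clips` and `material_ids` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class VideoClip:
    id: str
    in_point: int
    out_point: int
    material_id: str  # simplified: the protocol uses a LocalPath message


@dataclass
class VideoTrack:
    id: str
    clips: List[VideoClip] = field(default_factory=list)


@dataclass
class TimeLine:
    id: str
    video_tracks: List[VideoTrack] = field(default_factory=list)


def iter_clips(timeline: TimeLine) -> Iterator[VideoClip]:
    """Walk the tree: timeline -> tracks -> clips."""
    for track in timeline.video_tracks:
        yield from track.clips


def material_ids(timeline: TimeLine) -> List[str]:
    """Collect the distinct material IDs referenced by a timeline, in order."""
    seen: List[str] = []
    for clip in iter_clips(timeline):
        if clip.material_id not in seen:
            seen.append(clip.material_id)
    return seen


# Hypothetical sample data.
timeline = TimeLine(
    id="tl-1",
    video_tracks=[
        VideoTrack("vt-1", [VideoClip("c1", 0, 3000, "m-a"), VideoClip("c2", 0, 2000, "m-b")]),
        VideoTrack("vt-2", [VideoClip("c3", 500, 1500, "m-a")]),
    ],
)
print(material_ids(timeline))
```

Walking the tree this way is enough to answer questions such as which materials a template needs before it can be packaged.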

6. Media Format Regulations

Supported media types and their container/codec specifications are listed (e.g., video: MP4/MOV/WMV with H264/H265; audio: MP3/FLAC/AAC with MP3/AAC/PCM; images: JPG/PNG/GIF). Local media must be transcoded before packaging to reduce bandwidth and storage costs. Transcoding specifications for video (H264, VBR, unchanged resolution/frame rate) and audio (AAC, 128 kbps) are provided.

Cloud transcoding further standardizes formats (e.g., video: H264, CRF 25; audio: AAC, 128 kbps; images: JPG with quality factor 0.8).
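As an illustration of the cloud video and audio settings, the sketch below builds an ffmpeg command line. Using ffmpeg is an assumption; the article does not name the transcoder, and the function name and file paths are hypothetical.

```python
from typing import List


def cloud_transcode_args(src: str, dst: str) -> List[str]:
    """Build an ffmpeg argument list for H264 at CRF 25 with 128 kbps AAC audio.

    Assumption: ffmpeg is the transcoder. The source only states the target
    codecs and quality settings, not the tool that produces them.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libx264",  # H264 video encoder
        "-crf", "25",       # constant rate factor from the cloud spec
        "-c:a", "aac",      # AAC audio encoder
        "-b:a", "128k",     # 128 kbps audio bitrate
        dst,
    ]


args = cloud_transcode_args("input.mov", "output.mp4")
print(" ".join(args))
```

Passing the result to `subprocess.run(args, check=True)` would execute the transcode on a machine where ffmpeg is installed.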

7. Template ZIP Package Design

The ZIP package includes the draft.pb protocol file and all local assets (video, image, audio). A directory diagram (omitted) illustrates the structure.
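A package of this shape can be assembled with a standard ZIP library. The sketch below is illustrative only: the `draft.pb` name comes from the text, while the asset sub‑directories and function names are hypothetical because the directory diagram is omitted.

```python
import io
import zipfile
from typing import Dict, Iterable, List


def build_template_package(draft_pb: bytes, assets: Dict[str, bytes]) -> bytes:
    """Pack the protocol file and local assets into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("draft.pb", draft_pb)
        for relative_path, data in assets.items():
            archive.writestr(relative_path, data)
    return buffer.getvalue()


def missing_entries(package: bytes, expected_assets: Iterable[str]) -> List[str]:
    """Return expected entries (draft.pb plus assets) absent from the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        names = set(archive.namelist())
    return sorted(path for path in ["draft.pb", *expected_assets] if path not in names)


# Hypothetical layout: the real directory structure is not given in the text.
package = build_template_package(
    draft_pb=b"\x0a\x04tl-1",
    assets={"video/intro.mp4": b"...", "audio/bgm.aac": b"..."},
)
print(missing_entries(package, ["video/intro.mp4", "image/cover.jpg"]))
```

A check like `missing_entries` is one way a consumer could reject an incomplete package before handing it to the editor engine.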

8. Production & Consumption Subsystems

Version compatibility is handled by recording each template's atomic capabilities and matching them against the capabilities reported by the requesting app. This enables selective distribution of templates based on app version support.
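The matching rule described above reduces to a subset check. This is a minimal sketch; the capability names and the `Template` type are hypothetical, not the production schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Template:
    id: str
    required_capabilities: FrozenSet[str]


def distributable(templates: Iterable[Template], app_capabilities: Iterable[str]) -> List[Template]:
    """Keep only templates whose required capabilities the app fully supports."""
    supported = frozenset(app_capabilities)
    return [t for t in templates if t.required_capabilities <= supported]


# Hypothetical capability names.
catalog = [
    Template("basic", frozenset({"subtitle", "sticker"})),
    Template("ai-art", frozenset({"subtitle", "ai_drawing"})),
    Template("camera", frozenset({"3d_camera"})),
]
old_app = {"subtitle", "sticker"}
new_app = {"subtitle", "sticker", "ai_drawing", "3d_camera"}

print([t.id for t in distributable(catalog, old_app)])
print([t.id for t in distributable(catalog, new_app)])
```

Matching on capabilities rather than on version numbers means a template becomes available to any client that reports the needed features, regardless of platform or release.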

A material‑template relationship graph ensures that when a material is withdrawn from the asset library, affected templates can be identified and handled appropriately.
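One simple way to represent such a relationship graph is a pair of indexes, one per direction. The class and method names below are hypothetical and the storage is in memory only; the production system presumably persists this data.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class MaterialTemplateGraph:
    """Bipartite graph linking templates to the materials they use."""

    def __init__(self) -> None:
        self._templates_by_material: DefaultDict[str, Set[str]] = defaultdict(set)
        self._materials_by_template: DefaultDict[str, Set[str]] = defaultdict(set)

    def link(self, template_id: str, material_id: str) -> None:
        """Record that a template depends on a material."""
        self._templates_by_material[material_id].add(template_id)
        self._materials_by_template[template_id].add(material_id)

    def affected_templates(self, material_id: str) -> List[str]:
        """Templates that must be handled if this material is withdrawn."""
        return sorted(self._templates_by_material.get(material_id, set()))


graph = MaterialTemplateGraph()
graph.link("tpl-1", "music-42")
graph.link("tpl-2", "music-42")
graph.link("tpl-2", "sticker-7")

print(graph.affected_templates("music-42"))
```

The reverse index makes the withdrawal lookup a single dictionary access instead of a scan over every template.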

Platform differences (e.g., transition types, coordinate systems, SDK implementations) are mitigated by enforcing a unified protocol, providing compatibility layers, and using glue code to bridge gaps.

AI‑generated content (AIGC) integration is achieved via a pre‑processing stage: the PB file is transformed into a TimeLine model, processed (e.g., AI drawing), and then fed into the editor engine. This allows asynchronous services such as AI drawing, 3D camera moves, and highlight detection to be incorporated into templates.
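The pre‑processing stage can be pictured as an asynchronous pass over the parsed model before it reaches the editor. The following is a sketch under stated assumptions: `Clip`, `ai_draw`, and `preprocess` are hypothetical names, and `ai_draw` is a stand‑in for a real service call.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Clip:
    id: str
    material_id: str
    needs_ai_drawing: bool = False


async def ai_draw(material_id: str) -> str:
    """Stand-in for an asynchronous AI drawing service (hypothetical)."""
    await asyncio.sleep(0)  # simulate waiting on a remote call
    return f"{material_id}.ai"


async def preprocess(clips: List[Clip]) -> List[Clip]:
    """Replace flagged materials with AI-generated ones, concurrently."""

    async def process(clip: Clip) -> Clip:
        if not clip.needs_ai_drawing:
            return clip
        generated = await ai_draw(clip.material_id)
        return replace(clip, material_id=generated, needs_ai_drawing=False)

    # gather() returns results in the same order as its inputs.
    return list(await asyncio.gather(*(process(clip) for clip in clips)))


clips = [Clip("c1", "photo-1", needs_ai_drawing=True), Clip("c2", "video-1")]
ready_for_editor = asyncio.run(preprocess(clips))
print(ready_for_editor)
```

Because the editor engine only ever sees the processed model, new asynchronous services can be added to this stage without changing the engine itself.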

9. Componentization

The template business is modularized using flavor‑based plugins, allowing selective integration into different hosts (e.g., main for all platforms, bcut for Must‑Cut app, bilibili for Bilibili app). Build configurations and plugin diagrams (omitted) are provided.
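The selection rule can be sketched as follows: a host receives the shared `main` plugins plus those of its own flavor. The flavor names come from the text; the plugin names and the `plugins_for_host` function are hypothetical, and the real mechanism lives in the build configuration rather than in runtime code.

```python
from typing import Dict, List

# Hypothetical plugin names, keyed by the flavors named in the text.
PLUGINS_BY_FLAVOR: Dict[str, List[str]] = {
    "main": ["template-core"],
    "bcut": ["template-editor"],
    "bilibili": ["template-feed"],
}


def plugins_for_host(flavor: str) -> List[str]:
    """Resolve the plugins a host gets: shared ones first, then flavor-specific."""
    if flavor not in PLUGINS_BY_FLAVOR:
        raise ValueError(f"unknown flavor: {flavor}")
    shared = list(PLUGINS_BY_FLAVOR["main"])
    if flavor == "main":
        return shared
    return shared + PLUGINS_BY_FLAVOR[flavor]


print(plugins_for_host("bcut"))
print(plugins_for_host("bilibili"))
```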

10. Assurance

A two‑stage review system combines automated machine inspection, based on peak signal‑to‑noise ratio (PSNR) and computer vision (CV) algorithms, with manual checks to ensure functional, performance, and safety standards. The automated stage filters out obviously defective templates, while the manual stage validates visual quality.
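For reference, PSNR compares a rendered frame against a reference frame via the mean squared error: PSNR = 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value. The sketch below applies it to flat lists of 8‑bit pixel values; the 30 dB threshold and the `passes_machine_check` helper are hypothetical, since the article does not give the actual pass criteria.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], rendered: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel buffers."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("buffers must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


def passes_machine_check(
    reference: Sequence[int], rendered: Sequence[int], threshold_db: float = 30.0
) -> bool:
    """Flag a render as acceptable when PSNR meets a (hypothetical) threshold."""
    return psnr(reference, rendered) >= threshold_db


reference_frame = [10, 20, 30, 40]
good_render = [10, 21, 30, 39]
bad_render = [200, 0, 255, 0]

print(passes_machine_check(reference_frame, good_render))
print(passes_machine_check(reference_frame, bad_render))
```

A high PSNR only shows that a render matches its reference pixel for pixel, which is why the manual stage is still needed to judge visual quality.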

A comprehensive monitoring system tracks key metrics across the template lifecycle, including production success rates and latency, distribution latency, and consumption success rates. Data is stored in offline tables (Hive) and visualized via dashboards (images omitted).

11. Development Outlook

Vertical derivatives such as game highlights, Genshin Impact events, and daily mood cards leverage the template protocol for customized experiences. Future directions include deeper cloud‑draft integration, expanded AI services, and continued optimization of the template ecosystem.

Tags: System Architecture, Protobuf, componentization, AI integration, media processing, protocol design, video templates