
Platformization of Bilibili's Live‑Streaming E‑Commerce Business: Architecture, Implementation and Governance

Bilibili transformed its fast‑growing live‑streaming e‑commerce operation by constructing a modular platform that separates product, user, and application layers, introduces a unified product middle‑platform, standardized capabilities, real‑time attribute handling, and robust monitoring and governance, thereby reducing technical debt, improving stability, and preparing for hundred‑billion‑level GMV scaling.

Bilibili Tech

Background

Live‑streaming e‑commerce ("带货", roughly "selling goods through content") has become a fast‑growing business at Bilibili. It started out coupled to the mandatory‑advertising business and grew rapidly in just over a year, with a surge in the number of live‑streaming creators (UPs), in revenue, and in platform income. This put the technical side under severe pressure: there was no independent system, so a stable, efficient platform had to be built from scratch while still supporting rapid iteration.

Current Situation and Problems

The business involves many complex scenarios (video, article, live‑stream, etc.) and requires conversion from content to purchase. Early systems could not support this. Core issues included unclear domain boundaries, ad‑hoc demand‑driven development, heavy coupling, and lack of overall design, which resulted in low development efficiency, poor delivery quality, and high technical debt.

Platform Architecture

The platform was designed to answer four key questions: What are the business scenarios? Who does the platform serve? How are platform capabilities aggregated? Where are the boundaries? A high‑level architecture diagram (omitted) shows a systematic, modular structure that separates the product side, the user side, and the application side.

Key Architectural Components

Where does the product come from? – Product middle‑platform.

How is the product delivered? – Platform capabilities.

How is the product managed? – Operations platform.

How is the product released? – Joint product/advertising engine.

How do people, products and scenes interact? – Standardized sales chain (person‑side, product‑side, data‑side, algorithm).

Design covers system layering, domain modeling, and standardized interaction protocols.

Product Middle‑Platform

The product domain is the core of the "goods" side. Early systems had no product concept, so every new product type or channel was tightly coupled to the core flow, which made changes risky and slow. A new middle‑platform was built from scratch (0 to 1) to provide a unified product‑supply service. Its focus is channel complexity rather than traditional inventory management.

Challenges addressed:

Fast, efficient channel onboarding.

Abstract product domain model.

Zero‑downtime migration and data consistency.

Channel Solution

A dynamic configuration approach abstracts each channel as a supply source, passing data through a validator, converter, and storage component. The architecture (omitted) shows a channel factory, channelConfig, validator, convertor, and entity model.
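The validator, converter, and storage flow described above can be sketched roughly as follows. This is a minimal illustration, not Bilibili's implementation: the class names, the `demo_shop` channel, and the raw record keys (`sku`, `name`, `price`) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Product:
    """Unified entity model that every channel's records are converted into."""
    product_id: str
    title: str
    price_cents: int
    channel: str


@dataclass
class ChannelConfig:
    """Per-channel configuration: how to validate and convert raw records."""
    name: str
    validator: Callable[[dict], bool]
    converter: Callable[[dict], Product]


class ChannelFactory:
    """Registry that resolves a channel name to its configuration."""

    def __init__(self) -> None:
        self._configs: Dict[str, ChannelConfig] = {}

    def register(self, config: ChannelConfig) -> None:
        self._configs[config.name] = config

    def get(self, name: str) -> ChannelConfig:
        return self._configs[name]


def ingest(factory: ChannelFactory, channel: str, raw_items: List[dict],
           store: Dict[str, Product]) -> int:
    """Validate, convert, and store raw items; return the number stored."""
    config = factory.get(channel)
    stored = 0
    for raw in raw_items:
        if not config.validator(raw):
            continue  # invalid records are simply skipped in this sketch
        product = config.converter(raw)
        store[product.product_id] = product
        stored += 1
    return stored


# Hypothetical channel; onboarding it only requires registering a config.
factory = ChannelFactory()
factory.register(ChannelConfig(
    name="demo_shop",
    validator=lambda raw: bool(raw.get("sku")) and raw.get("price", 0) > 0,
    converter=lambda raw: Product(
        product_id=f"demo_shop:{raw['sku']}",
        title=raw["name"],
        price_cents=round(raw["price"] * 100),
        channel="demo_shop",
    ),
))

store: Dict[str, Product] = {}
count = ingest(
    factory, "demo_shop",
    [{"sku": "A1", "name": "Mug", "price": 12.5},
     {"sku": "", "name": "Broken", "price": 3.0}],  # fails validation
    store,
)
```

The point of the design is that the ingest path never changes when a channel is added. Only a new configuration entry is registered, which is what keeps onboarding away from the core flow.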

Model and Storage

The product entity model was refactored from a simple Elasticsearch store to a domain‑driven design with core, extension, application, and shelf tables stored in MySQL. A binlog‑based indexing pipeline feeds a separate search index, providing read‑write separation. Consistency mechanisms (retry, dead‑letter queue, periodic reconciliation) keep the index in sync with MySQL. Scenarios that need strongly consistent reads query MySQL directly, while search‑oriented scenarios use the index.
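The three consistency mechanisms named above (retry, dead‑letter queue, periodic reconciliation) fit together roughly as in the sketch below. Plain dictionaries stand in for MySQL and the search index, and the function names and the `flaky_write` failure are assumptions for illustration. A real pipeline would consume binlog events rather than iterate over rows.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Event = Tuple[str, Row]


def apply_with_retry(event: Event, index_write: Callable[[str, Row], None],
                     dead_letters: List[Event], max_attempts: int = 3) -> bool:
    """Apply one change event to the index; park it as a dead letter on failure."""
    key, row = event
    for _ in range(max_attempts):
        try:
            index_write(key, row)
            return True
        except ConnectionError:
            continue  # treated as transient: try again
    dead_letters.append(event)  # retries exhausted
    return False


def reconcile(source: Dict[str, Row], index: Dict[str, Row]) -> List[str]:
    """Periodic job: make the index match the source of truth, return repaired keys."""
    repaired = []
    for key, row in source.items():
        if index.get(key) != row:
            index[key] = dict(row)
            repaired.append(key)
    for key in list(index):
        if key not in source:
            del index[key]
            repaired.append(key)
    return repaired


# Stand-ins for the MySQL tables and the search index.
mysql_rows: Dict[str, Row] = {"p1": {"title": "Mug"}, "p2": {"title": "Lamp"}}
search_index: Dict[str, Row] = {}
dead_letters: List[Event] = []


def flaky_write(key: str, row: Row) -> None:
    """Hypothetical index writer that always fails for one key."""
    if key == "p2":
        raise ConnectionError("index unavailable")
    search_index[key] = dict(row)


for key, row in mysql_rows.items():
    apply_with_retry((key, row), flaky_write, dead_letters)

repaired = reconcile(mysql_rows, search_index)
```

In this run the write for `p2` exhausts its retries and lands in `dead_letters`, and the reconciliation pass then repairs the missing index entry. The index is therefore eventually consistent, which is why strong‑read paths go to MySQL.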

Platform Capabilities

Core capabilities address how the product is delivered and include organic sales, traffic‑driven sales, and live‑stream sales. Early services were monolithic and chaotic. The new design aggregates capabilities into a layered architecture (client / service / common / starter) and defines unified permission, flow, settlement, PID, and data models.

Interaction Model

The interaction model decouples content (articles, comments, videos) from sales attributes. Two solutions were considered: a real‑time attribute API, which would come under heavy request pressure, or a database‑backed real‑time attribute table. The latter was chosen because it enables asynchronous, low‑impact updates and gives downstream services a single, unified attribute source.
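A minimal sketch of the attribute‑table idea follows, using an in‑memory SQLite database and a queue in place of the real storage and messaging layers. The table name `content_sales_attr`, its columns, and the attribute names are assumptions for illustration, since the article does not describe the actual schema.

```python
import queue
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE content_sales_attr (
        content_type TEXT NOT NULL,   -- e.g. video, article, comment
        content_id   TEXT NOT NULL,
        attr_name    TEXT NOT NULL,
        attr_value   TEXT NOT NULL,
        PRIMARY KEY (content_type, content_id, attr_name)
    )
""")

updates: "queue.Queue[tuple]" = queue.Queue()


def publish(content_type, content_id, attr_name, attr_value):
    """Producer side: enqueue the change instead of calling consumers directly."""
    updates.put((content_type, content_id, attr_name, attr_value))


def drain():
    """Consumer side: apply all queued changes to the table in one transaction."""
    applied = 0
    with conn:  # commits on success, rolls back on error
        while not updates.empty():
            conn.execute(
                "INSERT OR REPLACE INTO content_sales_attr VALUES (?, ?, ?, ?)",
                updates.get(),
            )
            applied += 1
    return applied


def read_attrs(content_type, content_id):
    """Downstream services read sales attributes from the single table."""
    rows = conn.execute(
        "SELECT attr_name, attr_value FROM content_sales_attr "
        "WHERE content_type = ? AND content_id = ?",
        (content_type, content_id),
    ).fetchall()
    return dict(rows)


publish("video", "v42", "has_product_card", "1")
publish("video", "v42", "has_product_card", "0")  # later update wins
publish("video", "v42", "product_id", "p1")
applied = drain()
attrs = read_attrs("video", "v42")
```

Because writers only enqueue changes and readers only query the table, a burst of attribute changes does not translate into a burst of synchronous calls on the content services.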

Platform Governance

Stability and health monitoring were lacking, with many noisy alerts. A new monitoring system based on Prometheus and Grafana was introduced. A custom SDK collects uniform logs via AOP; metrics such as RT, QPS, error rate, and call volume are visualized. Alert rules are defined per service tier (C‑side, B‑side, read/write) using thresholds on RT, error count, and NPE count.
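The AOP‑based collection and tiered alert rules can be illustrated with a decorator, the closest Python analogue to an aspect. This is only a sketch: the original SDK is not shown in the article, the metric names are invented, and the thresholds in `ALERT_RULES` are hypothetical values. A production version would export the counters to Prometheus instead of keeping them in a dictionary.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-endpoint counters: call volume, error count, and total response time.
metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "rt_total_ms": 0.0})


def monitored(name):
    """Wrap a function so every call records RT, call volume, and errors."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                metrics[name]["errors"] += 1
                raise  # monitoring must not swallow the error
            finally:
                m = metrics[name]
                m["calls"] += 1
                m["rt_total_ms"] += (time.perf_counter() - start) * 1000
        return wrapper
    return decorator


# Hypothetical thresholds per service tier; the article gives no real values.
ALERT_RULES = {
    "c_side_read": {"max_avg_rt_ms": 100, "max_errors": 0},
    "b_side_write": {"max_avg_rt_ms": 500, "max_errors": 5},
}


def check_alerts(name, tier):
    """Return the names of the rules that the endpoint currently violates."""
    m = metrics[name]
    rule = ALERT_RULES[tier]
    avg_rt = m["rt_total_ms"] / m["calls"] if m["calls"] else 0.0
    alerts = []
    if avg_rt > rule["max_avg_rt_ms"]:
        alerts.append("rt")
    if m["errors"] > rule["max_errors"]:
        alerts.append("errors")
    return alerts


@monitored("product.detail")
def get_product(product_id):
    if product_id is None:
        raise ValueError("product_id is required")
    return {"id": product_id}


get_product("p1")
try:
    get_product(None)
except ValueError:
    pass

strict_alerts = check_alerts("product.detail", "c_side_read")
lenient_alerts = check_alerts("product.detail", "b_side_write")
```

The same single error trips the stricter C‑side rule but not the B‑side one, which shows how tiering thresholds by service type helps cut down on noisy alerts.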

Storage Stability

Historical tables (≈1 billion rows) were shared with advertising services, causing timeouts and deadlocks. The solution involved decoupling the databases, caching advertisement data, migrating core tables to a dedicated database, and archiving old data. This reduced table size by more than 50% and improved query performance dramatically.
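One common way to archive old rows without holding long locks is to move them in small batches, each in its own short transaction. The sketch below shows that pattern with an in‑memory SQLite database. The `orders` and `orders_archive` tables, their columns, and the cutoff date are hypothetical, and the article does not say how the archiving was actually carried out.

```python
import sqlite3


def archive_in_batches(conn, cutoff, batch_size=500):
    """Move rows older than `cutoff` into the archive table, one batch at a time."""
    moved = 0
    while True:
        with conn:  # one short transaction per batch
            ids = [r[0] for r in conn.execute(
                "SELECT id FROM orders WHERE created_at < ? ORDER BY id LIMIT ?",
                (cutoff, batch_size),
            )]
            if not ids:
                return moved
            marks = ",".join("?" * len(ids))
            conn.execute(
                f"INSERT INTO orders_archive SELECT * FROM orders WHERE id IN ({marks})",
                ids,
            )
            conn.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
        moved += len(ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, amount INTEGER)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created_at TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2022-01-05", 10), (2, "2022-06-01", 20), (3, "2022-12-31", 30),
     (4, "2023-03-01", 40), (5, "2023-08-15", 50)],
)
conn.commit()

moved = archive_in_batches(conn, cutoff="2023-01-01", batch_size=2)
live_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
archived_count = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
```

Copying and deleting inside the same transaction means a failure mid‑batch neither loses rows nor duplicates them, and the small batch size keeps each lock short.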

Service Stability

Performance bottlenecks were identified in slow DB queries and cache saturation. Actions taken:

Added appropriate composite indexes and refactored multi‑table joins.

Reduced large keys and set sensible TTLs, then migrated to a new cache cluster.

Result: TP95 latency dropped significantly, and error rates fell to near zero.

Decision Balancing

Rapid business growth required simultaneous delivery of new features and remediation of technical debt. A multi‑track strategy was adopted: business‑driven milestones were prioritized, while technical improvements (monitoring, middle‑platform, capability consolidation, governance) proceeded in parallel, often embedded within ongoing feature projects.

Future Evolution

With the 0‑to‑1 phase complete, the next steps focus on scaling to hundred‑billion‑level GMV: strengthening gateways, data centers, and unified placement capabilities; expanding the product middle‑platform to tens of millions of SKUs; evolving the domain models; and further improving high availability, performance, and extensibility.
