
Evolution and Design of Bilibili's Load‑Testing Platform (Platform 2.0)

Bilibili’s load‑testing platform evolved from ad‑hoc JMeter scripts to a fully automated, self‑service system (Platform 2.0) that uses a custom load client, adaptive scheduling, and flexible scenario modes—including traffic replay and data‑isolated testing—to efficiently stress‑test over a hundred microservices for large‑scale events, with further integration and circuit‑breaker enhancements planned.

Bilibili Tech

Author

Xu Guangyao – Senior Test Development Engineer at Bilibili, responsible for the design and development of engineering‑efficiency platforms, including load testing, CI pipelines, and cloud‑native solutions.

01 Background

Load testing is essential for proactively discovering performance bottlenecks before they appear in production. Bilibili conducts load tests for new applications, existing services during feature iteration, and large‑scale events (e.g., S‑series competitions, New Year celebrations, cross‑night streams). This article focuses on the evolution of Bilibili’s load‑testing platform.

02 Evolution of Bilibili Load‑Testing Platform

The platform has gone through three stages: manual stage → Platform 1.0 → Platform 2.0.

Manual stage: Before 2018, each team wrote its own scripts and used open‑source tools such as JMeter. Distributed load generation required significant expertise, leading to high human and resource costs.

Platform 1.0: Launched in 2018, it converted user‑defined interface configurations into JMeter scripts and ran each script in a pod. However, several bottlenecks emerged:

JMeter required manual concurrency settings, causing resource waste or overload.

JMeter could not distribute parameter files across nodes without duplicating values.

Disk I/O became a bottleneck when parameter files were small.

XML‑based scenario templates were inflexible.

Platform 2.0: In early 2021, the architecture was rebuilt. A custom load‑generation client replaced JMeter, and an adaptive scheduling algorithm based on monitoring data enabled an intelligent, self‑adjusting load engine.

03 Load‑Testing Platform Design

The goal is a self‑service platform with two key aspects:

Simplified user operations – users can complete the entire test without platform staff intervention.

Rich scenario support – the platform accommodates diverse testing needs while minimizing external steps.

3.1 Load‑Testing Engine Implementation

The engine accepts a desired pressure value from the user and automatically handles distributed node concurrency, abstracting away low‑level details.

3.1.1 Design Considerations

Two critical points for raising the pressure ceiling:

Distributed deployment of the load client with scalable node count.

Full utilization of each node’s resources.

Each pod is allocated a fixed 4C8G resource slice, running a single load client container. This simplifies management and enables easy scaling by expanding the Kubernetes resource pool.
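With fixed‑size pods, scaling reduces to computing how many pods a target pressure requires. The sketch below illustrates that arithmetic; the per‑node QPS figure and the headroom factor are assumptions for the example, not Bilibili's actual numbers.

```python
# Illustrative sketch: estimate how many fixed 4C8G load-client pods are
# needed for a target QPS, given a measured per-node capacity. The capacity
# and headroom values are hypothetical.
import math

def nodes_needed(target_qps: float, per_node_qps: float, headroom: float = 0.8) -> int:
    """Return pod count, keeping each node below `headroom` of its capacity."""
    if target_qps <= 0:
        return 0
    return math.ceil(target_qps / (per_node_qps * headroom))

print(nodes_needed(50_000, per_node_qps=8_000))  # 8 pods at 80% headroom
```

Because every pod is identical, adding capacity is simply a matter of enlarging the Kubernetes resource pool and raising the pod count.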

3.1.2 New Framework

The new framework introduces:

A custom load client with dynamic thread‑pool capabilities.

An adaptive scheduler that monitors instance metrics and automatically adjusts instance count to achieve optimal concurrency.

3.1.3 Scheduling Principle

The scheduler reacts to two factors: client status changes and user actions. Clients have three states – idle, scaling, and stable – and transition based on load thresholds and target QPS. Users can start tasks (pushing jobs to Redis) or trigger acceleration (adding parallel instances).
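The three‑state lifecycle above can be sketched as a small transition function. The threshold values and field names are assumptions for illustration; the article does not specify the real ones.

```python
# Hedged sketch of the idle -> scaling -> stable client lifecycle described
# above. CPU ceiling and status fields are made up for the example.
from dataclasses import dataclass

IDLE, SCALING, STABLE = "idle", "scaling", "stable"

@dataclass
class ClientStatus:
    state: str = IDLE
    cpu_load: float = 0.0  # fraction of the pod's CPU slice in use
    qps: float = 0.0       # QPS this client is currently generating

def next_state(c: ClientStatus, target_qps: float, cpu_ceiling: float = 0.75) -> str:
    if c.state == IDLE:
        # A task pushed to Redis moves an idle client into the ramp-up phase.
        return SCALING if target_qps > 0 else IDLE
    if c.state == SCALING:
        # Stop ramping once the target is met or the node is saturated;
        # saturation is the scheduler's cue to add parallel instances.
        if c.qps >= target_qps or c.cpu_load >= cpu_ceiling:
            return STABLE
        return SCALING
    return STABLE

c = ClientStatus(state=SCALING, cpu_load=0.5, qps=9_500)
print(next_state(c, target_qps=9_000))  # "stable": target QPS reached
```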

3.1.4 Load Modes

Two user‑selectable modes:

QPS‑target mode – users adjust expected QPS and ramp‑up speed; the engine translates these into concurrency changes.

Concurrency‑target mode – users directly set desired concurrency.

Most scenarios (>80%) benefit from QPS‑target mode, while burst‑traffic cases (e.g., flash sales) may prefer concurrency‑target mode.
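In QPS‑target mode, the engine has to turn a target and a ramp‑up speed into concrete concurrency steps. A minimal sketch, assuming a measured per‑thread throughput (the real engine derives this from live metrics):

```python
# Illustrative sketch of translating a QPS target plus ramp-up step into a
# stepwise concurrency plan. The per-thread QPS figure is an assumption.
def ramp_plan(target_qps: float, step_qps: float, qps_per_thread: float):
    """Yield (step, qps, threads) tuples until the target QPS is reached."""
    qps, step = 0.0, 0
    while qps < target_qps:
        qps = min(qps + step_qps, target_qps)
        step += 1
        yield step, qps, max(1, round(qps / qps_per_thread))

for step, qps, threads in ramp_plan(3_000, step_qps=1_000, qps_per_thread=50):
    print(f"step {step}: {qps:.0f} QPS -> {threads} threads")
```

Concurrency‑target mode skips this translation entirely: the user‑supplied thread count is applied directly, which is why it suits burst scenarios where QPS is the outcome rather than the input.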

3.2 Scenario Modes

Scenarios are classified into custom scenes and traffic‑record‑and‑replay scenes.

3.2.1 Custom Scenes

Supports diverse request parameterization:

File‑based parameters (one line per set).

Random functions (date, number, string).

Signature functions (MD5, Bilibili‑specific).

Example request:

http://api.bilibili.com/test?name=${name}&type=${type}&value=${__RANDOMNUM,1,12}

Parameters can be injected via files (name, type) or random functions (value). The platform also offers three forwarding options (external, internal SLB, direct service discovery) and assertion mechanisms for response validation.
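A minimal sketch of how the `${...}` placeholders in the example URL might be expanded: plain names come from a parameter‑file row, while function‑style tokens such as `__RANDOMNUM` are evaluated on the fly. The token syntax mirrors the example above, but the implementation details are assumptions.

```python
# Hypothetical placeholder expansion for the example request. Plain tokens
# are looked up in a parameter-file row; __RANDOMNUM,lo,hi draws an integer.
import random
import re

def render(template: str, row: dict) -> str:
    def expand(m: re.Match) -> str:
        token = m.group(1)
        if token.startswith("__RANDOMNUM"):
            _, lo, hi = token.split(",")
            return str(random.randint(int(lo), int(hi)))
        return str(row[token])
    return re.sub(r"\$\{([^}]+)\}", expand, template)

url = render(
    "http://api.bilibili.com/test?name=${name}&type=${type}&value=${__RANDOMNUM,1,12}",
    {"name": "demo", "type": "3"},
)
print(url)  # value is a random integer between 1 and 12
```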

Interface composition supports serial and parallel sub‑tasks, with upstream responses feeding downstream parameters.
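Serial composition can be sketched as a loop that extracts variables from each upstream response and substitutes them into the next request. The step/extract schema below is invented for the example; the fake HTTP call stands in for a real client.

```python
# Hedged sketch of serial interface composition: an upstream response feeds
# parameters to the downstream request. Schema and endpoints are made up.
def render_url(url: str, ctx: dict) -> str:
    for k, v in ctx.items():
        url = url.replace("${%s}" % k, str(v))
    return url

def run_chain(steps, call):
    ctx = {}
    for step in steps:
        resp = call(render_url(step["url"], ctx))
        # Pull named fields out of the response for later steps.
        for var, field in step.get("extract", {}).items():
            ctx[var] = resp[field]
    return ctx

# Fake HTTP call for the example: /login yields a token used by /feed.
def fake_call(url: str) -> dict:
    return {"token": "t-123"} if "login" in url else {"used": url}

ctx = run_chain(
    [{"url": "/login", "extract": {"token": "token"}},
     {"url": "/feed?token=${token}"}],
    fake_call,
)
print(ctx)  # {'token': 't-123'}
```

Parallel sub‑tasks would dispatch independent steps concurrently instead of threading a shared context through them.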

File preprocessing distinguishes between “repeat” and “non‑repeat” modes to balance load distribution and I/O efficiency.
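The two modes can be sketched as a simple distribution step: "repeat" hands every node the full file, while "non‑repeat" partitions the lines so no value is reused across nodes. The mode names follow the text; the round‑robin partitioning strategy is an assumption.

```python
# Illustrative sketch of the repeat vs. non-repeat file-distribution modes.
def distribute(lines, nodes: int, mode: str):
    if mode == "repeat":
        # Every node replays the full parameter file.
        return [list(lines) for _ in range(nodes)]
    # non-repeat: each line goes to exactly one node.
    shards = [[] for _ in range(nodes)]
    for i, line in enumerate(lines):
        shards[i % nodes].append(line)
    return shards

rows = ["u1", "u2", "u3", "u4", "u5"]
print(distribute(rows, 2, "repeat"))      # both nodes see all 5 rows
print(distribute(rows, 2, "non-repeat"))  # [['u1', 'u3', 'u5'], ['u2', 'u4']]
```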

3.2.2 Traffic Recording & Replay

A non‑intrusive, language‑agnostic recording service runs alongside each microservice pod, capturing inbound and outbound traffic via tshark, encrypting sensitive fields, and forwarding the data to Kafka → Logstash → Elasticsearch. Users configure recording rules (paths, sample rates, instance count) via a management portal. Recorded traffic can be exported as execution files compatible with the custom scene engine for replay testing.
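Before captured traffic is shipped down the Kafka pipeline, each request is matched against the user‑configured rules. A hedged sketch of that filter, assuming a path‑prefix match plus deterministic sampling (the field names and hashing scheme are made up):

```python
# Hypothetical recording-rule filter: keep a request if its path matches the
# rule's prefix and its trace id falls inside the configured sample rate.
import hashlib

def should_record(path: str, trace_id: str, rule: dict) -> bool:
    if not path.startswith(rule["path_prefix"]):
        return False
    # Deterministic sampling: hash the trace id into [0, 1) so the same
    # request is either always or never recorded.
    h = int(hashlib.md5(trace_id.encode()).hexdigest(), 16) % 10_000
    return h / 10_000 < rule["sample_rate"]

rule = {"path_prefix": "/x/v2", "sample_rate": 0.1}
print(should_record("/x/v2/feed", "trace-001", rule))
```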

04 Full‑Link Data Isolation

To protect production data, two isolation strategies are used:

Account isolation – dedicated test accounts for all write operations.

Shadow‑database isolation – requests carrying a special header are routed to shadow databases via framework hooks.

Most tests (>80%) rely on account isolation; the remaining cases combine traffic replay with shadow‑DB isolation.
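The shadow‑database hook boils down to a routing decision on the marker header. A minimal sketch, assuming a hypothetical header name and connection strings (the article does not name the real ones):

```python
# Hedged sketch of shadow-database routing via a request header, as done by
# the framework hooks described above. Header name and DSNs are invented.
SHADOW_HEADER = "x-load-test"  # assumed marker header

def pick_dsn(headers: dict, prod_dsn: str, shadow_dsn: str) -> str:
    """Route writes to the shadow DB when the load-test header is present."""
    if headers.get(SHADOW_HEADER) == "1":
        return shadow_dsn
    return prod_dsn

print(pick_dsn({"x-load-test": "1"}, "mysql://prod", "mysql://prod_shadow"))
# -> mysql://prod_shadow
```

Because the header travels with the request through the whole call chain, every service on the link makes the same decision, which is what makes the isolation full‑link.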

05 Load‑Testing Practice Results

Over 95% of Bilibili’s services use the platform, especially during large events. Usage spikes are visible in 2021 Q4, coinciding with major competitions and promotions. Platform 2.0 enabled >1,000 tests across 130+ microservices for the S11 event, supporting both custom and concurrency‑target scenarios.

06 Future Plans

Planned enhancements include:

Integrating platform metrics with the online monitoring system for unified dashboards.

Extending the circuit‑breaker logic to consider server‑side metrics (e.g., CPU > 70%).
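The planned server‑side circuit breaker could be as simple as comparing streamed metrics against thresholds and aborting the run on a breach. A sketch under that assumption, using the 70% CPU figure from the text (the metric plumbing is hypothetical):

```python
# Hedged sketch of the planned server-side circuit breaker: abort the load
# test when a server metric crosses its threshold. Thresholds beyond the
# CPU figure cited in the text are assumptions.
def should_break(metrics: dict, thresholds: dict = {"cpu": 0.70}) -> bool:
    """Return True if any monitored server metric exceeds its threshold."""
    return any(metrics.get(name, 0.0) > limit for name, limit in thresholds.items())

print(should_break({"cpu": 0.82}))  # True: server CPU above 70%, stop the test
print(should_break({"cpu": 0.55}))  # False: within limits, keep going
```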

Tags: distributed systems, cloud-native, microservices, load testing, performance engineering
Written by

Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.
