
BCM – Building and Deploying Bilibili’s Chaos Engineering Platform

At the 2024 GOPS Global Operations Conference, Bilibili senior R&D engineer Gu Lintao will present BCM, Bilibili’s chaos engineering platform, and show how its design and capabilities let developers, testers, and SREs inject faults safely, uncover hidden architectural risks, and improve service stability through real‑world drills and systematic reliability engineering.

Bilibili Tech

The 22nd GOPS Global Operations Conference & XOps Technology Innovation Summit will be held on April 25‑26, 2024 at the Renaissance Shenzhen Bay Hotel in Nanshan District, Shenzhen. The two‑day event focuses on large‑model AI, AIOps, DevOps, observability, SRE, cloud‑native, and other popular technologies. Its special tracks include large models for development and testing, large models for operations, digital transformation in banking and securities, automotive and manufacturing, cloud‑native and databases, DevOps/AIOps best practices, observability technology, and case studies from major internet companies.

On April 26, Bilibili senior R&D engineer Gu Lintao will deliver a presentation titled “BCM – Building and Landing Bilibili’s Chaos Platform Capabilities”, sharing the latest practices from the Bilibili Chaos Engineering Platform (BCM).

The session explains how BCM’s technical capabilities and product design support developers, testers, and SREs in injecting faults safely to expose architectural problems such as unreasonable dependencies and traffic‑splitting anomalies. It covers scenario design aimed at Bilibili’s real business pain points, which is used to validate that application architecture is sound and to surface hidden risks. BCM is already used in routine drills and in real drills against online services, where it has helped development and testing teams find many unexpected online issues, catch and fix code problems early, and significantly improve service stability.

The presentation will cover overall drill planning, the platform architecture, and how Bilibili addresses stability challenges by designing, implementing, and operating the chaos engineering platform, so that engineers can improve reliability on their own.

The announcement also includes images of the full conference schedule.

DevOps, Chaos Engineering, SRE, reliability, platform design, Bilibili