
BCM – Building and Deploying Bilibili’s Chaos Engineering Platform

At the 2024 GOPS Global Operations Conference, Bilibili senior R&D engineer Gu Lintao will present BCM—Bilibili’s Chaos Engineering Platform—showcasing how its design and capabilities let developers, testers, and SREs safely inject faults, uncover hidden architectural risks, and improve service stability through real‑world drills and systematic reliability engineering.

Bilibili Tech

The 22nd GOPS Global Operations Conference & XOps Technology Innovation Summit will be held on April 25‑26, 2024 at the Renaissance Shenzhen Bay Hotel in Nanshan District, Shenzhen. The two‑day event focuses on large‑model AI, AIOps, DevOps, observability, SRE, cloud‑native and other hot technologies, and features special tracks such as large‑model + development testing, large‑model + operations, banking/securities digital transformation, automotive & manufacturing, cloud‑native & databases, DevOps/AIOps best practices, observability technology, and internet‑giant case studies.

On April 26, Bilibili senior R&D engineer Gu Lintao will deliver a presentation titled “BCM – Building and Landing Bilibili’s Chaos Platform Capabilities”, sharing the latest practices from the Bilibili Chaos Engineering Platform (BCM).

The session explains how BCM's technical capabilities and product design support developers, testers, and SREs in injecting faults safely to expose architectural problems such as unreasonable dependencies and traffic‑splitting anomalies. It describes scenario designs that target Bilibili’s real business pain points, check whether the application architecture is sound, and uncover hidden risks.

BCM is already used in both routine drills and live production drills, where it has helped development and testing teams find many unexpected production issues, catch and fix code problems early, and significantly improve service stability. The presentation will cover overall drill planning, the platform architecture, and how Bilibili addresses stability challenges by designing, implementing, and operating the chaos engineering platform, so that engineers can improve reliability on their own.
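To make the idea of using fault injection to find unreasonable dependencies more concrete, here is a minimal Python sketch. It is not BCM's implementation or API, which the announcement does not describe. Every name in it (`FaultInjector`, `InjectedFault`, `render_video_page`, `run_drill`, and the dependency names `video-meta` and `comments`) is hypothetical and chosen only for illustration. The sketch fails one dependency at a time and records whether the request still succeeds, which is one simple way to check whether a dependency that is supposed to be optional actually is.

```python
from contextlib import contextmanager


class InjectedFault(Exception):
    """Stands in for a real dependency failure during a drill (hypothetical)."""


class FaultInjector:
    """Toy injector: calls to a dependency fail while it is marked as failing."""

    def __init__(self):
        self._failing = set()

    @contextmanager
    def fail(self, dependency):
        # Mark the dependency as failing only for the duration of the block,
        # so the fault is always cleaned up, even if the handler raises.
        self._failing.add(dependency)
        try:
            yield
        finally:
            self._failing.discard(dependency)

    def call(self, dependency, func, *args, **kwargs):
        if dependency in self._failing:
            raise InjectedFault(dependency)
        return func(*args, **kwargs)


def render_video_page(injector):
    """Hypothetical request handler with one required and one optional dependency."""
    # Required dependency: the page cannot be rendered without it.
    video = injector.call("video-meta", lambda: {"title": "demo"})
    # Optional dependency: the handler falls back to an empty list on failure.
    try:
        comments = injector.call("comments", lambda: ["nice"])
    except InjectedFault:
        comments = []
    return {"video": video, "comments": comments}


def run_drill(handler, dependencies):
    """Fail each dependency in turn and record how the handler behaves."""
    injector = FaultInjector()
    report = {}
    for dep in dependencies:
        with injector.fail(dep):
            try:
                handler(injector)
                report[dep] = "degraded gracefully"
            except InjectedFault:
                report[dep] = "request failed"
    return report


if __name__ == "__main__":
    for dep, outcome in run_drill(render_video_page, ["video-meta", "comments"]).items():
        print(f"{dep}: {outcome}")
```

In a report like this, a dependency that was expected to be optional but shows up as "request failed" would be a candidate for the kind of unreasonable dependency the talk mentions. A real platform would inject faults at the network or process level rather than through an in-process wrapper.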

The announcement also includes images of the full conference schedule.

