Full-Link Pressure Testing Automation Practice for Bilibili's Live Streaming Gifting Business
Bilibili automated full-link pressure testing for its high-traffic live-streaming gifting service. It adopted traffic co-location with storage isolation (shadow tables, shadow keys, and shadow topics), rolled the work out in three phases, and built a three-layer test framework around link analysis, configuration confirmation, and end-to-end verification across all services.
This article details how Bilibili implemented full-link pressure testing for its live-streaming gifting business, which is write-heavy, sees traffic spikes during major events, and has strict real-time data requirements. Traditional pressure testing could not accurately simulate production conditions, because write scenarios were handled through various shielding and blacklist mechanisms rather than being exercised as they are in production.
The article first compares three common industry approaches to full-link pressure testing: (1) traffic co-location with storage isolation, tested online; (2) data marking with logical isolation, tested online; and (3) a mirror environment or offline testing. Bilibili chose the first approach because of its unified language stack, consistent infrastructure components, and mature service governance.
Bilibili's full-link pressure testing solution consists of three main components: traffic co-location (sharing resources with online clusters during low-traffic periods, using traffic marking to distinguish test traffic), online stress testing (through their pressure testing platform), and storage isolation (creating shadow tables for databases, shadow keys for Redis, and shadow topics for message queues).
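The storage isolation idea above can be sketched in a few lines. This is a minimal illustration, not Bilibili's implementation: the `shadow_` prefix, the `RequestContext` type, and the `shadow_name` helper are all hypothetical names, and the article does not state the real naming convention.

```python
from dataclasses import dataclass

# Hypothetical naming convention; the article does not state the real one.
SHADOW_PREFIX = "shadow_"


@dataclass(frozen=True)
class RequestContext:
    """Per-request context carrying the stress-test mark (assumed shape)."""
    is_stress_test: bool = False


def shadow_name(ctx: RequestContext, name: str) -> str:
    """Return the shadow resource name for marked traffic, else the original.

    The same rule is applied here to database tables, Redis keys, and
    message-queue topics, so test writes never touch production data.
    """
    if ctx.is_stress_test:
        return SHADOW_PREFIX + name
    return name


live = RequestContext()
test = RequestContext(is_stress_test=True)

print(shadow_name(live, "gift_order"))       # production table
print(shadow_name(test, "gift_order"))       # shadow table
print(shadow_name(test, "gift:balance:42"))  # shadow Redis key
print(shadow_name(test, "gift-sent"))        # shadow topic
```

Routing by a mark on the request context is what lets test and production traffic share the same service instances while keeping their data apart.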
The core challenge was testing the many modifications made across the revenue core services, the underlying middleware, the pressure testing SDK, the console, and the stress testing platform. The authors designed a comprehensive automated testing solution divided into three phases: (1) ensuring basic capabilities by testing newly introduced nodes such as the mirror SDK and the pressure testing console; (2) implementing full-link automation for business onboarding and full-process verification; and (3) building platform and visualization support for future scaling.
The automated testing solution has three main parts: link analysis (using trace tracking and static code scanning tools such as biliconfigcheck lint to ensure that context is propagated along the whole call chain), configuration confirmation (configuring pass-through, mirroring, write-discard, and mock rules for interfaces, databases, caches, and message queues), and automated verification (validating interface responses, storage operations, asynchronous business flows, and link completeness).
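Configuration confirmation can be pictured as a rule table plus a check that every dependency found during link analysis has a rule. The sketch below is an assumption-heavy illustration: the `Action` enum, `missing_rules`, `plan_call`, the resource names, and the exact semantics of each rule (for example, that write-discard still allows reads) are hypothetical and not taken from the article.

```python
from enum import Enum
from typing import Dict, Iterable, List


class Action(Enum):
    """The four rule types named in the article; semantics here are assumed."""
    PASS_THROUGH = "pass_through"    # use the real resource unchanged
    MIRROR = "mirror"                # redirect to the shadow resource
    WRITE_DISCARD = "write_discard"  # drop writes, let reads through
    MOCK = "mock"                    # return a canned response


def missing_rules(discovered: Iterable[str], rules: Dict[str, Action]) -> List[str]:
    """List dependencies found by link analysis that have no rule yet."""
    return sorted(set(discovered) - set(rules))


def plan_call(resource: str, is_write: bool, rules: Dict[str, Action]) -> str:
    """Describe what stress-test traffic should do for one dependency call."""
    action = rules[resource]
    if action is Action.MIRROR:
        return f"call shadow_{resource}"
    if action is Action.WRITE_DISCARD and is_write:
        return "discard"
    if action is Action.MOCK:
        return "return mock"
    return f"call {resource}"


# Hypothetical rule table for a gifting link.
rules = {
    "mysql.gift_order": Action.MIRROR,
    "mq.audit_log": Action.WRITE_DISCARD,
    "http.payment": Action.MOCK,
    "redis.config": Action.PASS_THROUGH,
}

print(missing_rules(["mysql.gift_order", "redis.rank"], rules))
print(plan_call("mysql.gift_order", True, rules))
print(plan_call("mq.audit_log", True, rules))
```

Failing fast on an unconfigured dependency matters more than the rule semantics themselves: a dependency with no rule is the case where test writes could reach production storage.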
The automation framework was redesigned with three layers: a case layer for orchestrating single-interface and scenario tests, an invoker layer for request encapsulation and assertion management, and a coverage layer for test coverage statistics. Key modifications included adding a "mirror" identifier controlled by a global variable, implementing trace_toolset to check link completeness, and adding pressure testing markers to HTTP/gRPC request headers.
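Two of these modifications, the globally controlled "mirror" identifier and the link completeness check, might look roughly like the sketch below. It does not reproduce the real trace_toolset API, which the article does not describe: the header name `x-stress-test`, the span dictionary shape, and the `build_headers` and `link_gaps` helpers are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional

MIRROR = True                    # global switch: send cases as stress-test traffic
STRESS_HEADER = "x-stress-test"  # hypothetical header name


def build_headers(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Invoker-layer helper: add the stress-test marker when MIRROR is on."""
    headers = dict(base or {})
    if MIRROR:
        headers[STRESS_HEADER] = "1"
    return headers


def link_gaps(spans: Iterable[dict], expected_services: Iterable[str]) -> List[str]:
    """Report where a traced request lost its marker or skipped a service.

    Each span is assumed to be a dict with a "service" name and a
    "stress_marked" flag taken from the trace data.
    """
    problems = []
    seen = set()
    for span in spans:
        seen.add(span["service"])
        if not span.get("stress_marked", False):
            problems.append(f"marker lost at {span['service']}")
    for service in expected_services:
        if service not in seen:
            problems.append(f"no span from {service}")
    return problems


spans = [
    {"service": "gift-gateway", "stress_marked": True},
    {"service": "gift-order", "stress_marked": False},
]
print(build_headers({"accept": "application/json"}))
print(link_gaps(spans, ["gift-gateway", "gift-order", "wallet"]))
```

Keeping the marker behind one global switch means the same case layer can run unchanged as ordinary functional tests or as mirrored stress-test traffic.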
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.