How Bilibili Scaled Its Membership Purchase System: Call‑Chain Refactor, Async Ordering, and Sharding
This article details how Bilibili’s membership‑purchase platform handled massive traffic spikes by redesigning the order call chain, introducing concurrent and asynchronous processing, and sharding its data across multiple databases and tables, which reduced order latency and supported over 4,000 TPS during peak sales.
1. Background
Bilibili launched the "会员购" (Membership Purchase) platform in 2017, selling figures, comics, and apparel. As the business grew, the system evolved from simple pre‑sale to full‑sale, blind boxes, and crowdfunding, and spread across multiple channels (Maoer, QQ mini‑program, comics). Large‑scale promotional events (e.g., New Year Festival, 626 anniversary, 919 anniversary) generated traffic spikes of several hundred times the normal load, exposing serious performance bottlenecks.
2. Performance Challenges
During early releases, the order‑creation API suffered from long wait times (400 ms+), limited QPS, and a serial call chain that repeatedly invoked the same services. The initial call‑graph (Figure 2‑1) showed many redundant, sequential requests, turning the order flow into an I/O‑bound operation with low CPU utilization.
2.1 Call‑Chain Refactor
Problem: Serial, duplicated service calls caused >400 ms latency.
Solution:
Apply a responsibility‑chain pattern to restructure the order workflow (Figure 2‑2).
Execute independent services (product, shop, activity, user info) concurrently.
Eliminate redundant calls by consolidating downstream interfaces so each base service is invoked only once per request (Figure 2‑3).
Set reasonable timeouts (e.g., 200 ms) and enable connection retries with a 100 % increase for the 99th‑percentile.
Remove external calls from within transactions (e.g., MQ, cache).
Convert weak‑dependency calls to asynchronous MQ or background tasks (e.g., follow‑shop, cache phone, rollback coupons).
Result: Interface latency dropped from ~300 ms to ~200 ms (Figure 2‑4), and user‑perceived order speed improved significantly.
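The concurrent fan‑out with per‑call timeouts described above can be sketched as follows. This is a minimal illustration rather than the platform's actual code: the `fetch` stub, the service names, and the simulated latencies are all assumptions, and only the 200 ms timeout comes from the text.

```python
import asyncio

TIMEOUT_S = 0.2  # 200 ms per downstream call, the example value from the text


async def fetch(name: str, delay: float) -> dict:
    """Hypothetical stand-in for a downstream RPC; `delay` simulates latency."""
    await asyncio.sleep(delay)
    return {"service": name}


async def load_order_context() -> dict:
    # Each base service is called exactly once per request, and the four
    # independent calls run concurrently instead of one after another.
    calls = {
        "product": fetch("product", 0.02),
        "shop": fetch("shop", 0.01),
        "activity": fetch("activity", 0.02),
        "user": fetch("user", 0.01),
    }
    # wait_for raises a timeout error if a single call exceeds its budget.
    guarded = [asyncio.wait_for(call, TIMEOUT_S) for call in calls.values()]
    results = await asyncio.gather(*guarded)
    return dict(zip(calls.keys(), results))


if __name__ == "__main__":
    print(asyncio.run(load_order_context()))
```

With serial calls the total wait is the sum of the four latencies; with the fan‑out it is roughly the slowest single call, which is where an I/O‑bound flow gains the most.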
2.2 Asynchronous Order Processing
Problem: Large‑inventory flash sales (≈5,000 items) hit a throughput ceiling of ~600 TPS; lock contention in the inventory service caused latency spikes and high DB connection usage.
Approach: Treat the load like rush‑hour traffic and smooth it by queuing. Use a message queue to decouple the front‑end from the order service, allowing the system to “shave peaks”. The async flow (Figures 2‑6 – 2‑10) works as follows:
After validation, generate an order ID and push a message to the databus queue.
Consumers pull messages in batches (max 20 per batch) and merge orders.
During consumption, inventory is frozen in bulk, coupons are frozen in parallel, and a bulk INSERT writes orders to MySQL.
The front‑end shows an “order in progress” UI and polls the order‑status API for up to 30 seconds (with a hard timeout).
Failure handling includes failing fast on timeout, retrying per order when a stock deduction fails, and serving status queries from Redis (falling back to the DB if Redis fails).
The databus consumer is idempotent and discards messages older than a configurable threshold.
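The consumer side of this flow can be sketched roughly as below. The queue is modeled as a plain list, and `OrderMsg`, `take_batch`, `filter_batch`, `merge_stock_demand`, and the 30‑second staleness value are hypothetical names and assumptions made for illustration; only the batch limit of 20, the idempotence requirement, and the stale‑message rule come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

MAX_BATCH = 20    # batch limit from the flow above
MAX_AGE_S = 30.0  # assumed value for the configurable staleness threshold


@dataclass(frozen=True)
class OrderMsg:
    order_id: str
    sku_id: int
    quantity: int
    created_at: float  # Unix timestamp set by the producer


def take_batch(queue: List[OrderMsg], limit: int = MAX_BATCH) -> List[OrderMsg]:
    """Remove and return up to `limit` messages from the head of the queue."""
    batch = queue[:limit]
    del queue[:limit]
    return batch


def filter_batch(batch: List[OrderMsg], seen: Set[str], now: float,
                 max_age: float = MAX_AGE_S) -> List[OrderMsg]:
    """Drop stale messages and duplicates so redelivery is harmless."""
    fresh = []
    for msg in batch:
        if now - msg.created_at > max_age:
            continue  # older than the threshold: discard
        if msg.order_id in seen:
            continue  # already processed: idempotent skip
        seen.add(msg.order_id)
        fresh.append(msg)
    return fresh


def merge_stock_demand(batch: List[OrderMsg]) -> Dict[int, int]:
    """Sum quantities per SKU so stock can be frozen once per SKU per batch."""
    demand: Dict[int, int] = {}
    for msg in batch:
        demand[msg.sku_id] = demand.get(msg.sku_id, 0) + msg.quantity
    return demand
```

Merging the batch into one stock request per SKU is what relieves the row‑lock contention: twenty orders for the same item become a single freeze call instead of twenty competing updates. In a real deployment the `seen` set would need to live in shared storage rather than process memory.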
Outcome: Load testing demonstrated support for >4,000 TPS, and the async pipeline successfully handled the New Year Festival flash‑sale (Figure 2‑11).
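The status lookup and bounded polling from the flow above might look like the following sketch. The cache and database are passed in as plain callables, and the status strings, the polling interval, and the use of `ConnectionError` to signal a Redis failure are assumptions; the 30‑second limit and the cache‑then‑DB order come from the text.

```python
import time
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]


def query_status(order_id: str, cache_get: Lookup, db_get: Lookup) -> Optional[str]:
    """Read the status from the cache; fall back to the DB if the cache fails."""
    try:
        return cache_get(order_id)
    except ConnectionError:
        # Assumed failure signal for an unavailable cache.
        return db_get(order_id)


def poll_status(order_id: str, lookup: Lookup, timeout_s: float = 30.0,
                interval_s: float = 0.5) -> str:
    """Poll until the order reaches a final state or the hard timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = lookup(order_id)
        if status in ("SUCCESS", "FAILED"):  # hypothetical final states
            return status
        time.sleep(interval_s)
    return "TIMEOUT"  # fail fast instead of leaving the user waiting
```

A caller would wire the two together, for example `poll_status(order_id, lambda oid: query_status(oid, cache_get, db_get))`, so that a cache outage degrades to slower database reads rather than failed status checks.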
2.3 Database Sharding (Splitting Databases and Tables)
When tables grew to tens of millions of rows, single‑table DDL and index‑size limits caused severe latency, backup bottlenecks, and master‑slave replication lag. The team evaluated several open‑source sharding solutions (TDDL/DRDS, ShardingSphere, MyCAT, Atlas, Zebra) and chose a client‑mode middleware for its simplicity and low overhead.
Key steps:
Identify a high‑traffic sharding key (mid – user ID) based on API traffic analysis.
Design a range‑based and hash‑based split strategy: mid % 16 selects one of 16 databases (4 clusters × 4 databases), and (mid % 512) / 32, using integer division, selects one of 16 tables within that database, yielding 4 clusters × 4 databases × 16 tables = 256 tables.
Keep legacy data in the old schema (no data cleaning) to reduce migration effort.
Audit and rewrite SQL that lacks the sharding key using Druid monitoring, static analysis tools, DBA extraction, and manual code review.
Perform a zero‑downtime migration: archive historical data, gradually route read/write traffic to the new shards, and use binlog listeners to sync back to the old system for verification.
After sharding, the system eliminated single‑table bottlenecks, reduced lock contention, and maintained stability under repeated promotional spikes.
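The routing arithmetic from the key steps can be checked with a short sketch. The two modulo formulas are taken from the text; the `route` function name and the assumption that databases are grouped into clusters contiguously (databases 0 to 3 in cluster 0, and so on) are illustrative only.

```python
from typing import Tuple

DB_COUNT = 16        # 4 clusters x 4 databases
DBS_PER_CLUSTER = 4
TABLE_SLOTS = 512    # modulus used for table routing
SLOTS_PER_TABLE = 32  # 512 / 32 = 16 tables per database


def route(mid: int) -> Tuple[int, int, int]:
    """Map a user ID to (cluster, database, table) indexes."""
    db_index = mid % DB_COUNT
    table_index = (mid % TABLE_SLOTS) // SLOTS_PER_TABLE
    # Assumption: clusters hold contiguous database indexes.
    cluster_index = db_index // DBS_PER_CLUSTER
    return cluster_index, db_index, table_index


if __name__ == "__main__":
    print(route(1000))  # (2, 8, 15)
    # Every (database, table) pair is reachable: 16 x 16 = 256 physical tables.
    pairs = {route(mid)[1:] for mid in range(TABLE_SLOTS)}
    print(len(pairs))   # 256
```

The two indexes are independent because the database index depends only on the low four bits of mid % 512, while the table index depends on the bits above the low five, so users spread across all 256 tables rather than clustering on a diagonal.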
3. Summary
Through systematic call‑chain optimization, asynchronous order processing, and a carefully planned sharding strategy, the transaction system achieved dramatic latency reductions, scaled to >4,000 TPS, and reliably supported multiple high‑traffic promotional events. Future work includes applying similar refactors to legacy subsystems such as the ticketing platform.
Tags: backend, performance‑optimization, async‑processing, sharding, high‑traffic, microservices, database‑scaling
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
