Performance Optimization of Bilibili Membership Purchase Transaction System
Bilibili’s membership purchase system was re‑engineered in three ways. Serial service calls were refactored into a responsibility‑chain design with parallel calls, weak dependencies were moved to asynchronous queues, and order data was sharded across 256 tables. Together these changes raised peak throughput from ~600 QPS to over 4,000 TPS and eliminated latency incidents during massive promotional traffic spikes.
Bilibili launched the Membership Purchase platform in 2017, offering IP‑related goods such as figures, comics, and JK uniforms. Over time the platform expanded from pre‑sale to full‑sale, blind‑box, and crowdfunding models, and sales channels grew to include Cat Ear, QQ Mini‑Program, comics, etc. Large promotional events (e.g., New Year Festival, 626 anniversary, 919 anniversary) generate traffic spikes of several hundred times the normal load, posing a serious challenge to the transaction system.
Performance challenges: During peak events, the order‑placement API suffered high latency (up to 400 ms) and limited QPS, which occasionally led to incidents.
2.1 Call‑chain optimization: The original order flow consisted of many serial, often duplicated service calls, which drove up latency. The team made several changes:
- refactored the logic into a responsibility‑chain pattern;
- called independent services (product, shop, activity, user info) in parallel;
- eliminated redundant calls;
- set reasonable timeouts (e.g., 200 ms) with connection retries;
- removed external calls from inside transactions;
- moved weak‑dependency calls to MQ or asynchronous execution.
The optimized call graph cut average latency by about 100 ms, from ~300 ms to ~200 ms.
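The call‑chain refactoring can be sketched in Python as a pipeline‑style chain of responsibility. Each handler does one job and raises an exception to stop the chain. One handler fans out the independent reads in parallel under a shared 200 ms budget. The weak dependency is handed to a queue instead of being called inline. Everything here is hypothetical and illustrates the pattern only, not Bilibili's code: OrderContext, the handler names, the fetch_* stubs (which sleep to simulate network latency), and the in‑process queue standing in for an MQ.

```python
import concurrent.futures as cf
import queue
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence


@dataclass
class OrderContext:
    """Hypothetical per-request state passed from handler to handler."""
    mid: int
    sku_id: int
    quantity: int
    data: Dict[str, dict] = field(default_factory=dict)


Handler = Callable[[OrderContext], None]

POOL = cf.ThreadPoolExecutor(max_workers=8)          # shared fan-out pool (size is arbitrary)
CALL_BUDGET_S = 0.2                                   # 200 ms, the example timeout from the text
WEAK_DEP_QUEUE: "queue.Queue[dict]" = queue.Queue()  # stand-in for an MQ


# Hypothetical downstream calls; each sleep simulates ~50 ms of network latency.
def fetch_product(ctx: OrderContext) -> dict:
    time.sleep(0.05)
    return {"sku_id": ctx.sku_id, "stock": 10}


def fetch_shop(ctx: OrderContext) -> dict:
    time.sleep(0.05)
    return {"shop_id": 7}


def fetch_activity(ctx: OrderContext) -> dict:
    time.sleep(0.05)
    return {"discount": 0}


def fetch_user(ctx: OrderContext) -> dict:
    time.sleep(0.05)
    return {"mid": ctx.mid, "banned": False}


def check_params(ctx: OrderContext) -> None:
    if ctx.quantity <= 0:
        raise ValueError("quantity must be positive")  # stops the chain


def load_dependencies(ctx: OrderContext) -> None:
    """Fan out the independent reads in parallel under one shared deadline."""
    calls = {"product": fetch_product, "shop": fetch_shop,
             "activity": fetch_activity, "user": fetch_user}
    futures = {name: POOL.submit(fn, ctx) for name, fn in calls.items()}
    deadline = time.monotonic() + CALL_BUDGET_S
    for name, fut in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        # Raises cf.TimeoutError if the shared budget is exhausted.
        ctx.data[name] = fut.result(timeout=remaining)


def validate(ctx: OrderContext) -> None:
    if ctx.data["user"]["banned"]:
        raise PermissionError("user may not place orders")
    if ctx.data["product"]["stock"] < ctx.quantity:
        raise ValueError("insufficient stock")


def notify_weak_dependencies(ctx: OrderContext) -> None:
    # Weak dependencies (e.g. notifications) are queued, not called inline.
    WEAK_DEP_QUEUE.put({"event": "order_created", "mid": ctx.mid})


CHAIN: Sequence[Handler] = (check_params, load_dependencies, validate,
                            notify_weak_dependencies)


def run_chain(ctx: OrderContext, chain: Sequence[Handler] = CHAIN) -> OrderContext:
    for handler in chain:
        handler(ctx)  # any handler may raise to stop further processing
    return ctx
```

Because the four reads run concurrently, the fan‑out stage costs roughly as much as the slowest single call rather than the sum of all four. Adding, removing, or reordering stages only means editing CHAIN.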
2.2 Asynchronous order optimization: In high‑inventory flash‑sale scenarios (e.g., a 5,000‑unit figure sale), the system hit a throughput ceiling (~600 QPS) and severe DB lock contention. The solution was a queue‑based “peak‑shaving” approach. Each order request is validated, assigned an order ID, and placed onto a Databus MQ. Consumers batch‑pull up to 20 messages, merge them, and write the results to MySQL and Redis. Meanwhile, users see an “order in progress” screen while the client polls the order‑status API for up to 30 seconds. This design raised throughput substantially, sustaining >4,000 TPS in load tests.
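A minimal, single‑process sketch of the peak‑shaving flow follows. It rests on several assumptions: a queue.Queue stands in for the Databus MQ, plain dicts and lists stand in for MySQL and Redis, itertools.count stands in for the real order‑ID generator, and the status strings are invented. The point is the shape of the flow. The request path only validates, assigns an ID, and enqueues. The consumer pulls up to 20 messages, allocates stock for the whole batch in memory, and then performs one merged write instead of one locked update per order.

```python
import itertools
import queue
import time
from typing import Dict, List

BATCH_SIZE = 20        # consumers pull at most 20 messages per batch
POLL_WINDOW_S = 30.0   # client gives up polling after 30 seconds

# In-process stand-ins (assumptions): MQ, ID generator, Redis status cache, MySQL rows.
order_queue: "queue.Queue[dict]" = queue.Queue()
_next_order_id = itertools.count(1)
order_status: Dict[int, str] = {}
stock: Dict[str, int] = {"figure-001": 5000}
orders_table: List[dict] = []


def submit_order(mid: int, sku: str, qty: int) -> int:
    """Request path: validate, assign an order ID, enqueue, return at once."""
    if qty <= 0:
        raise ValueError("quantity must be positive")
    order_id = next(_next_order_id)
    order_status[order_id] = "PROCESSING"  # UI shows "order in progress"
    order_queue.put({"order_id": order_id, "mid": mid, "sku": sku, "qty": qty})
    return order_id


def pull_batch(max_items: int = BATCH_SIZE) -> List[dict]:
    """Drain up to max_items messages without blocking."""
    batch: List[dict] = []
    while len(batch) < max_items:
        try:
            batch.append(order_queue.get_nowait())
        except queue.Empty:
            break
    return batch


def consume_once() -> int:
    """Consumer: allocate stock for a whole batch in memory, then write once."""
    batch = pull_batch()
    remaining = dict(stock)            # one read of the inventory per batch
    accepted: List[dict] = []
    results: Dict[int, str] = {}
    for msg in batch:                  # first come, first served within the batch
        if remaining.get(msg["sku"], 0) >= msg["qty"]:
            remaining[msg["sku"]] -= msg["qty"]
            accepted.append(msg)
            results[msg["order_id"]] = "CREATED"
        else:
            results[msg["order_id"]] = "FAILED"
    # Merged write: a real system would use one MySQL transaction here ...
    stock.update(remaining)
    orders_table.extend(accepted)
    # ... and refresh the Redis status cache only after it commits.
    order_status.update(results)
    return len(batch)


def poll_status(order_id: int, window_s: float = POLL_WINDOW_S,
                interval_s: float = 0.5) -> str:
    """Client side: poll until the order leaves PROCESSING or the window ends."""
    deadline = time.monotonic() + window_s
    while time.monotonic() < deadline:
        status = order_status.get(order_id, "UNKNOWN")
        if status != "PROCESSING":
            return status
        time.sleep(interval_s)
    return "TIMEOUT"
```

Merging is what relieves the lock contention in this model: with a batch of 20, the hot inventory row is written once per batch rather than 20 times. The cost is that users wait briefly for confirmation, which the polling window absorbs.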
2.3 Database and table sharding: Order data grew rapidly, doubling every six months and reaching tens of millions of rows per core table. As a result, single‑table performance degraded, master‑slave replication lag increased, and DDL operations became risky. The team evaluated several sharding solutions (TDDL, DRDS, ShardingSphere, MyCAT, Atlas, Zebra) and chose client‑mode Sharding‑JDBC. They selected mid (member ID) and order_id as sharding keys and combined range and hash strategies: 4 clusters × 4 databases per cluster × 16 tables per database = 256 tables. The routing formulas are db_index = mid % 16 and table_index = (mid % 512) / 32 (integer division), which map each mid to one of the 16 databases and one of the 16 tables in that database. Migration steps included archiving old data, gradually cutting over read and write traffic, and back‑writing changes to the legacy system via binlog.
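The routing formulas can be checked with a short Python sketch. The cluster mapping (databases 0 to 3 on cluster 0, 4 to 7 on cluster 1, and so on) and the physical table names are assumptions for illustration, since the summary gives only the db_index and table_index formulas. The distribution check confirms that the two formulas read independent bits of mid, counting from bit 0. The low 4 bits (0 to 3) pick the database and bits 5 to 8 pick the table, so every one of the 256 database/table pairs receives an equal share of consecutive mid values.

```python
from collections import Counter
from typing import Tuple

CLUSTERS = 4
DBS_PER_CLUSTER = 4
TABLES_PER_DB = 16
TOTAL_DBS = CLUSTERS * DBS_PER_CLUSTER      # 16 databases
TOTAL_TABLES = TOTAL_DBS * TABLES_PER_DB    # 256 tables


def route(mid: int) -> Tuple[int, int, int]:
    """Return (cluster, db_index, table_index) for a member ID."""
    if mid < 0:
        raise ValueError("mid must be non-negative")
    db_index = mid % 16                     # bits 0-3 of mid
    table_index = (mid % 512) // 32         # bits 5-8 of mid (integer division)
    cluster = db_index // DBS_PER_CLUSTER   # assumed contiguous placement
    return cluster, db_index, table_index


def physical_table(mid: int) -> str:
    """Hypothetical naming scheme for the target table."""
    cluster, db_index, table_index = route(mid)
    return f"cluster{cluster}.order_db_{db_index:02d}.order_{table_index:02d}"


def distribution(n: int = 512) -> Counter:
    """Count how many of the mids 0..n-1 land on each (db_index, table_index) pair."""
    return Counter(route(mid)[1:] for mid in range(n))


if __name__ == "__main__":
    counts = distribution()
    print(TOTAL_TABLES, len(counts), set(counts.values()))  # 256 pairs, each hit twice
    print(route(123456789), physical_table(123456789))
```

For example, mid = 123456789 gives 123456789 % 16 = 5 and (123456789 % 512) // 32 = 277 // 32 = 8, so route(123456789) returns (1, 5, 8) under the assumed cluster layout. Bit 4 is used by neither formula, which is why each pair is hit exactly twice in every block of 512 consecutive mids.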
Result: With call‑chain refactoring, asynchronous order processing, and sharding in place, the transaction system sustained >4,000 TPS during peak promotional events without incidents. This makes it a robust approach for high‑concurrency e‑commerce workloads.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.
