How WeChat Scaled Red Packets for Billions of Transactions: Architecture & Strategies
This article explains how WeChat redesigned its red‑packet system for the 2016 Chinese New Year, detailing the dual‑data‑center architecture, the separation of order and user data, caching layers, traffic routing, high‑concurrency controls, database sharding, and graceful degradation used to handle billions of transactions during the holiday peak.
Background
The digital red packet, the online counterpart of the traditional red envelope, has become a major highlight of Chinese New Year. In recent years, the volume of red‑packet transactions on WeChat and Alipay has reached the billions, with peak rates of millions per minute.
Architecture Overview
WeChat users connect via two data centers in Shenzhen (South) and Shanghai (North). The system separates order data (transactional) from user data (display). The architecture includes several key aspects:
South‑North Distribution
The order layer is independent in each region: South and North keep their own order data, and it is not synchronized between them.
When a user sends a red packet, the order is tagged with a South/North identifier and routed accordingly.
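Four scenarios are possible, since the user may enter through either region and the order may belong to either region; in each case the request is handled in the region named by the order's identifier.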
This design distributes traffic across the two regions, reducing system risk.
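A minimal sketch of tag-based routing, assuming the region is encoded as a one-letter prefix of the order ID. The prefix scheme and the names `create_order` and `route` are hypothetical; the article only says that an order carries a South/North identifier and is routed by it. The reading of the four scenarios as caller region times order region is also an assumption.

```python
from dataclasses import dataclass

REGIONS = ("south", "north")  # Shenzhen and Shanghai in the article


@dataclass(frozen=True)
class Order:
    order_id: str
    region: str  # fixed when the order is created, never changed afterwards


def create_order(seq: int, sender_region: str) -> Order:
    """Tag a new order with the region through which the sender entered."""
    if sender_region not in REGIONS:
        raise ValueError(f"unknown region: {sender_region}")
    prefix = "S" if sender_region == "south" else "N"  # hypothetical encoding
    return Order(order_id=f"{prefix}{seq:012d}", region=sender_region)


def route(order_id: str) -> str:
    """Return the region that owns the order, regardless of where the caller is."""
    return {"S": "south", "N": "north"}[order_id[0]]


# Walk through the four caller-region x order-region combinations.
south_order = create_order(1, "south")
north_order = create_order(2, "north")
for caller in REGIONS:
    for order in (south_order, north_order):
        print(f"{caller} user -> {order.order_id} -> handled in {route(order.order_id)}")
```

Because the owning region is derived from the order itself, neither region ever needs to read the other's order tables, which is what makes the unsynchronized order layer workable.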
User Data Handling
User data is written asynchronously: all writes go to Shenzhen via a message queue, while reads are infrequent and can tolerate slight latency. Periodic reconciliation ensures data consistency.
Flexible Traffic Control
After the South‑North split, the system can route all traffic to a single region (e.g., Shenzhen) to relieve load on the other, providing rapid capacity adjustment and improved disaster recovery.
Database Fault Tolerance
If a database fails, traffic can be shifted to the other region, ensuring continuous service.
Pre‑Order Cache
Before payment, orders are cached and an atomic increment operation generates the red‑packet order ID, reducing unnecessary database writes for orders that never complete payment.
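The idea can be sketched as follows, using an in-process dictionary and a locked counter as stand-ins for the distributed cache and its atomic increment. The class `PreOrderCache`, the `database` dictionary, and `on_payment_success` are illustrative names, not part of the system described.

```python
import itertools
import threading


class PreOrderCache:
    """In-memory stand-in for the cache that holds orders awaiting payment."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = itertools.count(1)
        self._pending = {}

    def put(self, details: dict) -> int:
        """Allocate the next order ID atomically and cache the unpaid order."""
        with self._lock:
            order_id = next(self._counter)
            self._pending[order_id] = details
        return order_id

    def pop(self, order_id: int):
        """Remove and return a pending order, or None if it is not cached."""
        with self._lock:
            return self._pending.pop(order_id, None)


database = {}  # stand-in for the order table


def on_payment_success(cache: PreOrderCache, order_id: int) -> bool:
    """Persist an order only once its payment has completed."""
    details = cache.pop(order_id)
    if details is None:
        return False  # unknown, expired, or already persisted
    database[order_id] = details
    return True
```

An order that is created but never paid stays in the cache until it expires (expiry is not modeled here) and never costs a database write.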
Asynchronous Settlement of Opened Red Packets
The information flow and money flow are separated. When a red packet is opened, a voucher is recorded in the database, and settlement is performed asynchronously via a queue. Failures are compensated later, guaranteeing eventual consistency.
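A rough sketch of the split between recording the voucher and moving the money, with dictionaries standing in for the database and `queue.Queue` standing in for the message queue. All names here (`open_packet`, `drain`, `compensate`, the `fail` flag) are hypothetical and chosen for illustration.

```python
import queue

vouchers = {}  # stand-in for the voucher table
balances = {}  # stand-in for user account balances
settle_queue = queue.Queue()


def open_packet(voucher_id: str, user: str, amount_cents: int) -> None:
    """Information flow: record the voucher, then hand settlement to the queue."""
    vouchers[voucher_id] = {"user": user, "amount": amount_cents, "settled": False}
    settle_queue.put(voucher_id)


def transfer(user: str, amount_cents: int, fail: bool = False) -> None:
    """Money flow: credit the user. `fail` simulates a backend outage."""
    if fail:
        raise RuntimeError("payment backend unavailable")
    balances[user] = balances.get(user, 0) + amount_cents


def drain(simulate_outage: bool = False) -> list:
    """Settle everything currently queued; return voucher IDs that failed."""
    failed = []
    while not settle_queue.empty():
        voucher_id = settle_queue.get()
        voucher = vouchers[voucher_id]
        if voucher["settled"]:
            continue  # already paid out, so a duplicate message is harmless
        try:
            transfer(voucher["user"], voucher["amount"], fail=simulate_outage)
            voucher["settled"] = True
        except RuntimeError:
            failed.append(voucher_id)
    return failed


def compensate() -> None:
    """Re-enqueue every voucher that was recorded but never settled."""
    for voucher_id, voucher in vouchers.items():
        if not voucher["settled"]:
            settle_queue.put(voucher_id)
```

The `settled` flag is what makes retries safe: compensation can re-enqueue freely because a voucher is paid out at most once.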
Dual‑Layer Cache for Other Operations
All queries are cached with a two‑level cache: a distributed cache (ckv) and an in‑process memory cache.
If neither cache level holds the entry, the system falls back to the database.
Database writes are synchronized to the caches; any failures are compensated by asynchronous queues and periodic reconciliation.
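A small read-through sketch of the two levels, with plain dictionaries standing in for the in-process cache, the distributed cache, and the database. The lookup order (local first, then distributed, then database) and the class name `TwoLevelCache` are assumptions; the article does not specify them.

```python
class TwoLevelCache:
    """Read-through cache: in-process dict, then shared cache, then database."""

    def __init__(self, db: dict):
        self.local = {}   # in-process memory cache
        self.remote = {}  # stand-in for the distributed cache
        self.db = db      # stand-in for the database
        self.db_reads = 0

    def get(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.remote:
            self.local[key] = self.remote[key]  # warm the local level
            return self.local[key]
        self.db_reads += 1  # both levels missed
        value = self.db.get(key)
        if value is not None:
            self.remote[key] = value
            self.local[key] = value
        return value

    def write(self, key, value) -> None:
        """Write to the database first, then refresh both cache levels."""
        self.db[key] = value
        self.remote[key] = value
        self.local[key] = value
```

In a real deployment the cache refresh in `write` can fail independently of the database write, which is the gap the asynchronous queue and periodic reconciliation are there to close.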
High Concurrency Handling
The main challenge is many users grabbing the same red packet simultaneously, causing MySQL row‑lock contention. Strategies include:
Routing requests by red‑packet order hash to a specific logical machine (sticky routing).
In‑process memory cache with atomic counters to limit concurrent DB accesses per red packet.
Multi‑level traffic control for sending, grabbing, and opening red packets.
Database simplification: order tables store only key fields; non‑essential fields (avatars, nicknames, messages) reside in cache.
Sharding by order hash and by date (hot‑cold separation), allowing hot data to stay in recent tables and older data to move to cheaper cold storage.
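Three of the strategies above can be sketched together: sticky routing, a per-packet concurrency gate, and shard selection by hash and date. The hash function, the table-name format, the shard count, and the names `pick_server`, `shard_table`, and `PacketGate` are all illustrative assumptions.

```python
import hashlib
import threading
from datetime import date


def stable_hash(order_id: str) -> int:
    """Hash that is identical across processes (unlike the built-in hash())."""
    return int.from_bytes(hashlib.sha256(order_id.encode()).digest()[:8], "big")


def pick_server(order_id: str, servers: list) -> str:
    """Sticky routing: the same red packet always lands on the same machine."""
    return servers[stable_hash(order_id) % len(servers)]


def shard_table(order_id: str, created: date, shards: int = 16) -> str:
    """Choose a table by creation date (hot/cold) and by order hash."""
    return f"orders_{created:%Y%m%d}_{stable_hash(order_id) % shards:02d}"


class PacketGate:
    """Caps how many requests per red packet may reach the database at once."""

    def __init__(self, max_concurrent: int):
        self._max = max_concurrent
        self._in_flight = {}
        self._lock = threading.Lock()

    def try_enter(self, order_id: str) -> bool:
        with self._lock:
            current = self._in_flight.get(order_id, 0)
            if current >= self._max:
                return False  # caller should reject or retry, not hit the DB
            self._in_flight[order_id] = current + 1
            return True

    def leave(self, order_id: str) -> None:
        with self._lock:
            current = self._in_flight.get(order_id, 0)
            if current <= 1:
                self._in_flight.pop(order_id, None)
            else:
                self._in_flight[order_id] = current - 1
```

The gate only works as a per-packet limit because sticky routing sends every request for a given packet to the same process; without it, each machine would hold its own independent counter.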
Red Packet Allocation Algorithm
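The algorithm draws each amount at random but keeps every draw within bounds, so that no single grabber takes a disproportionate share: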
If only one packet remains, the full amount is given.
For each round, a lower bound (minimum amount) and an upper bound (maximum amount) are calculated based on remaining money and packets.
The upper bound is capped at twice the average remaining amount to avoid large disparities.
A random number modulo the upper bound determines the actual amount; if it falls below the lower bound, the lower bound is used.
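The steps above can be sketched as follows. Amounts are handled in integer cents, and the lower bound is assumed to be one cent; both are assumptions, as is the use of `randrange` in place of "a random number modulo the upper bound" (the two give the same range of values).

```python
import random

MIN_CENTS = 1  # assumed lower bound per packet


def draw(remaining_cents: int, remaining_packets: int, rng=random) -> int:
    """Draw the amount for the next grabber."""
    if remaining_packets == 1:
        return remaining_cents  # last packet takes everything that is left
    # Upper bound: twice the average of what remains.
    upper = 2 * remaining_cents // remaining_packets
    amount = rng.randrange(upper)  # uniform in [0, upper)
    return max(amount, MIN_CENTS)  # clamp to the lower bound


def split(total_cents: int, packets: int, rng=random) -> list:
    """Split a total into `packets` amounts, drawn one at a time."""
    if packets < 1 or total_cents < packets * MIN_CENTS:
        raise ValueError("total must cover the minimum for every packet")
    amounts = []
    remaining = total_cents
    for left in range(packets, 0, -1):
        amount = draw(remaining, left, rng)
        amounts.append(amount)
        remaining -= amount
    return amounts


print(split(1000, 10))  # ten amounts in cents that sum to 1000
```

Because each draw is strictly below twice the current average, enough money always remains to give every later packet at least the minimum, so no separate check for that is needed.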
Graceful Degradation Strategies
Order cache failure falls back to direct DB writes with an ID generator.
Grab‑time cache failure falls back to DB queries with rate‑limiting.
When cache fails, non‑critical user profile data is fetched from a real‑time interface or replaced with defaults.
Settlement can be performed synchronously for large amounts or asynchronously via a queue for small amounts.
User list queries are limited to two pages when under pressure; deeper pages fall back to DB.
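As one illustration of these fallbacks, the sketch below shows a cache read that degrades to a rate-limited database query. The token bucket is a swapped-in choice of limiter, since the article does not say how rate limiting is done, and `CacheDown`, `Busy`, and `get_packet_state` are hypothetical names.

```python
import time


class CacheDown(Exception):
    """Raised by the cache client when the cache is unavailable."""


class Busy(Exception):
    """Raised when the fallback path is over its rate limit."""


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def get_packet_state(order_id, cache_get, db_get, limiter: TokenBucket):
    """Read from cache; if the cache is down, query the DB under a rate limit."""
    try:
        return cache_get(order_id)
    except CacheDown:
        if not limiter.allow():
            raise Busy("database fallback is rate-limited")
        return db_get(order_id)
```

Rejecting excess requests with `Busy` keeps a cache outage from turning into a database outage; the caller can show a retry prompt instead.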
Result
The redesigned system successfully handled the 2016 Spring Festival peak, providing a smooth experience for millions of users.
