Building a Backend to Handle 10 Billion Red‑Packet Requests: 1M‑User Simulation
This article details a practical backend design and stress test that simulates 1 million concurrent users generating 10 billion WeChat red‑packet requests, covering load calculations, a Go‑based implementation, QPS targets, monitoring, and performance analysis.
Preface
Reading the article "How to Build a ‘Confident’ Spring Festival Red‑Packet System" sparked many insights; although it was published in 2015, its ideas remain valuable for backend design.
Background Knowledge
QPS: Queries per second, the number of requests per second.
PPS: Packets per second, the number of data packets per second.
Shake red‑packet: A client sends a request; if a red‑packet is available, the user receives it.
Send red‑packet: The system creates a red‑packet with a certain amount and assigns it to several users, who can each claim a portion of the amount.
Define Goals
Target: a single machine supports 1 million connections, simulating shake and send red‑packet processes, with a peak QPS of 60 k per machine.
User count: With 638 servers handling 1.43 billion users, each server supports roughly 2.24 million users; the realistic number of concurrent users during the 2015 Spring Festival was under 540 million.
Server count: Assume 600 access servers are active.
Per‑machine load: 540 million users / 600 ≈ 900 k users per server.
Peak per‑machine QPS: 14 million total QPS / 600 ≈ 23 k; the target is set to 30 k–60 k QPS.
Red‑packet issuance: The system should issue 50 k red‑packets per second, i.e., about 83 per second per machine; the target is raised to 200 per second.
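The per‑machine figures above all come from dividing a fleet‑wide number by the 600 assumed access servers. A minimal Go sketch of that arithmetic follows; the helper name perMachine is illustrative and not taken from the original project.

```go
package main

import "fmt"

// perMachine spreads a fleet-wide figure evenly across the given number
// of servers. The name is hypothetical, chosen for this sketch.
func perMachine(total float64, servers int) float64 {
	return total / float64(servers)
}

func main() {
	const servers = 600 // assumed number of active access servers (from the text)

	fmt.Printf("users per machine:    %.0f\n", perMachine(540e6, servers))
	fmt.Printf("peak QPS per machine: %.0f\n", perMachine(14e6, servers))
	fmt.Printf("issuance per machine: %.1f/s\n", perMachine(50e3, servers))
}
```

The stated targets (1 M connections, 30 k–60 k QPS, 200 red‑packets per second) deliberately sit above these computed values to leave headroom.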
Technical Analysis and Implementation
Single‑Machine 1 M Connections
Modern servers can support a million concurrent connections; the author previously built such a prototype (see GitHub repository).
30 k QPS
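With 1 M connections, reaching 30 k QPS means each connection sends one shake request roughly every 33 seconds (1,000,000 / 30,000 ≈ 33). Clients synchronize their clocks via NTP and use a simple algorithm: divide users into groupCount groups, where groupCount is the total number of users divided by the target QPS, and have each client send a request when time() % groupCount == userId % groupCount.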
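The pacing rule can be sketched in Go as follows. This is an illustration of the time() % groupCount == userId % groupCount check only, assuming integer user IDs starting at 0 and Unix time in whole seconds; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// groupCount returns how many one-second slots are needed so that
// totalUsers clients together produce roughly targetQPS requests per second.
func groupCount(totalUsers, targetQPS int) int {
	n := totalUsers / targetQPS
	if n < 1 {
		n = 1 // more QPS requested than users: everyone sends every second
	}
	return n
}

// shouldSend reports whether the client with the given userID owns the
// slot that corresponds to the given Unix second.
func shouldSend(unixSec int64, userID, groups int) bool {
	return unixSec%int64(groups) == int64(userID%groups)
}

func main() {
	groups := groupCount(1000000, 30000)
	now := time.Now().Unix()
	fmt.Println("groups:", groups)
	for _, id := range []int{0, 1, 32, 33} {
		fmt.Printf("user %d sends at t=%d: %v\n", id, now, shouldSend(now, id, groups))
	}
}
```

Because the division is truncated to 33 groups, each second is owned by about 30.3 k of the 1 M users, slightly above the nominal 30 k target. Changing the target QPS at run time only requires clients to recompute the group count.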
Server QPS Monitoring
The server records per‑second request counts with in‑process counters and monitors network traffic using a Python script combined with ethtool. The original article includes screenshots of the monitoring tool.
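A per‑second request counter of the kind described here can be sketched with Go's sync/atomic package. The QPSCounter type below is an assumption made for illustration; the original code may count requests differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// QPSCounter counts handled requests. It is a hypothetical type for this
// sketch, not the counter used in the original project.
type QPSCounter struct {
	n int64
}

// Inc is called once per handled request and is safe for concurrent use.
func (c *QPSCounter) Inc() { atomic.AddInt64(&c.n, 1) }

// Sample returns the number of requests seen since the previous Sample
// call and resets the counter to zero in the same atomic step.
func (c *QPSCounter) Sample() int64 { return atomic.SwapInt64(&c.n, 0) }

func main() {
	var c QPSCounter
	var wg sync.WaitGroup
	// Four goroutines stand in for connection handlers.
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	fmt.Println("requests in this interval:", c.Sample())
}
```

In a server, Sample would typically be called from a time.Ticker firing once per second, so that each returned value is that second's QPS.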
Shake Red‑Packet Business
The server produces red‑packets at a fixed rate; when a client requests, the server returns a packet if available, otherwise indicates failure. Lock contention is reduced by placing users in separate buckets.
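One way to picture the bucket idea is a pool in which every user maps to one bucket with its own mutex, so a shake request only contends with users in the same bucket. The sketch below assumes amounts in cents and a modulo mapping from user ID to bucket; both are illustrative choices, not details confirmed by the source.

```go
package main

import (
	"fmt"
	"sync"
)

// bucket holds the red packets available to the users mapped to it.
type bucket struct {
	mu      sync.Mutex
	packets []int // amounts in cents (assumed unit)
}

// Pool spreads users across buckets so that a shake locks only one bucket.
type Pool struct {
	buckets []bucket
}

// NewPool creates a pool with n independent buckets.
func NewPool(n int) *Pool { return &Pool{buckets: make([]bucket, n)} }

// Put adds a packet to the bucket selected by seq, so a producer that
// increments seq distributes packets round-robin.
func (p *Pool) Put(seq, amount int) {
	b := &p.buckets[seq%len(p.buckets)]
	b.mu.Lock()
	b.packets = append(b.packets, amount)
	b.mu.Unlock()
}

// Shake returns a packet for the user if the user's bucket has one,
// otherwise it reports failure.
func (p *Pool) Shake(userID int) (amount int, ok bool) {
	b := &p.buckets[userID%len(p.buckets)]
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.packets) == 0 {
		return 0, false
	}
	last := len(b.packets) - 1
	amount = b.packets[last]
	b.packets = b.packets[:last]
	return amount, true
}

func main() {
	pool := NewPool(4)
	pool.Put(0, 100) // lands in bucket 0

	fmt.Println(pool.Shake(4)) // user 4 maps to bucket 0
	fmt.Println(pool.Shake(4)) // bucket 0 is now empty
}
```

A fixed production rate, such as the 200 packets per second used later in the experiment, could be obtained by calling Put from a ticker‑driven producer goroutine.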
Send Red‑Packet Business
The system randomly creates red‑packets and assigns them to a few users; those users send claim requests, and the server distributes the amount.
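The source does not say how the amount is divided among claimants. As a minimal sketch of one possible rule, the function below splits a total (in cents, an assumed unit) into random positive shares that always sum to the total; the name split and the seed parameter are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// split divides total into n random shares of at least 1 each whose sum
// equals total. This is an illustrative rule, not the original algorithm.
func split(total, n int, seed int64) ([]int, error) {
	if n <= 0 || total < n {
		return nil, errors.New("need n > 0 and at least one unit per user")
	}
	r := rand.New(rand.NewSource(seed))
	parts := make([]int, n)
	remaining := total
	for i := 0; i < n-1; i++ {
		usersAfter := n - 1 - i
		// Reserve one unit for every user who has not claimed yet.
		limit := remaining - usersAfter
		parts[i] = 1 + r.Intn(limit)
		remaining -= parts[i]
	}
	parts[n-1] = remaining // the last user takes whatever is left
	return parts, nil
}

func main() {
	parts, err := split(1000, 5, 42)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	sum := 0
	for _, p := range parts {
		sum += p
	}
	fmt.Println("shares:", parts, "sum:", sum)
}
```

This simple rule favors early claimants, since the first share can consume almost the whole amount; a production system would likely bound each share more tightly.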
Monitoring
A lightweight monitoring module aggregates client counters and logs, similar to production systems that use time‑series databases like OpenTSDB.
Code Implementation and Analysis
Key points: controlling the number of Go goroutines (one per connection instead of many), partitioning connections into multiple SETs to reduce lock contention, and using a single goroutine per SET for message handling. The design processes three message types: shake requests, other client messages (e.g., chat), and server responses.
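The SET partitioning can be sketched as follows: each SET owns a channel and a single goroutine that drains it, so per‑SET state needs no lock. The Message and Set types, the modulo routing, and the dispatch helper are assumptions for illustration, and the per‑connection goroutines that would feed the channels are replaced here by a simple loop.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a hypothetical envelope for the traffic a SET handles.
type Message struct {
	ConnID int
	Kind   string // "shake", "chat" or "response"
}

// Set owns a share of the connections and processes their messages on a
// single goroutine, so handled is never accessed concurrently.
type Set struct {
	inbox   chan Message
	handled map[string]int
}

func newSet() *Set {
	return &Set{inbox: make(chan Message, 1024), handled: map[string]int{}}
}

// run drains the inbox until it is closed.
func (s *Set) run(wg *sync.WaitGroup) {
	defer wg.Done()
	for m := range s.inbox {
		s.handled[m.Kind]++
	}
}

// route picks the SET responsible for a connection.
func route(sets []*Set, connID int) *Set { return sets[connID%len(sets)] }

// dispatch sends one shake message per connection and returns how many
// shake messages each SET handled.
func dispatch(numSets, numConns int) []int {
	sets := make([]*Set, numSets)
	var wg sync.WaitGroup
	for i := range sets {
		sets[i] = newSet()
		wg.Add(1)
		go sets[i].run(&wg)
	}
	for id := 0; id < numConns; id++ {
		route(sets, id).inbox <- Message{ConnID: id, Kind: "shake"}
	}
	for _, s := range sets {
		close(s.inbox)
	}
	wg.Wait() // after this, reading handled from here is race-free
	counts := make([]int, numSets)
	for i, s := range sets {
		counts[i] = s.handled["shake"]
	}
	return counts
}

func main() {
	fmt.Println(dispatch(4, 1000))
}
```

Confining each SET's state to one goroutine trades a small amount of channel overhead for the removal of a global lock, which is the contention the design aims to avoid.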
Practice
The experiment proceeds in three phases:
Phase 1: Start the server and monitoring, then launch 17 client VMs (each with 4 cores and 5 GB RAM) that together establish 1 M connections (≈60 k per VM). Verify the connections with the ss command:
alias ss2="ss -ant | grep 1025 | grep EST | awk -F: '{print \$8}' | sort | uniq -c"
Phase 2: Set the client QPS to 30 k via the HTTP API, start a red‑packet generator at 200 packets/s (40 k packets in total), and observe stable QPS and red‑packet receipt.
Phase 3: Raise the client QPS to 60 k, repeat the generation and claim process, and note the increased network jitter and QPS fluctuations.
Data Analysis
Client QPS chart shows three intervals (connection establishment, 30 k QPS, 60 k QPS). Server QPS chart mirrors the client behavior, with a noticeable dip around 22:57 due to code inefficiencies.
Combined chart confirms alignment between client and server loads.
Red‑packet generation chart shows stable production.
Shake result chart indicates ~200 successful shakes per second at 30 k QPS; at 60 k QPS the number fluctuates due to network instability.
pprof output shows GC pauses >10 ms, acceptable given the 7‑year‑old hardware.
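The article reads GC pauses from pprof output. As a complementary sketch, and not the author's method, the same kind of figure can be read in‑process from runtime.MemStats, whose PauseNs field is a circular buffer of recent stop‑the‑world pause times. The helper name lastGCPause is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// lastGCPause forces one collection and returns the most recent
// stop-the-world pause together with the number of completed GC cycles.
func lastGCPause() (time.Duration, uint32) {
	runtime.GC() // ensure at least one cycle has completed

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// PauseNs is circular; the latest entry is at index (NumGC+255)%256.
	return time.Duration(m.PauseNs[(m.NumGC+255)%256]), m.NumGC
}

func main() {
	pause, cycles := lastGCPause()
	fmt.Printf("GC cycles: %d, last pause: %v\n", cycles, pause)
	if pause > 10*time.Millisecond {
		fmt.Println("last pause exceeded 10 ms")
	}
}
```

Forcing a collection is only done here so the sketch always has a pause to report; a long‑running server would simply sample MemStats periodically.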
Conclusion
The prototype meets the design goal: a single machine supports 1 M users and sustains at least 30 k QPS, with peaks of 60 k QPS, while simulating WeChat’s shake and send red‑packet processes. With 600 machines each handling 60 k QPS (36 million QPS in aggregate), the system could process 10 billion requests in under 7 minutes (10 billion / 36 million ≈ 278 seconds). The prototype differs from a production system in several respects, such as the absence of payment integration and the more powerful hardware available in production.
21CTO
