How to Build a 10‑Billion‑Request Red‑Envelope System: Design, Implementation, and Lessons Learned
This article walks through the design and practical implementation of a high‑throughput red‑envelope service capable of handling 10 billion requests, covering target load calculations, hardware and software choices, client‑server coordination, performance testing, monitoring, and key takeaways for building scalable backend systems.
Background and Goal
The Spring Festival Gala featured a red‑envelope (hongbao) game, prompting the question of how to design a backend that can handle on the order of 10 billion shake‑red‑envelope requests. The author, a WeChat backend engineer, uses this scenario as a case study for high‑load system design.
Target Load Determination
Public data suggests a maximum of 14.3 billion concurrent users across 638 servers, giving roughly 2.28 million users per server. Assuming 600 active servers, each server must support about 90 k concurrent users. The derived QPS targets are:
Overall peak QPS: ~14 million
Per‑server peak QPS: 14 million / 600 ≈ 23 k
Design safety targets: 30 k QPS (baseline) and 60 k QPS (stress)
Business‑level requirements include 83 red‑envelopes per second for distribution and 200 red‑envelopes per second for sending.
Hardware and Software Stack
Software : Go 1.8r3, Shell, Python (prototype written in Go for rapid development)
Server OS : Ubuntu 12.04
Client OS : Debian 5.0
Server hardware : Dell R2950, 8‑core, 16 GB RAM (non‑exclusive use)
Client environment : 17 ESXi 5.0 VMs, each with 4 cores and 5 GB RAM, establishing 60 k connections per VM to simulate 1 million clients
System Architecture
The system is divided into multiple independent SET groups. Each SET manages a few thousand TCP connections. A single goroutine per connection reads inbound messages and forwards them to the SET’s receive queue. Inside each SET a dedicated goroutine processes three message types:
Shake‑red‑envelope request from a client
Other client messages (e.g., chat)
Server responses back to the client
Shake requests are serviced by checking a per‑SET red‑envelope queue; if a red envelope is available it is returned, otherwise a failure response is sent. A separate red‑envelope generation service continuously inserts envelopes into each SET’s queue at a fixed rate.
Client‑Side Coordination
All clients synchronize their clocks via NTP. With 1 million simulated users and a target of 50 k QPS, each client computes a group size of groupSize = totalUsers / targetQPS = 20. A client sends a request when time() % groupSize == userID % groupSize. This deterministic modulo algorithm evenly spreads the request load without a central coordinator.
Server‑Side Metrics and Monitoring
Two monitoring components are embedded in the prototype:
In‑process counters that record the number of requests processed per second.
A Python script that wraps ethtool to capture network packet rates (PPS) for the server NIC.
Both metrics are pushed to a lightweight monitoring module and visualized with simple plotting tools.
Performance Testing Procedure
Start the server and monitoring services, then launch the 17 client VMs to establish 1 million TCP connections. Verify connection counts with:
ss -ant | grep 1025 | grep EST | awk -F: '{print $8}' | sort | uniq -cAdjust the client QPS to 30 k via an HTTP control endpoint and observe server‑side request counters and network PPS.
Increase client QPS to 60 k and repeat the observations.
During the test the red‑envelope generator emits 200 envelopes per second (total 40 k) and a separate sending service creates 20 k envelopes per second, each assigned to three random users.
Results and Analysis
Client‑side QPS graphs show a stable 30 k QPS phase and a more volatile 60 k QPS phase. Server‑side QPS mirrors this behavior, with a noticeable dip around minute 22 that indicates a need for code optimisation.
Key observations:
Goroutine scheduling in Go can introduce timing drift under heavy load.
Network latency and packet loss affect QPS stability.
CPU and memory usage remained within acceptable bounds despite the 7‑year‑old hardware.
The prototype satisfied the design goal of supporting 1 million concurrent connections with up to 60 k QPS per server, implying that a fleet of 600 servers could process 10 billion shake requests in roughly 7 minutes.
Limitations and Differences from Production
The prototype omits payment processing, runs on non‑exclusive hardware, and uses older servers, all of which differ from a production environment.
References
https://github.com/xiaojiaqi/C1000kPracticeGuide https://github.com/xiaojiaqi/C1000kPracticeGuide/tree/master/docs/cn https://github.com/xiaojiaqi/fakewechat https://github.com/xiaojiaqi/10billionhongbaosSigned-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
