How to Build a 10‑Billion‑Request Red‑Envelope System: Design, Implementation, and Lessons Learned

This article walks through the design and practical implementation of a high‑throughput red‑envelope service capable of handling 10 billion requests, covering target load calculations, hardware and software choices, client‑server coordination, performance testing, monitoring, and key takeaways for building scalable backend systems.

IT Architects Alliance
IT Architects Alliance
IT Architects Alliance
How to Build a 10‑Billion‑Request Red‑Envelope System: Design, Implementation, and Lessons Learned

Background and Goal

The Spring Festival Gala featured a red‑envelope (hongbao) game, prompting the question of how to design a backend that can handle on the order of 10 billion shake‑red‑envelope requests. The author, a WeChat backend engineer, uses this scenario as a case study for high‑load system design.

Target Load Determination

Public data suggests a maximum of 14.3 billion concurrent users across 638 servers, giving roughly 2.28 million users per server. Assuming 600 active servers, each server must support about 90 k concurrent users. The derived QPS targets are:

Overall peak QPS: ~14 million

Per‑server peak QPS: 14 million / 600 ≈ 23 k

Design safety targets: 30 k QPS (baseline) and 60 k QPS (stress)

Business‑level requirements include 83 red‑envelopes per second for distribution and 200 red‑envelopes per second for sending.

Hardware and Software Stack

Software : Go 1.8r3, Shell, Python (prototype written in Go for rapid development)

Server OS : Ubuntu 12.04

Client OS : Debian 5.0

Server hardware : Dell R2950, 8‑core, 16 GB RAM (non‑exclusive use)

Client environment : 17 ESXi 5.0 VMs, each with 4 cores and 5 GB RAM, establishing 60 k connections per VM to simulate 1 million clients

System Architecture

The system is divided into multiple independent SET groups. Each SET manages a few thousand TCP connections. A single goroutine per connection reads inbound messages and forwards them to the SET’s receive queue. Inside each SET a dedicated goroutine processes three message types:

Shake‑red‑envelope request from a client

Other client messages (e.g., chat)

Server responses back to the client

Shake requests are serviced by checking a per‑SET red‑envelope queue; if a red envelope is available it is returned, otherwise a failure response is sent. A separate red‑envelope generation service continuously inserts envelopes into each SET’s queue at a fixed rate.

Architecture diagram
Architecture diagram

Client‑Side Coordination

All clients synchronize their clocks via NTP. With 1 million simulated users and a target of 50 k QPS, each client computes a group size of groupSize = totalUsers / targetQPS = 20. A client sends a request when time() % groupSize == userID % groupSize. This deterministic modulo algorithm evenly spreads the request load without a central coordinator.

Server‑Side Metrics and Monitoring

Two monitoring components are embedded in the prototype:

In‑process counters that record the number of requests processed per second.

A Python script that wraps ethtool to capture network packet rates (PPS) for the server NIC.

Both metrics are pushed to a lightweight monitoring module and visualized with simple plotting tools.

Performance Testing Procedure

Start the server and monitoring services, then launch the 17 client VMs to establish 1 million TCP connections. Verify connection counts with:

ss -ant | grep 1025 | grep EST | awk -F: '{print $8}' | sort | uniq -c

Adjust the client QPS to 30 k via an HTTP control endpoint and observe server‑side request counters and network PPS.

Increase client QPS to 60 k and repeat the observations.

During the test the red‑envelope generator emits 200 envelopes per second (total 40 k) and a separate sending service creates 20 k envelopes per second, each assigned to three random users.

Results and Analysis

Client‑side QPS graphs show a stable 30 k QPS phase and a more volatile 60 k QPS phase. Server‑side QPS mirrors this behavior, with a noticeable dip around minute 22 that indicates a need for code optimisation.

Key observations:

Goroutine scheduling in Go can introduce timing drift under heavy load.

Network latency and packet loss affect QPS stability.

CPU and memory usage remained within acceptable bounds despite the 7‑year‑old hardware.

The prototype satisfied the design goal of supporting 1 million concurrent connections with up to 60 k QPS per server, implying that a fleet of 600 servers could process 10 billion shake requests in roughly 7 minutes.

Limitations and Differences from Production

The prototype omits payment processing, runs on non‑exclusive hardware, and uses older servers, all of which differ from a production environment.

References

https://github.com/xiaojiaqi/C1000kPracticeGuide
https://github.com/xiaojiaqi/C1000kPracticeGuide/tree/master/docs/cn
https://github.com/xiaojiaqi/fakewechat
https://github.com/xiaojiaqi/10billionhongbaos
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

BackendDistributed SystemsGolangPerformance Testinghigh concurrencyLoad Testing
IT Architects Alliance
Written by

IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.