Simulating a 10‑Billion Red‑Envelope System with Go: From 30K to 60K QPS

This article walks through a step‑by‑step engineering experiment that reproduces a high‑throughput "red‑envelope" service on a single server. It covers the hardware and software stack, the load‑generation logic, the monitoring setup, and the measured results at up to 60 000 QPS with 1 million simulated users.

IT Architects Alliance

1. Introduction

The author built a prototype to evaluate whether a single server can handle its share of the load described in a 2015 analysis of the 10 billion red‑envelope requests served during a Chinese New Year gala.

2. Background

Key concepts:

QPS – queries per second.

Shake‑red‑envelope – client request for a random red envelope; the server returns an envelope if available.

Send‑red‑envelope – server‑initiated distribution of a fixed‑amount envelope to selected users.

3. Objectives

Simulate 1 million concurrent clients.

Validate that a single server can sustain at least 2.3 × 10⁴ QPS, with target peaks of 3 × 10⁴ QPS and 6 × 10⁴ QPS.

Support a shake‑red‑envelope request rate of 83 req/s and a red‑envelope distribution rate of 200 envelopes/s.

4. Hardware & Software Stack

Software : Go 1.8r3, shell scripts, Python (monitoring).

Server OS : Ubuntu 12.04.

Client OS : Debian 5.0.

Server hardware : Dell R2950, 8‑core CPU, 16 GB RAM (non‑dedicated).

Client hardware : 17 ESXi 5.0 VMs, each with 4 CPU / 5 GB RAM, establishing 60 000 TCP connections per VM to reach 1 million total connections.

5. Technical Analysis & Implementation

5.1 Single‑machine support for 1 million connections

A prior prototype already demonstrated handling one million concurrent connections on a single host. Source code is available at https://github.com/xiaojiaqi/C1000kPracticeGuide.

5.2 Achieving 30 K QPS

Client side : All client VMs synchronize clocks via NTP. Each client decides whether to send a request in the current second using the formula:

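groupCount := totalUsers / targetQPS // number of one-second slots in a full cycle
if time.Now().Unix()%groupCount == userID%groupCount {
    // this user's slot is the current second: send a shake-red-envelope request
}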

This spreads requests evenly over time without a central coordinator.
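The slot rule above can be checked with a small simulation. The following is a minimal sketch, not the author's client code: the function names shouldSend and countSenders are hypothetical, and the Unix second is passed in as a parameter so the result is reproducible.

```go
package main

import "fmt"

// shouldSend reports whether the user with the given ID sends a request
// during the second identified by unixSec. Users are split into groupCount
// groups and exactly one group is active in any given second.
// (Hypothetical helper, written for illustration.)
func shouldSend(unixSec, userID, totalUsers, targetQPS int64) bool {
	groupCount := totalUsers / targetQPS
	if groupCount < 1 {
		groupCount = 1 // target exceeds user count: every user sends every second
	}
	return unixSec%groupCount == userID%groupCount
}

// countSenders counts how many of totalUsers would send in a given second.
func countSenders(unixSec, totalUsers, targetQPS int64) int64 {
	var n int64
	for id := int64(0); id < totalUsers; id++ {
		if shouldSend(unixSec, id, totalUsers, targetQPS) {
			n++
		}
	}
	return n
}

func main() {
	const totalUsers, targetQPS = 1000000, 30000
	for sec := int64(0); sec < 3; sec++ {
		fmt.Printf("second %d: %d requests\n", sec, countSenders(sec, totalUsers, targetQPS))
	}
}
```

Because groupCount uses integer division (1 000 000 / 30 000 = 33), the simulated rate comes out at about 30 303 requests per second rather than exactly 30 000. The rule also depends on all clients agreeing on the current second, which is why the NTP synchronization matters.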

Server side : A per‑second counter records request volume, and network traffic is monitored with a Python script that wraps ethtool. Counters reported by the clients are aggregated and displayed by the monitor described in section 5.5.
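A per‑second request counter of this kind can be sketched with an atomic integer that handlers increment and a reporter that reads and resets. This is an assumption about one reasonable shape for it, not the code from the repository; the type qpsCounter and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// qpsCounter counts requests seen since the last snapshot.
// (Hypothetical type, written for illustration.)
type qpsCounter struct {
	n int64
}

// Inc is called once per handled request; it is safe for concurrent use.
func (c *qpsCounter) Inc() { atomic.AddInt64(&c.n, 1) }

// Snapshot returns the count accumulated so far and resets it to zero
// in a single atomic step, so no increment is lost or counted twice.
func (c *qpsCounter) Snapshot() int64 { return atomic.SwapInt64(&c.n, 0) }

func main() {
	var c qpsCounter
	var wg sync.WaitGroup
	// Eight simulated handler goroutines, each recording 1000 requests.
	for h := 0; h < 8; h++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	// In a server, a time.Ticker firing once per second would call Snapshot.
	fmt.Println("requests in this interval:", c.Snapshot())
	fmt.Println("requests in next interval:", c.Snapshot())
}
```

Using a swap instead of a separate read and reset keeps the per‑second totals exact even while requests keep arriving.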

5.3 Shake‑red‑envelope business

The server continuously generates envelopes and stores them in bucketed queues. When a client request arrives, the server checks the bucket for an available envelope; if found, it returns the envelope, otherwise it replies with “no envelope”. Bucketed queues reduce lock contention. For higher performance, a lock‑free structure such as the Disruptor could be substituted.
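The bucketing idea can be illustrated as follows. This is a minimal sketch under stated assumptions: the source does not say how envelopes or users are mapped to buckets, so the modulo mapping, the envelope fields, and all type and method names here are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// envelope is a hypothetical red-envelope record.
type envelope struct {
	ID     uint64
	Amount int64 // smallest currency unit; illustrative only
}

// bucket is one independently locked FIFO queue.
type bucket struct {
	mu    sync.Mutex
	queue []envelope
}

// bucketedQueue spreads envelopes over several buckets so that
// concurrent requests rarely contend for the same lock.
type bucketedQueue struct {
	buckets []bucket
}

func newBucketedQueue(n int) *bucketedQueue {
	return &bucketedQueue{buckets: make([]bucket, n)}
}

// bucketFor maps a key to a bucket (assumption: simple modulo mapping).
func (q *bucketedQueue) bucketFor(key uint64) *bucket {
	return &q.buckets[key%uint64(len(q.buckets))]
}

// Put stores a generated envelope in the bucket chosen by its ID.
func (q *bucketedQueue) Put(e envelope) {
	b := q.bucketFor(e.ID)
	b.mu.Lock()
	b.queue = append(b.queue, e)
	b.mu.Unlock()
}

// Take returns an envelope from the user's bucket, or false if it is empty.
func (q *bucketedQueue) Take(userID uint64) (envelope, bool) {
	b := q.bucketFor(userID)
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.queue) == 0 {
		return envelope{}, false
	}
	e := b.queue[0]
	b.queue = b.queue[1:]
	return e, true
}

func main() {
	q := newBucketedQueue(4)
	q.Put(envelope{ID: 1, Amount: 100})
	if e, ok := q.Take(5); ok { // user 5 maps to the same bucket as envelope 1
		fmt.Println("user 5 got envelope", e.ID)
	}
	if _, ok := q.Take(5); !ok {
		fmt.Println("no envelope")
	}
}
```

With N buckets, two requests only block each other when they map to the same bucket, so lock contention drops roughly in proportion to N.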

5.4 Send‑red‑envelope business

Random envelopes are generated and assigned to random users. Clients then request to claim them. The prototype omits payment processing and encryption to keep the logic simple.

5.5 Monitoring

The monitoring component reuses code from https://github.com/xiaojiaqi/fakewechat. It receives counters from every client, aggregates them, and renders simple graphs. A screenshot of the monitor UI is shown in the original article.
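The aggregation step can be pictured as summing per‑client counters by timestamp. The sketch below is not the fakewechat code; the report fields, the aggregator type, and the sample numbers are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"sync"
)

// report is one counter sample sent by a client (hypothetical shape).
type report struct {
	ClientID string
	UnixSec  int64 // second the sample refers to
	Requests int64 // requests the client sent in that second
}

// aggregator sums the samples of all clients per second.
type aggregator struct {
	mu        sync.Mutex
	perSecond map[int64]int64
}

func newAggregator() *aggregator {
	return &aggregator{perSecond: make(map[int64]int64)}
}

// Add merges one client sample into the running total for its second.
func (a *aggregator) Add(r report) {
	a.mu.Lock()
	a.perSecond[r.UnixSec] += r.Requests
	a.mu.Unlock()
}

// Total returns the combined request count of all clients for a second.
func (a *aggregator) Total(sec int64) int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.perSecond[sec]
}

func main() {
	agg := newAggregator()
	// 17 simulated clients each report a count for the same second.
	for i := 0; i < 17; i++ {
		agg.Add(report{ClientID: fmt.Sprintf("client-%02d", i), UnixSec: 100, Requests: 1765})
	}
	fmt.Println("total QPS at t=100:", agg.Total(100))
}
```

Keying by second only gives meaningful totals when client clocks agree, which is another reason the clients synchronize via NTP.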

6. Code Architecture

Connections are partitioned into multiple independent SET groups. Each SET manages a few thousand connections, owns its own receive queue, and runs a dedicated goroutine for processing. This limits the total number of goroutines to roughly the number of connections plus a small overhead, dramatically reducing CPU and memory consumption.

Message handling inside a SET distinguishes three types:

1. Shake‑red‑envelope request from client

2. Other client messages (e.g., chat, friend requests)

3. Server responses to client
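The SET layout and its three message types can be sketched as one worker goroutine per SET draining a channel. This is an illustrative assumption, not the repository's code: the names set, message, msgKind, and process are hypothetical, and connections are mapped to SETs by a simple modulo.

```go
package main

import (
	"fmt"
	"sync"
)

type msgKind int

const (
	kindShake    msgKind = iota // shake-red-envelope request from a client
	kindOther                   // other client message (chat, friend request)
	kindResponse                // server response to be written to a client
)

type message struct {
	ConnID uint64
	Kind   msgKind
}

// set owns a receive queue and is processed by exactly one goroutine,
// so its fields need no locking.
type set struct {
	inbox  chan message
	shakes int
	others int
}

func (s *set) run(wg *sync.WaitGroup) {
	defer wg.Done()
	for m := range s.inbox {
		switch m.Kind {
		case kindShake:
			s.shakes++ // a real server would look up an envelope here
		default:
			s.others++
		}
	}
}

// process routes each message to the SET that owns its connection and
// returns the number of shake requests handled per SET.
func process(numSets int, msgs []message) []int {
	sets := make([]*set, numSets)
	var wg sync.WaitGroup
	for i := range sets {
		sets[i] = &set{inbox: make(chan message, 1024)}
		wg.Add(1)
		go sets[i].run(&wg)
	}
	for _, m := range msgs {
		sets[m.ConnID%uint64(numSets)].inbox <- m
	}
	for _, s := range sets {
		close(s.inbox)
	}
	wg.Wait() // after this, reading the counters is race-free
	counts := make([]int, numSets)
	for i, s := range sets {
		counts[i] = s.shakes
	}
	return counts
}

func main() {
	msgs := make([]message, 0, 8)
	for id := uint64(0); id < 8; id++ {
		msgs = append(msgs, message{ConnID: id, Kind: kindShake})
	}
	fmt.Println(process(4, msgs))
}
```

Since each SET's state is touched by a single goroutine, adding SETs scales the processing without introducing shared locks between them.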

Red‑envelope generation runs in a separate service that feeds envelopes into each SET’s queue in a round‑robin fashion, ensuring fairness.
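Round‑robin feeding can be sketched with one buffered channel per SET and a rotating index. The names dispatcher, Dispatch, and Pending are hypothetical, and the buffer capacity is an arbitrary choice for the example.

```go
package main

import "fmt"

// envelope is a hypothetical red-envelope record.
type envelope struct {
	ID uint64
}

// dispatcher hands generated envelopes to SET queues in rotation.
// It is meant to be driven by a single generator goroutine.
type dispatcher struct {
	queues []chan envelope
	next   int
}

func newDispatcher(numSets, capacity int) *dispatcher {
	d := &dispatcher{queues: make([]chan envelope, numSets)}
	for i := range d.queues {
		d.queues[i] = make(chan envelope, capacity)
	}
	return d
}

// Dispatch sends the envelope to the next SET queue and advances the index.
// It blocks if that queue is full.
func (d *dispatcher) Dispatch(e envelope) {
	d.queues[d.next] <- e
	d.next = (d.next + 1) % len(d.queues)
}

// Pending reports how many envelopes are waiting in each SET queue.
func (d *dispatcher) Pending() []int {
	out := make([]int, len(d.queues))
	for i, q := range d.queues {
		out[i] = len(q)
	}
	return out
}

func main() {
	d := newDispatcher(3, 16)
	for id := uint64(0); id < 7; id++ {
		d.Dispatch(envelope{ID: id})
	}
	fmt.Println(d.Pending()) // prints [3 2 2]
}
```

With this rotation, the number of envelopes given to any two SETs never differs by more than one.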

Full source code:

https://github.com/xiaojiaqi/10billionhongbaos

7. Practice Procedure

The experiment consists of three phases.

Phase 1 – Connection establishment

Start the server and the monitor, then launch the 17 client VMs to create one million TCP connections. Verify connection counts with the following alias:

alias ss2='ss -ant | grep 1025 | grep EST | awk -F: "{print \$8}" | sort | uniq -c'

Resulting connection statistics are captured via ss.

Phase 2 – 30 K QPS test

Increase client QPS to 30 000 using an HTTP control endpoint on each client. Observe network statistics and monitor graphs confirming the target rate. Start a red‑envelope generator that emits 200 envelopes per second (total 40 000). Clients receive roughly 200 envelopes per second.
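A control endpoint of this kind could look like the sketch below. The source does not describe the endpoint's path or parameters, so the /qps path, the qps query parameter, and the helper names applyQPS and currentQPS are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync/atomic"
)

// targetQPS is read by the load-generation loop and written by the handler.
var targetQPS int64

// applyQPS parses and stores a new target rate (hypothetical helper).
func applyQPS(raw string) (int64, error) {
	v, err := strconv.ParseInt(raw, 10, 64)
	if err != nil {
		return 0, err
	}
	if v < 0 {
		return 0, errors.New("qps must not be negative")
	}
	atomic.StoreInt64(&targetQPS, v)
	return v, nil
}

// currentQPS returns the rate the load generator should aim for.
func currentQPS() int64 { return atomic.LoadInt64(&targetQPS) }

// setQPSHandler handles requests such as GET /qps?qps=30000
// (path and parameter name are assumptions).
func setQPSHandler(w http.ResponseWriter, r *http.Request) {
	v, err := applyQPS(r.URL.Query().Get("qps"))
	if err != nil {
		http.Error(w, "invalid qps value", http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "target qps set to %d\n", v)
}

func main() {
	// Exercise the handler in-process instead of opening a real port.
	req := httptest.NewRequest(http.MethodGet, "/qps?qps=30000", nil)
	rec := httptest.NewRecorder()
	setQPSHandler(rec, req)
	fmt.Println("status:", rec.Code, "target:", currentQPS())
}
```

In a real client the handler would be registered with http.HandleFunc and served with http.ListenAndServe, and the request‑scheduling loop would read currentQPS each second, so the rate can be changed on all 17 VMs without restarting them.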

Phase 3 – 60 K QPS test

Raise client QPS to 60 000 and repeat the same steps. The system continues to deliver ~200 envelopes per second, though with higher variance due to network jitter.

8. Data Analysis

Both client‑side and server‑side counters are exported to the monitor and visualized with Python + gnuplot. Three QPS regions are evident: ~30 K, ~60 K, and the initial connection‑establishment burst. Deviations are attributed to:

Goroutine scheduling delays causing timing drift.

Network latency during connection setup.

Packet loss on the 1 Gbps link under high load.

Server‑side QPS mirrors the client pattern but shows a dip around 22:57, which suggests a code‑level bottleneck that still needs optimization.

Envelope generation rates remain stable at 200 envelopes/s, and the per‑second successful shake‑envelope counts align with expectations.

9. Conclusions

The prototype meets its design goals: it can simulate 1 million users and sustain at least 30 K QPS, scaling to 60 K QPS with acceptable stability. Extrapolating to a 600‑node deployment suggests that the full 10‑billion request workload could be completed in roughly 7 minutes.
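The extrapolation can be checked with simple arithmetic. The sketch below assumes 600 identical nodes and perfectly linear scaling, and uses the two tested per‑node rates as bounds; the source does not state which per‑node rate the 7‑minute figure assumes.

```go
package main

import "fmt"

// secondsToComplete returns how long a cluster needs to serve totalRequests
// when every node sustains qpsPerNode (assumes perfectly linear scaling).
func secondsToComplete(totalRequests, nodes, qpsPerNode int64) float64 {
	return float64(totalRequests) / float64(nodes*qpsPerNode)
}

func main() {
	const total = 10000000000 // 10 billion requests
	const nodes = 600
	for _, qps := range []int64{30000, 60000} {
		s := secondsToComplete(total, nodes, qps)
		fmt.Printf("%d QPS per node: %.0f s (%.1f min)\n", qps, s, s/60)
	}
}
```

At 30 K QPS per node the workload takes about 9.3 minutes, and at 60 K QPS about 4.6 minutes. The 7‑minute estimate falls between these bounds and corresponds to an average of roughly 40 K QPS per node.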

Key differences from a production system include:

Simplified business logic and protocols (no protobuf, encryption, or payment integration).

Absence of sophisticated logging, security controls, and hot‑update mechanisms.

Basic monitoring instead of production‑grade time‑series databases.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
