Simulating 10 Billion Red Packet Requests: A Go‑Based High‑QPS Backend Blueprint

This article details a step‑by‑step engineering effort to model and benchmark a high‑throughput red‑packet service capable of handling 10 billion requests, covering target metrics, hardware setup, Go implementation, load generation, monitoring, and performance analysis.

IT Architects Alliance

Introduction

This work reproduces a large‑scale “shake‑red‑packet” benchmark originally described in a 2015 article. The goal is to simulate 10 billion shake‑red‑packet requests in a local environment, validate the feasibility of the load, and extract backend design lessons.

Background

Key concepts:

QPS – queries (requests) per second.

Shake‑red‑packet – a client request for a random red packet; the server returns a packet if one is available.

Send‑red‑packet – creation of a fixed‑amount red packet assigned to a set of users; users later claim portions of the amount.

Objectives

The prototype targets the following load characteristics derived from public 2016 data:

Total active users ≈ 540 million; with 600 servers this yields ~900,000 users per server (540 million / 600), in line with the 1 million connections simulated here.

Peak per‑server QPS ≈ 23 k (stress‑test target up to 60 k).

Shake‑red‑packet delivery rate ≈ 83 packets / s per server (packets handed out, as distinct from the request QPS above).

Send‑red‑packet rate ≈ 200 packets / s per server.

Software & Hardware

Software

Go 1.8rc3, shell scripts, and Python for auxiliary tools. Server OS: Ubuntu 12.04. Client OS: Debian 5.0.

Hardware

Server: Dell R2950, 8‑core CPU, 16 GB RAM (non‑dedicated). Client side: 17 ESXi 5.0 VMs, each with 4 cores and 5 GB RAM, establishing 60 k TCP connections per VM to simulate 1 million concurrent clients.

[Figure: Server hardware]
[Figure: CPU details]

Technical Implementation

1. One‑million connections on a single machine

Go’s goroutine model allows a single host to maintain one million TCP connections. The implementation is available at:

https://github.com/xiaojiaqi/C1000kPracticeGuide
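
The goroutine-per-connection pattern can be sketched as follows. This is a minimal illustration, not code taken from the linked repository: the names `Server`, `Serve`, and `demo` are hypothetical, and the sketch only discards incoming bytes. Reaching one million connections additionally depends on operating-system limits such as the number of open file descriptors, which this sketch does not address.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"sync/atomic"
	"time"
)

// Server tracks how many connections are currently open (hypothetical type).
type Server struct {
	active int64 // updated atomically
}

// Serve accepts connections until the listener is closed and starts one
// goroutine per connection; each goroutine blocks until data arrives.
func (s *Server) Serve(ln net.Listener) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		atomic.AddInt64(&s.active, 1)
		go func(c net.Conn) {
			defer func() {
				c.Close()
				atomic.AddInt64(&s.active, -1)
			}()
			// Discard incoming bytes; a real server would decode requests here.
			io.Copy(io.Discard, c)
		}(conn)
	}
}

// demo opens n loopback connections and reports how many the server sees.
func demo(n int) (int64, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	s := &Server{}
	go s.Serve(ln)

	conns := make([]net.Conn, 0, n)
	defer func() {
		for _, c := range conns {
			c.Close()
		}
	}()
	for i := 0; i < n; i++ {
		c, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, err
		}
		conns = append(conns, c)
	}
	// Wait briefly for the accept loop to register every connection.
	deadline := time.Now().Add(2 * time.Second)
	for atomic.LoadInt64(&s.active) < int64(n) && time.Now().Before(deadline) {
		time.Sleep(5 * time.Millisecond)
	}
	return atomic.LoadInt64(&s.active), nil
}

func main() {
	n, err := demo(100)
	fmt.Println(n, err)
}
```

Because an idle goroutine blocked in a read costs only a small stack, the same loop scales to far more connections than a thread-per-connection design would.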

2. Achieving 30 k QPS

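Client side: All client VMs synchronize clocks via NTP. Each client decides whether to send a request in the current second using the rule time() % 20 == user_id % 20, which partitions 1 million users into 20 groups, one group per second. With 20 groups this yields 1,000,000 / 20 = 50 k requests per second; a 30 k target needs about 33 groups (1,000,000 / 30,000 ≈ 33), so each connection sends roughly one request every 33 seconds.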
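
The timestamp rule can be checked with a short sketch. The function names `groupCount`, `shouldSend`, and `requestsInSecond` are hypothetical, and deriving the group count by integer division is an assumption about how a target QPS maps to groups; when the division is not exact, the resulting rate only approximates the target.

```go
package main

import "fmt"

// groupCount returns how many one-second groups the users are split into so
// that one group per second produces roughly targetQPS requests.
func groupCount(totalUsers, targetQPS int64) int64 {
	g := totalUsers / targetQPS
	if g < 1 {
		g = 1
	}
	return g
}

// shouldSend reports whether userID fires a request during second unixSec.
func shouldSend(unixSec, userID, groups int64) bool {
	return unixSec%groups == userID%groups
}

// requestsInSecond counts how many of totalUsers fire during unixSec.
func requestsInSecond(unixSec, totalUsers, groups int64) int64 {
	var n int64
	for id := int64(0); id < totalUsers; id++ {
		if shouldSend(unixSec, id, groups) {
			n++
		}
	}
	return n
}

func main() {
	const users = 1000000
	for _, qps := range []int64{50000, 30000} {
		g := groupCount(users, qps)
		fmt.Printf("target=%d groups=%d actual=%d\n",
			qps, g, requestsInSecond(1700000000, users, g))
	}
}
```

Every client evaluates the same rule against its own synchronized clock, so no coordination traffic between client VMs is needed.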

Server side: The server increments a per‑second counter for processed requests. Network traffic is monitored with a Python wrapper around ethtool. The monitoring script logs packets per second and displays them in a simple UI.

[Figure: Network monitoring]
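
The server-side per-second counter might look like the sketch below. `QPSCounter` is a hypothetical name, and the sketch assumes a separate timer calls `Snapshot` once per second; it does not cover the ethtool-based network monitoring.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// QPSCounter counts handled requests. Snapshot returns the count since the
// previous snapshot and resets it, so calling it once per second yields QPS.
type QPSCounter struct {
	n int64
}

// Inc records one processed request; safe for concurrent use.
func (c *QPSCounter) Inc() { atomic.AddInt64(&c.n, 1) }

// Snapshot atomically reads and clears the counter.
func (c *QPSCounter) Snapshot() int64 { return atomic.SwapInt64(&c.n, 0) }

func main() {
	var c QPSCounter
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.Snapshot()) // 8000
	fmt.Println(c.Snapshot()) // 0
}
```

An atomic swap avoids taking a lock on the request path, which matters when tens of thousands of requests per second update the same counter.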

3. Shake‑red‑packet logic

The server continuously generates red packets at a fixed rate and stores them in per‑user buckets. When a client request arrives, the server checks the bucket; if a packet exists it is returned, otherwise a failure response is sent. Bucket partitioning reduces lock contention. For higher throughput a Disruptor‑style ring buffer could replace the simple queue.
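
A minimal sketch of the bucketed lookup follows. It assumes users are mapped to buckets by `userID` modulo the bucket count, which the text does not specify; `Packet`, `Pool`, `Put`, and `Shake` are hypothetical names, and user IDs and bucket indexes are assumed to be non-negative.

```go
package main

import (
	"fmt"
	"sync"
)

// Packet is a hypothetical red packet holding an amount in cents.
type Packet struct {
	ID     int64
	Amount int64
}

// bucket guards its own queue, so requests that hit different buckets
// never contend on the same lock.
type bucket struct {
	mu    sync.Mutex
	queue []Packet
}

// Pool spreads packets over a fixed number of buckets.
type Pool struct {
	buckets []bucket
}

func NewPool(n int) *Pool { return &Pool{buckets: make([]bucket, n)} }

// Put is called by the generator to append a packet to bucket i.
func (p *Pool) Put(i int, pk Packet) {
	b := &p.buckets[i%len(p.buckets)]
	b.mu.Lock()
	b.queue = append(b.queue, pk)
	b.mu.Unlock()
}

// Shake serves one request: it pops a packet from the user's bucket, or
// reports false when that bucket is empty.
func (p *Pool) Shake(userID int64) (Packet, bool) {
	b := &p.buckets[int(userID%int64(len(p.buckets)))]
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.queue) == 0 {
		return Packet{}, false
	}
	pk := b.queue[0]
	b.queue = b.queue[1:]
	return pk, true
}

func main() {
	pool := NewPool(4)
	pool.Put(1, Packet{ID: 7, Amount: 100})
	fmt.Println(pool.Shake(5)) // user 5 maps to bucket 1: {7 100} true
	fmt.Println(pool.Shake(5)) // bucket 1 is now empty: {0 0} false
}
```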

4. Send‑red‑packet logic

Red packets are created with random amounts and assigned to a small set of users. Clients request to claim a portion of the amount. The prototype omits payment processing and encryption.
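
One way to hand out portions of a packet is sketched below. The splitting rule (a uniform random amount that leaves at least one cent for each later claimer) is an assumption for illustration, since the text does not describe how portions are chosen; `RedPacket`, `Claim`, and `splitAll` are hypothetical names, and the sketch is not safe for concurrent claims without a lock.

```go
package main

import (
	"fmt"
	"math/rand"
)

// RedPacket is a hypothetical packet shared by a fixed number of claimers.
// The caller must create it with Remaining >= Shares.
type RedPacket struct {
	Remaining int64 // cents left
	Shares    int   // claims left
}

// Claim hands out a random portion, always leaving at least one cent for
// every remaining claimer; the last claimer receives whatever is left.
func (p *RedPacket) Claim(r *rand.Rand) (int64, bool) {
	if p.Shares == 0 {
		return 0, false
	}
	if p.Shares == 1 {
		amt := p.Remaining
		p.Remaining, p.Shares = 0, 0
		return amt, true
	}
	// The limit keeps one cent in reserve for each later claimer.
	limit := p.Remaining - int64(p.Shares-1)
	amt := 1 + r.Int63n(limit)
	p.Remaining -= amt
	p.Shares--
	return amt, true
}

// splitAll claims every share of a packet and returns the portions in order.
func splitAll(total int64, shares int, seed int64) []int64 {
	r := rand.New(rand.NewSource(seed))
	p := &RedPacket{Remaining: total, Shares: shares}
	var out []int64
	for {
		amt, ok := p.Claim(r)
		if !ok {
			return out
		}
		out = append(out, amt)
	}
}

func main() {
	fmt.Println(splitAll(1000, 5, 42)) // five portions that sum to 1000
}
```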

5. Monitoring

Monitoring reuses code from another project (https://github.com/xiaojiaqi/fakewechat). Each client and the server periodically push their counters to a central collector, which aggregates and visualises the data. Logs are persisted for offline analysis.

[Figure: Monitoring UI]
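
The aggregation step of the collector can be sketched as follows. This is not the code from the referenced project: `Report`, `Collector`, `Push`, and `Total` are hypothetical names, and the transport that carries reports from nodes to the collector is left out because the text does not specify it.

```go
package main

import (
	"fmt"
	"sync"
)

// Report is a hypothetical counter sample pushed by one node.
type Report struct {
	Node   string
	Second int64 // Unix second the sample covers
	Count  int64 // requests counted during that second
}

// Collector sums the reports of all nodes per second.
type Collector struct {
	mu     sync.Mutex
	totals map[int64]int64
}

func NewCollector() *Collector {
	return &Collector{totals: make(map[int64]int64)}
}

// Push adds one node's sample to the total for its second.
func (c *Collector) Push(r Report) {
	c.mu.Lock()
	c.totals[r.Second] += r.Count
	c.mu.Unlock()
}

// Total returns the aggregated count for the given second.
func (c *Collector) Total(second int64) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.totals[second]
}

func main() {
	c := NewCollector()
	c.Push(Report{Node: "client-01", Second: 100, Count: 1800})
	c.Push(Report{Node: "client-02", Second: 100, Count: 1750})
	fmt.Println(c.Total(100)) // 3550
}
```

Keying the totals by second relies on the NTP-synchronized clocks, so samples from different machines land in the same slot.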

6. Code architecture

Connections are grouped into multiple independent SET objects, each managing a few thousand connections. Every connection has a single goroutine that only reads from it, and each SET adds one worker goroutine, so the total goroutine count stays at roughly the number of connections plus a small overhead.

Inside a SET, a worker goroutine processes three message types:

Shake‑red‑packet request.

Other client messages (e.g., chat).

Server responses.

The red‑packet generator pushes packets into each SET’s queue at a steady pace, ensuring fairness across SETs.

[Figure: Architecture diagram]
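
The SET worker loop can be sketched with channels as below. This is an illustrative reduction, not the repository's code: `Set`, `Message`, `MsgKind`, and the counters are hypothetical, and the handling of non-shake messages is reduced to counting them.

```go
package main

import "fmt"

// MsgKind distinguishes the three message types a SET worker handles.
type MsgKind int

const (
	ShakeRequest   MsgKind = iota // client asks for a red packet
	ClientOther                   // any other client message, e.g. chat
	ServerResponse                // response to be written back to a client
)

type Message struct {
	Kind   MsgKind
	UserID int64
}

type Packet struct{ Amount int64 }

// Set owns the inbox shared by its connections and a queue of packets fed
// by the generator. The counters are touched only by the worker goroutine.
type Set struct {
	Inbox   chan Message
	Packets chan Packet
	Hits    int
	Misses  int
	Others  int
}

func NewSet(buf int) *Set {
	return &Set{Inbox: make(chan Message, buf), Packets: make(chan Packet, buf)}
}

// Run is the single worker goroutine of the SET. It exits when Inbox is
// closed and then closes done to signal completion.
func (s *Set) Run(done chan<- struct{}) {
	for m := range s.Inbox {
		switch m.Kind {
		case ShakeRequest:
			select {
			case <-s.Packets:
				s.Hits++ // a packet was available for this request
			default:
				s.Misses++ // queue empty: reply with a failure
			}
		case ClientOther, ServerResponse:
			s.Others++
		}
	}
	close(done)
}

func main() {
	s := NewSet(8)
	s.Packets <- Packet{Amount: 100}
	s.Inbox <- Message{Kind: ShakeRequest, UserID: 1}
	s.Inbox <- Message{Kind: ShakeRequest, UserID: 2}
	s.Inbox <- Message{Kind: ClientOther, UserID: 3}
	close(s.Inbox)
	done := make(chan struct{})
	go s.Run(done)
	<-done
	fmt.Println(s.Hits, s.Misses, s.Others) // 1 1 1
}
```

Since only the worker goroutine touches a SET's state, no lock is needed inside a SET; contention is limited to the channel operations.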

Full source code is hosted at:

https://github.com/xiaojiaqi/10billionhongbaos

Experimental Procedure

Phase 1 – Connection establishment

Start the server and monitoring service, then launch the 17 client VMs to create 1 million TCP connections. Verify connection counts with:

alias ss2="ss -ant | grep 1025 | grep EST | awk -F: '{print \$8}' | sort | uniq -c"
[Figure: ss command output]

Phase 2 – 30 k QPS

Set client QPS to 30 k via an HTTP control interface, start a red‑packet generator at 200 packets / s, and observe that clients receive roughly 200 packets per second.

[Figure: 30k QPS monitoring]

Phase 3 – 60 k QPS

Increase client QPS to 60 k, repeat the packet generation, and confirm the system continues to process the load, albeit with higher variance.

[Figure: 60k QPS monitoring]

Data Analysis

Client‑side and server‑side QPS were recorded and plotted using Python and gnuplot. Three distinct regions appear: baseline, 30 k QPS, and 60 k QPS. Fluctuations are attributed to goroutine scheduling, network latency, and occasional packet loss.

[Figure: Client QPS graph]
[Figure: Server QPS graph]

Additional charts show red‑packet generation rates, per‑second client acquisition, and a Go pprof snapshot confirming acceptable GC pauses on the legacy hardware.

[Figure: Combined QPS graph]
[Figure: Red‑packet generation]
[Figure: Client shake‑red‑packet stats]
[Figure: Go pprof snapshot]

Conclusion

The prototype demonstrates that a single server can sustain 1 million concurrent connections and up to 60 k QPS. With 600 such servers, 10 billion shake‑red‑packet requests would take about 4.6 minutes at 60 k QPS per server (10 billion / (600 × 60 k) ≈ 278 s) and about 9.3 minutes at 30 k QPS per server, so the target is reachable within minutes. Differences from a production system include the absence of a protobuf‑based protocol, encryption, payment integration, sophisticated logging, and advanced monitoring, but the core scalability principles are validated.
