
How We Engineered a Million‑User Lottery System to Survive Massive Spikes

This article details the end‑to‑end architecture, rate‑limiting strategies, caching layers, database optimizations, and hardware upgrades that enabled a lottery service to handle daily traffic exceeding one million users during peak promotional events.

Architect

Oct 1, 2024


1. Server‑side Rate Limiting

We use an A10 hardware load balancer (commercial) instead of Nginx for simplicity, and Tomcat as the web server. Two main server‑side mitigations were applied:

CC (Challenge Collapsar, i.e. HTTP flood) protection : limit each IP to 200 requests per minute and reject any excess. This can be configured on the A10 or, with Nginx, through the limit_req (request rate) and limit_conn (connection count) modules.

Tomcat concurrency tuning : with the existing maxThreads=500 setting, requests timed out under heavy load. Load testing showed performance degrading beyond 400 concurrent requests, so we lowered maxThreads to 400 to cap the number of requests Tomcat processes at once.
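In this setup the per‑IP CC rule runs on the A10, but its logic is simple to state in code. The following is a minimal Java sketch of a per‑IP fixed‑window counter (200 requests per 60‑second window), assuming a fixed window rather than whatever algorithm the A10 or Nginx actually applies; `PerIpRateLimiter` and `tryAcquire` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a per-IP fixed-window limiter (e.g. 200 requests per 60 s window).
 * Hypothetical class: in the architecture described, the real rule lives on the A10.
 */
final class PerIpRateLimiter {
    /** Counter for one IP's current window; only mutated inside ConcurrentHashMap.compute. */
    private static final class Window {
        final long start;
        int count;
        Window(long start) { this.start = start; }
    }

    private final int limit;
    private final long windowMillis;
    private final ConcurrentHashMap<String, Window> windows = new ConcurrentHashMap<>();

    PerIpRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request may proceed. The caller passes the clock so tests stay deterministic. */
    boolean tryAcquire(String ip, long nowMillis) {
        long windowStart = nowMillis - (nowMillis % windowMillis);
        int[] countAfter = new int[1];
        // compute() is atomic per key, so concurrent requests from one IP are counted exactly once each.
        windows.compute(ip, (key, current) -> {
            Window w = (current == null || current.start != windowStart) ? new Window(windowStart) : current;
            countAfter[0] = ++w.count;
            return w;
        });
        return countAfter[0] <= limit;
    }
}
```

A fixed window is the simplest option but can admit up to twice the limit in a short burst that straddles a window boundary, and this sketch never evicts idle IPs, which a production limiter would need to do.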

2. Application‑layer Rate Limiting

At the application level we added three mechanisms:

Semaphore control : a Java Semaphore with 350 permits (leaving 50 of Tomcat's 400 threads free to reject excess requests) lets us return a fast “no prize” response, within about 10 ms, for over‑limit traffic.

User‑behavior identification : real‑time bot detection based on click patterns, IP address, User‑Agent, device ID, and similar signals. Requests that lack normal user interaction are flagged and blocked, and we also maintain a risk list of known bots and scalpers.

Additional rules : activity‑specific limits stored in cache further trim traffic.

The figures comparing traffic before and after behavior detection showed peak traffic dropping from 600k to 300k requests per minute, with prizes no longer being exhausted nearly as quickly.
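The semaphore limit and the risk list can be combined into a single gate in front of the draw. Below is a minimal sketch, assuming the risk list is keyed by user ID and that “no prize” is represented by a constant string; `DrawGate`, `flag`, and `draw` are hypothetical names, while `Semaphore.tryAcquire()` is the standard `java.util.concurrent` call.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/**
 * Hypothetical gate in front of the prize draw: a risk-list check, then a Semaphore
 * that caps concurrent draws (350 in the article) and rejects the rest immediately.
 */
final class DrawGate {
    static final String NO_PRIZE = "NO_PRIZE";

    private final Semaphore permits;
    // Assumption: flagged bots/scalpers are keyed by user ID; real keys could be IP or device ID.
    private final Set<String> riskList = ConcurrentHashMap.newKeySet();

    DrawGate(int maxConcurrentDraws) {
        this.permits = new Semaphore(maxConcurrentDraws);
    }

    void flag(String userId) {
        riskList.add(userId);
    }

    String draw(String userId, Supplier<String> realDraw) {
        if (riskList.contains(userId)) {
            return NO_PRIZE;                 // known bot or scalper: never reaches the draw
        }
        if (!permits.tryAcquire()) {         // non-blocking: returns false at once if no permit is free
            return NO_PRIZE;                 // over the limit: fast "no prize" instead of queueing
        }
        try {
            return realDraw.get();           // the expensive path (cache + DB) runs under a permit
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Because `tryAcquire()` without a timeout never waits, the rejection path stays short. Constructed with 350 permits under maxThreads=400, at most 350 threads run the draw at any moment, and the remaining 50 stay available to send rejections.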

3. Application‑layer Performance Optimization

Performance bottlenecks centered on the database. We applied:

Distributed cache (Ycache) : a Memcached‑based component that holds large volumes of user‑related data to reduce database reads.

Local cache : hot, rarely updated data (e.g., activity rules) is cached in‑process with EhCache or a plain ConcurrentHashMap.

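Optimistic locking : the decrement statement checks the version read earlier and increments it, so when several requests race on the same version only one update succeeds; the award_num>0 condition also keeps the prize count from going negative.

update award set award_num = award_num - 1, version = version + 1 where id = #{id} and version = #{version} and award_num > 0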

Unique index : a unique constraint on (prize_id, user_id) prevents duplicate winning records.
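To make the semantics of the versioned update and the unique index concrete without a database, the sketch below models the `award` row as an immutable snapshot held in an `AtomicReference`, and the unique `(prize_id, user_id)` index as a concurrent set. It is a hypothetical in‑memory analogue (`AwardStore`, `decrement`, and `tryWin` are illustrative names), not the article's code, which relies on the database for both guarantees.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/**
 * In-memory stand-in for the award row and the winner table, illustrating the versioned
 * UPDATE and the UNIQUE(prize_id, user_id) index. Hypothetical class, not the article's code.
 */
final class AwardStore {
    /** Immutable snapshot of one award row. */
    record AwardRow(int awardNum, long version) {}

    private final long prizeId;
    private final AtomicReference<AwardRow> row;
    private final Set<String> winners = ConcurrentHashMap.newKeySet(); // plays the role of the unique index

    AwardStore(long prizeId, int initialCount) {
        this.prizeId = prizeId;
        this.row = new AtomicReference<>(new AwardRow(initialCount, 0L));
    }

    AwardRow read() {
        return row.get();
    }

    /**
     * Same effect as: UPDATE award SET award_num = award_num - 1, version = version + 1
     *                 WHERE id = ? AND version = ? AND award_num > 0
     * Returns "rows affected": 1 on success, 0 if the snapshot is stale or the prize is sold out.
     */
    int decrement(AwardRow seen) {
        if (seen.awardNum() <= 0) {
            return 0;
        }
        // compareAndSet compares references and every successful update installs a new snapshot,
        // so a stale snapshot never matches, just as a stale version never matches in SQL.
        return row.compareAndSet(seen, new AwardRow(seen.awardNum() - 1, seen.version() + 1)) ? 1 : 0;
    }

    /** One winning attempt: reserve the (prize, user) slot, then decrement; undo the reservation on failure. */
    boolean tryWin(String userId) {
        String key = prizeId + ":" + userId;
        if (!winners.add(key)) {
            return false; // duplicate winner: the unique index would reject this insert
        }
        if (decrement(read()) == 1) {
            return true;
        }
        winners.remove(key); // lost the race or sold out; in the database both steps share one transaction
        return false;
    }

    boolean hasWon(String userId) {
        return winners.contains(prizeId + ":" + userId);
    }
}
```

Here a request that loses the race simply gets no prize. Retrying with a fresh snapshot is an alternative, at the cost of extra load during a spike.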

4. Database and Hardware

Initial load tests with just 50 concurrent users produced average response times above 600 ms, with peaks over 1 s, and revealed that the database connection pool was limited to 30‑50 connections. Raising the pool to 100 reduced connection timeouts but did not fix the latency.

VisualVM snapshots showed most of the time being spent in database write methods and in one RPC call. Further investigation revealed that the test server used an old mechanical HDD, so commits waited more than 60 ms on "log file sync" (the wait for the redo log to be flushed to disk). Switching to an SSD cut the average response time to 136 ms at 441 concurrent users, enough to support the estimated 190k requests per minute.
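These figures can be cross‑checked with Little's law, which relates requests in flight $L$, throughput $\lambda$, and average response time $W$ by $L = \lambda W$. The check below treats the 441 concurrent users as 441 requests in flight, which is an assumption: any think time in the test script would make the real throughput lower.

```latex
\begin{aligned}
\lambda &= \frac{L}{W} = \frac{441}{0.136\ \text{s}} \approx 3{,}243\ \text{req/s} \approx 194{,}600\ \text{req/min} \\
\lambda_{\text{target}} &= \frac{190{,}000\ \text{req/min}}{60\ \text{s/min}} \approx 3{,}167\ \text{req/s}
\end{aligned}
```

Under that assumption, the measured operating point sits roughly 2% above the estimated load.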

5. Other Optimization Ideas (Not Implemented)

Message queue to decouple prize drawing and allow asynchronous processing.

Asynchronous RPC for the long‑running call.

Read‑write separation for databases (discarded due to consistency concerns).

Activity‑level database sharding.

In‑memory databases for ultra‑low latency.

Hardware upgrades beyond SSD.
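The message‑queue idea was not implemented, but its shape can be sketched. Below is a minimal in‑process stand‑in, assuming a bounded `ArrayBlockingQueue` in place of an external broker: the request thread only enqueues a draw request and answers immediately, while a background worker drains requests in batches. `DrawRequestQueue`, `submit`, and `drainBatch` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * In-process stand-in for the (unimplemented) message-queue idea. Hypothetical class;
 * a real deployment would put an external broker between the web tier and the draw workers.
 */
final class DrawRequestQueue {
    private final BlockingQueue<String> pending;

    DrawRequestQueue(int capacity) {
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    /** Web thread: returns immediately; false means the queue is full and the caller should answer "busy". */
    boolean submit(String userId) {
        return pending.offer(userId);
    }

    /** Worker thread: removes up to maxBatch queued requests, in arrival order, without blocking. */
    List<String> drainBatch(int maxBatch) {
        List<String> batch = new ArrayList<>(maxBatch);
        pending.drainTo(batch, maxBatch);
        return batch;
    }
}
```

The bounded queue plays the same protective role as the semaphore: when it is full, `offer` fails fast rather than letting requests pile up, and the draw itself runs at whatever pace the workers and database can sustain.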

6. Key Takeaways

High traffic spikes often hide a large proportion of bot traffic; behavior detection is essential to protect real users.

Performance optimization must consider the entire stack—from code and JVM to network and storage—because a single hardware bottleneck can nullify all software improvements.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
