
How We Scaled a Lottery System to Over 1M Daily Users: Architecture & Performance Hacks

This article details the end‑to‑end architecture and step‑by‑step performance tuning of a high‑traffic lottery platform, covering server‑level rate limiting, application‑level throttling, semaphore usage, user‑behavior detection, caching strategies, database optimizations, and hardware upgrades that together enabled stable handling of millions of daily requests.

Architect

Oct 27, 2024

Overall Design Overview

The lottery feature experiences occasional traffic spikes, especially during major promotions, where daily unique visitors exceed one million. To handle such bursts, the system was refactored with two main strategies: rate‑limiting (traffic shaping) and performance optimization (throughout the stack).

1. Server‑Level Rate Limiting

We use an A10 hardware load balancer (commercial alternative to Nginx) in front of Tomcat web servers. Two key configurations were applied:

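CC protection (defense against Challenge Collapsar, i.e. HTTP‑flood, attacks) : limit each IP to 200 requests per minute; excess requests are rejected. This can be configured directly on A10 or, on Nginx, with the request‑rate limiting module (ngx_http_limit_req_module). Nginx's connection‑limit module caps concurrent connections per key rather than requests per minute, so it is not the right tool for this rule.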

Tomcat concurrency tuning : the existing setting of maxThreads=500 (Tomcat's own default is 200) caused timeouts under heavy load. After load testing we reduced it to maxThreads=400 to cap the number of concurrent requests and prevent downstream timeouts.
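The per‑IP rule from the CC protection item can be illustrated in application code. The following is a minimal sketch of a fixed‑window counter, not a description of how A10 or Nginx implement their limits; the class name IpRateLimiter and its method are hypothetical, and the clock is passed in as a parameter so the behavior is easy to follow.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative fixed-window limiter: at most `limit` requests per IP per window. */
class IpRateLimiter {
    private final int limit;
    private final long windowMillis;
    // Per IP: element 0 is the current window index, element 1 the request count in it.
    private final ConcurrentHashMap<String, long[]> counters = new ConcurrentHashMap<>();

    IpRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed, false if it should be rejected. */
    boolean allow(String ip, long nowMillis) {
        long index = nowMillis / windowMillis;
        boolean[] allowed = new boolean[1];
        // compute() runs atomically per key, so counting and checking cannot interleave.
        counters.compute(ip, (key, cur) -> {
            if (cur == null || cur[0] != index) {
                cur = new long[] {index, 0}; // a new window started: reset the count
            }
            cur[1]++;
            allowed[0] = cur[1] <= limit;
            return cur;
        });
        return allowed[0];
    }
}
```

With new IpRateLimiter(200, 60_000), the 201st call for the same IP inside one minute returns false. Entries for idle IPs are never evicted in this sketch; a production limiter would need to expire them.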

2. Application‑Level Rate Limiting

At the code level we introduced two mechanisms:

Semaphore control : a Java Semaphore with 350 permits (leaving 50 threads for error responses) ensures that excess requests receive a quick “no prize” response instead of hanging. This improves user experience during peak seconds.

User‑behavior identification : using real‑time data (click patterns, IP, User‑Agent, device ID) we feed requests to a risk‑assessment module. Requests lacking legitimate interaction are flagged and rejected, cutting malicious traffic roughly in half.
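The semaphore control described above can be sketched as follows. This is an illustration under stated assumptions rather than the platform's actual code: the class name DrawGate, the "NO_PRIZE" return value, and the drawLogic parameter are hypothetical. The important detail is the non‑blocking tryAcquire(), which lets excess requests fail fast instead of queuing for a permit.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Caps the number of draws running at once; overflow gets an immediate fallback. */
class DrawGate {
    private final Semaphore permits;

    DrawGate(int maxConcurrentDraws) {
        this.permits = new Semaphore(maxConcurrentDraws);
    }

    String handle(Supplier<String> drawLogic) {
        // tryAcquire() never blocks: with no permit free, answer right away.
        if (!permits.tryAcquire()) {
            return "NO_PRIZE";
        }
        try {
            return drawLogic.get();
        } finally {
            // Always return the permit, even if the draw logic throws.
            permits.release();
        }
    }
}
```

With the numbers from the text, the gate would be created as new DrawGate(350) behind a Tomcat connector limited to 400 threads, so up to 50 threads stay free to send the fast fallback response.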

3. Application‑Level Performance Optimization

The main bottleneck was database pressure. We applied several tactics:

Caching :

Distributed cache (Ycache, a Memcached‑based component) stores large user‑related data.

Local cache (EhCache or a small home‑grown cache built on ConcurrentHashMap) holds small, rarely‑updated data such as activity rules.
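A home‑grown local cache of the kind mentioned for activity rules could look roughly like this. It is a minimal sketch, assuming a simple time‑to‑live policy that the article does not specify; the class name LocalTtlCache and the loader parameter are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Small in-process cache with a fixed time-to-live per entry. */
class LocalTtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;

        Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;

    LocalTtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached value, calling the loader only if the entry is missing or expired. */
    V get(K key, long nowMillis, Function<K, V> loader) {
        Entry<V> entry = map.compute(key, (k, cur) -> {
            if (cur != null && cur.expiresAt > nowMillis) {
                return cur; // still fresh: reuse it
            }
            // Missing or expired: load once while holding this key's slot.
            return new Entry<V>(loader.apply(k), nowMillis + ttlMillis);
        });
        return entry.value;
    }
}
```

Because the loader runs inside compute(), concurrent requests for the same key trigger a single load, at the cost of briefly blocking other writers to that slot. That trade‑off suits small, rarely‑updated data; the loader must not return null, since compute() would then remove the mapping.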

Transaction avoidance : Instead of heavyweight JDBC transactions that hold a DB connection for the entire request, we used optimistic locking (version field) and unique indexes to guarantee that only one award record is inserted per user.

UPDATE award SET award_num = award_num - 1, version = version + 1 WHERE id = #{id} AND version = #{version} AND award_num > 0;
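The caller of a version‑checked update has to look at the affected‑row count and decide whether to retry. The sketch below simulates that flow in memory so it runs without a database; AwardRow and AwardService are hypothetical names, and the synchronized method stands in for the atomicity the database provides for a single UPDATE. The simulation bumps the version on every successful update, which is the usual convention for a version column.

```java
/** In-memory stand-in for one row of the award table. */
class AwardRow {
    private int awardNum;
    private long version;

    AwardRow(int awardNum) {
        this.awardNum = awardNum;
    }

    /** Mimics SELECT award_num, version: returns {awardNum, version}. */
    synchronized long[] read() {
        return new long[] {awardNum, version};
    }

    /** Mimics the version-checked UPDATE: returns the affected row count (0 or 1). */
    synchronized int decrementIfVersion(long expectedVersion) {
        if (version != expectedVersion || awardNum <= 0) {
            return 0; // someone else changed the row, or stock is gone
        }
        awardNum--;
        version++;
        return 1;
    }
}

class AwardService {
    /** Tries to take one award, re-reading the row after each lost race. */
    static boolean tryDraw(AwardRow row, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long[] snapshot = row.read();
            if (snapshot[0] <= 0) {
                return false; // out of stock: no point retrying
            }
            if (row.decrementIfVersion(snapshot[1]) == 1) {
                return true;
            }
        }
        return false; // gave up after repeated conflicts
    }
}
```

No database connection is held between the read and the update, which is the point of avoiding a long‑running transaction. A unique index on the user column of the award record table would still be needed to reject duplicate inserts per user.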

4. Database and Hardware

Initial load tests with 50 concurrent users showed average response times >600 ms and peaks >1000 ms, mainly due to DB connection limits (30) and a mechanical hard drive. After increasing the connection pool to 100 and swapping the test server’s disk to SSD, performance improved dramatically.

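Final benchmark: with 441 concurrent threads and an average latency of 136 ms, the system handles ~190 k requests per minute (consistent with 441 / 0.136 s ≈ 3,240 requests per second). That covers the lower and middle part of the estimated peak range of 150 k‑250 k per minute; it does not exceed the top of that range.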
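The benchmark figures can be cross‑checked with Little's law. The sketch below assumes a closed‑loop load test in which every thread sends its next request as soon as the previous one returns (an assumption, since the test setup is not described); the class and method names are hypothetical.

```java
/** Back-of-the-envelope throughput check for a closed-loop load test. */
class ThroughputEstimate {
    /** Little's law: throughput = concurrency / latency, scaled to one minute. */
    static double requestsPerMinute(int concurrentThreads, double avgLatencyMillis) {
        double perSecond = concurrentThreads / (avgLatencyMillis / 1000.0);
        return perSecond * 60.0;
    }

    public static void main(String[] args) {
        // 441 threads at 136 ms average latency: roughly 194,559 requests per minute.
        System.out.printf("%.0f requests/minute%n", requestsPerMinute(441, 136.0));
    }
}
```

The result of about 194,559 requests per minute matches the reported ~190 k figure.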

5. Additional Optimization Ideas

Message queue to decouple the spin‑wheel UI from the result generation, allowing true request queuing.

Asynchronous processing for the heavy RPC call that consumed ~50 % of request time.

Read‑write splitting (discarded for this case due to consistency requirements).

Activity‑level database sharding to isolate load.

In‑memory databases for ultra‑low latency.

Hardware upgrades (SSD already proved effective; future upgrades could further raise capacity).
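Of the ideas above, asynchronous processing of the heavy RPC call is the easiest to illustrate in isolation. This is a minimal sketch using CompletableFuture, assuming the RPC result is still needed before the response is sent and that some other work can overlap with it; neither point is stated in the article. The names AsyncDraw, slowRpc, and localWork are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

/** Overlaps a slow remote call with local work instead of running them back to back. */
class AsyncDraw {
    static String draw(ExecutorService pool, Supplier<String> slowRpc, Supplier<String> localWork) {
        // Start the remote call on a separate pool so it runs in the background.
        CompletableFuture<String> rpc = CompletableFuture.supplyAsync(slowRpc, pool);
        // Meanwhile, do the work that does not depend on the RPC result.
        String local = localWork.get();
        // Wait for the RPC only at the point where its result is required.
        return local + ":" + rpc.join();
    }
}
```

The request then takes roughly the longer of the two steps rather than their sum. If the RPC result is not needed for the response at all, handing it to a message queue would free the request thread entirely.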

6. Key Takeaways

High traffic spikes often contain a large proportion of scripted requests; behavior‑based filtering is essential to protect genuine users.

Performance tuning must consider the entire stack—from front‑end throttling to JVM settings, database configuration, and underlying hardware.

Never rely solely on code‑level optimizations; hardware bottlenecks (e.g., old HDDs) can nullify all other efforts.
