Optimizing a High‑Concurrency Lottery System: Caching, Queueing, Optimistic Locking, and Read/Write Splitting
The article analyzes a lottery‑service bottleneck caused by massive concurrent database reads and writes and presents a comprehensive set of backend optimization techniques—including caching, queue‑based peak‑shaving, optimistic locking, asynchronous processing, read‑write splitting, and semaphore‑based rate limiting—to improve throughput and stability under high load.
1. Project Consideration
The lottery activity sent SMS reminders to users about their draw entitlements; when users flooded onto the draw page, it quickly became unavailable. Logs showed the bottleneck was the database: severe read‑write contention caused many connections to time out. Monitoring revealed the lottery microservice's QPS surged 12× and the DB's QPS rose 10×, a classic high‑concurrency I/O bottleneck.
2. Optimization Ideas
Based on senior engineers' experience and online references, the main measures are downgrade, rate limiting, caching, and message queues, with the principle of minimizing direct DB exposure by handling most requests at the service layer.
3. Optimization Details
1. Lottery Detail Page
a. Enable online caching
Although cache logic existed, the switch was not turned on. Enabling it reduces DB concurrent I/O pressure and lock contention.
b. Local cache eviction strategy
Instead of clearing the entire cache when it reaches its size limit, use an eviction algorithm such as LRU, LFU, or NRU. Example Guava cache configuration:
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

// Initial capacity 10, hard cap 100; once the cap is reached, excess
// entries are evicted by Guava's size-based (approximately LRU) policy.
Cache<String, Object> cache = CacheBuilder.newBuilder()
        .initialCapacity(10)
        .maximumSize(100)
        .build();
2. Lottery Logic
a. Queue‑based peak shaving
Introduce a single‑process queue; incoming draw requests are enqueued and processed one by one, eliminating the QPS spike. When the queue length exceeds a threshold (e.g., 1000 for 100 prizes), further requests are immediately returned as “no win”. Tair can record the queue length, and when prizes are exhausted the queue is cleared.
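The enqueue-or-reject behavior above can be sketched with a bounded `BlockingQueue`; the class and field names here are illustrative, not from the original system:

```java
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of queue-based peak shaving: requests beyond the threshold are
// rejected immediately as "no win" instead of piling onto the database.
public class DrawQueue {
    static final int MAX_QUEUE_LENGTH = 1000; // e.g., 10x the 100 available prizes
    private final LinkedBlockingQueue<Long> queue =
            new LinkedBlockingQueue<>(MAX_QUEUE_LENGTH);

    /** Returns true if enqueued; false means respond "no win" at once. */
    public boolean tryEnqueue(long userId) {
        return queue.offer(userId); // non-blocking; fails fast when full
    }

    /** A single consumer drains requests one by one, flattening the QPS spike. */
    public Long takeNext() throws InterruptedException {
        return queue.take();
    }
}
```

A single consumer thread calling `takeNext()` in a loop serializes the draw logic, which is what removes the spike the article describes.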
b. Replace pessimistic row locks with optimistic locks
The original code used `FOR UPDATE` pessimistic locking, causing many threads to wait indefinitely and exhaust DB connections. Switching to optimistic locking with a version field allows concurrent updates; only requests with a matching version succeed, others receive a failure response.
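The version-field idea can be shown with a minimal in-memory sketch. In the real system this is a conditional `UPDATE ... WHERE version = ?` against the prize table; the class below only mirrors that check-and-bump semantics and its names are invented for illustration:

```java
// In-memory sketch of optimistic locking with a version field.
// Mirrors the conditional UPDATE:
//   UPDATE prize SET stock = stock - 1, version = version + 1
//   WHERE id = ? AND version = ? AND stock > 0
public class OptimisticStock {
    private int stock;
    private long version = 0;

    public OptimisticStock(int initialStock) { this.stock = initialStock; }

    public synchronized long currentVersion() { return version; }

    /**
     * Only the request whose remembered version still matches succeeds;
     * everyone else gets an immediate failure instead of blocking on a
     * FOR UPDATE row lock and holding a DB connection.
     */
    public synchronized boolean tryDecrement(long expectedVersion) {
        if (version != expectedVersion || stock <= 0) {
            return false; // stale version or sold out => failure response
        }
        stock--;
        version++;
        return true;
    }
}
```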
c. Asynchronous handling of non‑critical steps
After a successful draw, send SMS via a dedicated thread pool to improve overall request throughput.
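A minimal sketch of the hand-off, assuming a fixed-size pool and a placeholder for the SMS gateway call (both illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: SMS delivery runs on a dedicated pool so the draw request
// thread can return to the user immediately.
public class SmsNotifier {
    private final ExecutorService smsPool = Executors.newFixedThreadPool(4);

    /** Submits the SMS job and returns without waiting for delivery. */
    public Future<?> sendAsync(long userId, String message) {
        return smsPool.submit(() -> {
            // Call the SMS gateway here; failures can be logged and retried.
            System.out.println("SMS to " + userId + ": " + message);
        });
    }

    public void shutdown() { smsPool.shutdown(); }
}
```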
d. Database read‑write separation
Redirect read‑heavy queries to a replica, relieving the primary DB of read load while writes continue on the master.
e. Semaphore control per time slice
Limit the number of concurrent users entering the draw window, preventing overload and pairing with the queue for additional rate limiting.
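This admission control maps directly onto `java.util.concurrent.Semaphore`; the permit count below is an illustrative assumption, not a figure from the article:

```java
import java.util.concurrent.Semaphore;

// Sketch of semaphore-based admission control: only PERMITS users may be
// inside the draw window at once; the rest fail fast and can be degraded
// to a "no win" response.
public class DrawGate {
    private static final int PERMITS = 200; // tune to what the DB can sustain
    private final Semaphore gate = new Semaphore(PERMITS);

    /** Non-blocking entry attempt; false => degrade instead of queueing. */
    public boolean enter() {
        return gate.tryAcquire();
    }

    /** Must be called (e.g., in a finally block) when the draw completes. */
    public void leave() {
        gate.release();
    }
}
```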
f. Message‑based persistence
Enqueue data changes (e.g., in Tair) and let a scheduled task batch‑write them to the DB, drastically reducing concurrent DB writes while requiring careful consistency handling.
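The buffer-then-batch pattern can be sketched as below; the in-process queue stands in for Tair, and the one-second interval and string payload are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: buffer data changes in memory and flush them to the DB in
// batches on a schedule, turning N concurrent writes into one batch write.
public class BatchWriter {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Record one data change instead of writing it to the DB immediately. */
    public void record(String change) { pending.add(change); }

    /** Drain the buffer and write it as one batch; returns the batch size. */
    public int flush() {
        List<String> batch = new ArrayList<>();
        String item;
        while ((item = pending.poll()) != null) batch.add(item);
        if (!batch.isEmpty()) {
            // batchInsert(batch); // one multi-row INSERT in the real system
        }
        return batch.size();
    }

    /** One batched DB write per second instead of one write per request. */
    public void start() {
        scheduler.scheduleAtFixedRate(this::flush, 1, 1, TimeUnit.SECONDS);
    }
}
```

The consistency caveat the article raises applies here: changes buffered but not yet flushed are lost if the process dies, so the real system must be able to replay or tolerate that window.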
g. Conditional rate limiting and degradation
When concurrency exceeds safe limits, treat excess requests as “no win” to preserve overall system availability.
3. Additional Considerations
a. Prevent malicious abuse
Apply per‑UID request caps at the service entry point to mitigate CC attacks and excessive QPS.
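A per-UID cap can be sketched with a concurrent counter map; the cap value and class names are illustrative, and a production version would keep the counters in Redis/Tair with a TTL so they reset per time window:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a per-UID request cap at the service entry point.
public class UidLimiter {
    private static final int MAX_REQUESTS_PER_UID = 10;
    private final ConcurrentHashMap<Long, AtomicInteger> counts =
            new ConcurrentHashMap<>();

    /** Returns false once a UID exceeds its cap (likely abusive traffic). */
    public boolean allow(long uid) {
        AtomicInteger c = counts.computeIfAbsent(uid, k -> new AtomicInteger());
        return c.incrementAndGet() <= MAX_REQUESTS_PER_UID;
    }
}
```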
b. Pre‑select winning candidates
Randomly choose a pool of potential winners (e.g., 500 out of 100 000 users for 100 prizes) and only route those candidates through the full draw logic, filtering out the majority early to reduce DB load.
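One simple way to realize this pre-selection is a uniform random sample via shuffle; this is a sketch of the idea, not the article's actual selection code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: pre-select a small candidate pool (e.g., 500 of 100,000 users
// for 100 prizes) so only candidates run the full draw logic.
public class CandidateSelector {
    /** Uniformly samples poolSize user IDs; the rest are filtered out early. */
    public static List<Long> pickCandidates(List<Long> allUserIds, int poolSize) {
        List<Long> shuffled = new ArrayList<>(allUserIds);
        Collections.shuffle(shuffled); // uniform random order
        return shuffled.subList(0, Math.min(poolSize, shuffled.size()));
    }
}
```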
4. Architecture Diagram
Source: CSDN Blog
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
