Designing a High-Concurrency Flash Sale (秒杀) System: Key Techniques and Best Practices
This article explains how to design a flash‑sale (秒杀) system that can handle sudden spikes of traffic by using static pages, CDN acceleration, caching strategies, distributed locks, message‑queue asynchronous processing, rate limiting and other backend techniques to ensure reliability and prevent overselling.
Preface
How to design a flash‑sale system under high concurrency is a frequent interview question; although it looks simple, it involves deep knowledge from front‑end to back‑end.
Flash‑sale (秒杀) is a promotional activity where a limited number of items (e.g., 10 phones) are sold at a very low price (e.g., 0.1 CNY). Only a few users succeed, and the activity is mainly for marketing.
Despite being a promotion, the technical requirements are high. Below are nine essential details for designing such a system.
1. Instantaneous High Concurrency
Traffic spikes sharply a few minutes before the sale time and peaks at the exact moment, then drops quickly because most users receive a "sold out" message and leave.
This short‑lived peak makes traditional systems struggle, so we need to address it via:
Page staticization
CDN acceleration
Caching
MQ asynchronous processing
Rate limiting
Distributed locks
2. Page Staticization
The activity page is the first entry point and receives the highest traffic. Direct server requests would overwhelm the backend.
Since most page content (product name, description, images) is static, we should serve a static version and only allow server access when the user clicks the flash‑sale button at the exact time.
Because users are geographically distributed, we need a CDN (Content Delivery Network) to deliver the static page from the nearest node, reducing latency and network congestion.
3. Flash‑Sale Button
Before the sale starts, the button is greyed out and non‑clickable. At the sale moment it becomes active, but users may repeatedly refresh the page to catch the moment.
We control the button state via a JavaScript file that is cached on the CDN. The JS file contains a flag and a random parameter; when the sale starts a new JS file with a new random parameter is generated and pushed to the CDN, ensuring the latest version is fetched.
A client‑side timer can also limit each user to one request within a short interval (e.g., 10 seconds).
4. Read‑Heavy Write‑Light Pattern
During a flash sale, the system first checks inventory; if insufficient it returns "sold out". Because most requests will find insufficient stock, the pattern is classic "read‑many, write‑few".
Relying solely on a relational database (e.g., MySQL) can exhaust connections, so we should use a cache such as Redis, deployed with multiple nodes.
5. Cache Issues
Product information (ID, name, specs, stock) is stored in Redis and also persisted in the database.
When a request arrives, we first look up the product in the cache; if missing, we fetch from the database, populate the cache, and proceed. If the product does not exist, we fail fast.
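The lookup order described above can be sketched as follows. This is a minimal single-threaded illustration, not the article's implementation: the class name `ProductCache` and the two in-memory maps standing in for Redis and the database are assumptions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Cache-aside lookup: cache first, then database, then populate the cache. */
class ProductCache {
    // In-memory stand-ins for Redis and the database (assumptions for this sketch).
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final Map<Long, String> database;
    int databaseReads = 0; // illustration only; not thread-safe

    ProductCache(Map<Long, String> database) {
        this.database = database;
    }

    /** Returns the product, or empty if it exists in neither store (fail fast). */
    Optional<String> find(long productId) {
        String cached = cache.get(productId);
        if (cached != null) {
            return Optional.of(cached);
        }
        databaseReads++;
        String stored = database.get(productId);
        if (stored == null) {
            return Optional.empty();
        }
        cache.put(productId, stored); // populate the cache for later requests
        return Optional.of(stored);
    }
}
```

A second `find` for the same existing product is served from the cache and does not touch the database map again.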
5.1 Cache Stampede (Cache Miss Storm)
When many concurrent requests miss the cache for a product that exists only in the database, the database can be overwhelmed. The solution is to use a distributed lock to serialize the cache‑miss loading.
Pre‑warming the cache at application startup (loading all products into Redis) can also avoid this problem.
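To illustrate what serializing the cache-miss load buys, here is a single-JVM sketch. It swaps the Redis distributed lock for `ConcurrentHashMap.computeIfAbsent`, which applies the loader at most once per absent key; across several application nodes a distributed lock would play that role. The class name `SingleLoadCache` and the loader function are assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Single-JVM analogue of loading a missing cache entry under a lock. */
class SingleLoadCache {
    private final ConcurrentHashMap<Long, String> cache = new ConcurrentHashMap<>();
    private final Function<Long, String> databaseLoader; // hypothetical database query
    final AtomicInteger databaseLoads = new AtomicInteger();

    SingleLoadCache(Function<Long, String> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    String get(long productId) {
        // The loader runs at most once per absent key; concurrent callers
        // for the same key wait for that result instead of hitting the database.
        return cache.computeIfAbsent(productId, id -> {
            databaseLoads.incrementAndGet();
            return databaseLoader.apply(id);
        });
    }
}
```

With many threads asking for the same missing product at once, `databaseLoads` ends at 1 rather than one load per request.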
5.2 Cache Penetration (Non‑existent Keys)
Requests for non‑existent product IDs would repeatedly hit the database. Using a Bloom filter to quickly reject invalid IDs helps, but the filter must stay consistent with the cache.
If the filter cannot stay up‑to‑date, we can cache a special marker for "non‑existent" keys with a short TTL.
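A Bloom filter answers "definitely absent" or "possibly present", which is why it can reject invalid IDs cheaply but can never confirm a valid one. The sketch below is a toy version built on `java.util.BitSet`; the class name, the sizing parameters, and the mixing constants are assumptions, and a production system would more likely use an existing library or a Redis-side filter.

```java
import java.util.BitSet;

/** Toy Bloom filter over product IDs: no false negatives, some false positives. */
class ProductIdBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    ProductIdBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(long productId) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(productId, i));
        }
    }

    /** false means the ID was never added; true means it may have been. */
    boolean mightContain(long productId) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(productId, i))) {
                return false;
            }
        }
        return true;
    }

    // Derives the i-th bit position by mixing the ID with a per-hash seed.
    private int index(long productId, int seed) {
        long h = productId * 0x9E3779B97F4A7C15L + seed * 0xC2B2AE3D27D4EB4FL;
        h ^= (h >>> 32);
        return (int) Math.floorMod(h, (long) size);
    }
}
```

Requests whose ID fails `mightContain` can be rejected before touching the cache or the database; IDs that pass still go through the normal lookup.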
5.3 Cache Breakdown (Hot Key)
When a hot product’s cache expires, a massive burst of requests can again hit the database. Adding a lock or using a short TTL with a fallback strategy mitigates this.
6. Inventory Management
Simple stock deduction is not enough because orders may not be paid immediately; we need a pre‑deduction (pre‑lock) and a rollback mechanism.
6.1 Database Stock Deduction
update product set stock = stock - 1 where id = 123;
To avoid overselling, we must check stock before updating, but the check‑then‑update is not atomic.
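One common way to close that gap is to fold the stock check into the UPDATE itself, so the database evaluates the condition and the decrement as one statement. The JDBC sketch below assumes the `product` table and `stock` column from the statement above; the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Conditional stock deduction: the WHERE clause carries the stock check. */
class StockDao {
    static final String DEDUCT_SQL =
            "UPDATE product SET stock = stock - 1 WHERE id = ? AND stock > 0";

    /** Returns true if one unit was deducted, false if sold out or the product is missing. */
    static boolean deduct(Connection connection, long productId) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(DEDUCT_SQL)) {
            statement.setLong(1, productId);
            // Zero affected rows means the condition failed, so nothing was sold.
            return statement.executeUpdate() == 1;
        }
    }
}
```

This keeps stock from going below zero, but every attempt still reaches the database, which is why the following sections move the deduction to Redis.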
6.2 Redis Stock Deduction
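// -1: this user has already bought the product
boolean exist = redisClient.query(productId, userId);
if (exist) { return -1; }
// 0: sold out
int stock = redisClient.queryStock(productId);
if (stock <= 0) { return 0; }
// Deduct one unit and record the purchase; 1: success
redisClient.incrby(productId, -1);
redisClient.add(productId, userId);
return 1;
The stock query and the decrement are separate commands, so several concurrent requests can all see stock > 0 and oversell. Adding synchronized makes it safe within a single JVM but hurts performance.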
6.3 Optimized Redis Deduction
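// incrby returns the value after the decrement; a negative result means the stock was already gone
if (redisClient.incrby(productId, -1) < 0) { return 0; }
redisClient.add(productId, userId);
return 1;
Deducting first and checking the result removes the race, but every request that arrives after the stock is gone still decrements the counter, so the stored stock can go negative under extreme concurrency.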
6.4 Lua Script for Atomic Deduction
StringBuilder lua = new StringBuilder();
lua.append("if (redis.call('exists', KEYS[1]) == 1) then");
lua.append(" local stock = tonumber(redis.call('get', KEYS[1]));");
lua.append(" if (stock == -1) then return 1; end;");
lua.append(" if (stock > 0) then redis.call('incrby', KEYS[1], -1); return stock; end;");
lua.append(" return 0; end; return -1;");
Redis runs the script atomically. It returns -1 if the stock key does not exist, 1 if the stock is unlimited (stored as -1), the stock value before the decrement if a unit was deducted, and 0 if the product is sold out.
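To make the script's return codes easier to follow, the sketch below mirrors its branches in plain Java. It is only a model for reading the logic: `synchronized` stands in for Redis executing a script as one unit, and the class name and in-memory map are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of the stock script's branches and return codes. */
class StockScriptModel {
    private final Map<String, Long> store = new HashMap<>();

    synchronized void setStock(String key, long stock) {
        store.put(key, stock);
    }

    /** synchronized stands in for Redis running the whole script atomically. */
    synchronized long deduct(String key) {
        Long stock = store.get(key);
        if (stock == null) {
            return -1; // key does not exist
        }
        if (stock == -1) {
            return 1; // -1 marks unlimited stock: always succeed, never decrement
        }
        if (stock > 0) {
            store.put(key, stock - 1);
            return stock; // the value before the decrement, always > 0
        }
        return 0; // sold out
    }
}
```

A caller can treat any positive result as success, 0 as sold out, and -1 as an unknown product.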
7. Distributed Locks
When many requests miss the cache, they all hit the database, causing a crash. Redis distributed locks solve this.
7.1 setNx Lock
if (jedis.setnx(lockKey, val) == 1) { jedis.expire(lockKey, timeout); }
Because SETNX and EXPIRE are two separate commands, a failure between them (for example, a crash right after setnx succeeds) can leave a lock that never expires.
7.2 set Command Lock
String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
if ("OK".equals(result)) { return true; }
return false;
This is atomic: a single SET command with the NX and PX options creates the key and sets its expiry together.
7.3 Unlock
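// Compare from the requestId side so a missing key (null) cannot throw a NullPointerException
if (requestId.equals(jedis.get(lockKey))) { jedis.del(lockKey); return true; }
return false;
The get and del here are still two separate commands, so the lock can expire and be acquired by another client in between. Using a Lua script makes the check‑and‑delete atomic.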
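A compare-and-delete script of this kind might look like the sketch below. The Lua text uses only the standard `get` and `del` commands. The `ScriptExecutor` interface is a hypothetical stand-in for a client's EVAL call (for example Jedis's `eval(script, keys, args)`), so the block compiles without a Redis client on the classpath.

```java
import java.util.Collections;
import java.util.List;

/** Releases a lock only if it is still owned by the given requestId. */
class RedisUnlock {
    // Delete the key only when its value matches the caller's requestId.
    static final String UNLOCK_SCRIPT =
            "if redis.call('get', KEYS[1]) == ARGV[1] then "
          + "return redis.call('del', KEYS[1]) else return 0 end";

    /** Hypothetical abstraction over a Redis client's EVAL call. */
    interface ScriptExecutor {
        Object eval(String script, List<String> keys, List<String> args);
    }

    static boolean unlock(ScriptExecutor redis, String lockKey, String requestId) {
        Object result = redis.eval(
                UNLOCK_SCRIPT,
                Collections.singletonList(lockKey),
                Collections.singletonList(requestId));
        // DEL reports the number of keys removed; 1 means the lock was released.
        return Long.valueOf(1L).equals(result);
    }
}
```

Because the comparison and the delete run inside one script, another client's lock can no longer be deleted by a caller whose own lock has already expired.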
7.4 Spin Lock
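long start = System.currentTimeMillis();
try {
    while (true) {
        String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
        if ("OK".equals(result)) {
            // Lock acquired: run the protected business logic here, before returning
            return true;
        }
        if (System.currentTimeMillis() - start >= timeout) { return false; }
        try { Thread.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); return false; }
    }
} finally {
    // Releases the lock only if requestId still owns it
    unlock(lockKey, requestId);
}
The loop retries until timeout, reducing lock‑competition failures.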
7.5 Redisson
Redisson provides a higher‑level API that handles lock competition, auto‑renewal, re‑entrancy, and multi‑node coordination.
8. MQ Asynchronous Processing
The three core flash‑sale steps are: flash‑sale request (including the stock check) → order creation → payment. Only the first step needs ultra‑high concurrency; order creation and payment can be processed asynchronously via a message queue.
8.1 Message Loss
To avoid losing order messages, write the message to a "send table" with status "pending" before sending to MQ. After successful consumption, update the status to "processed".
8.2 Duplicate Consumption
Maintain a "processing table"; before handling a message, check if it already exists. If so, skip; otherwise, process and insert the record in the same transaction.
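The duplicate check can be sketched in memory as follows. A concurrent set stands in for the "processing table", and removing the ID when the handler fails imitates the rollback that a shared database transaction would give; the class name and the handler are assumptions.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Skips messages whose ID has already been handled. */
class IdempotentConsumer {
    // Stand-in for the "processing table" (assumption for this sketch).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> handler; // hypothetical business logic

    IdempotentConsumer(Consumer<String> handler) {
        this.handler = handler;
    }

    /** Returns true if the message was handled, false if it was a duplicate. */
    boolean consume(String messageId, String payload) {
        // add() returns false when the ID is already recorded.
        if (!processedIds.add(messageId)) {
            return false;
        }
        try {
            handler.accept(payload);
            return true;
        } catch (RuntimeException e) {
            processedIds.remove(messageId); // imitate a transaction rollback
            throw e;
        }
    }
}
```

Delivering the same message ID twice invokes the handler only once; the second call returns false.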
8.3 Garbage Messages
Limit the retry count in the send table; when the maximum is reached, stop retrying to prevent endless garbage messages.
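The send table and its retry cap might be modeled like this. The sketch keeps rows in memory and assumes a periodic job calls `retryPending`; the class name, the `FAILED` status, and the job itself are assumptions rather than details given in the article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** In-memory model of a send table with a capped retry count. */
class SendTable {
    enum Status { PENDING, PROCESSED, FAILED }

    private static final class Row {
        final String payload;
        Status status = Status.PENDING;
        int attempts = 0;
        Row(String payload) { this.payload = payload; }
    }

    private final Map<String, Row> rows = new LinkedHashMap<>();
    private final int maxAttempts;

    SendTable(int maxAttempts) { this.maxAttempts = maxAttempts; }

    /** Record the message as pending before it is sent to the queue. */
    void record(String messageId, String payload) {
        rows.put(messageId, new Row(payload));
    }

    /** Called once the consumer has handled the message successfully. */
    void markProcessed(String messageId) {
        Row row = rows.get(messageId);
        if (row != null) { row.status = Status.PROCESSED; }
    }

    /** One pass of a hypothetical retry job: resend pending rows, give up at the cap. */
    void retryPending(Consumer<String> sender) {
        for (Row row : rows.values()) {
            if (row.status != Status.PENDING) { continue; }
            if (row.attempts >= maxAttempts) { row.status = Status.FAILED; continue; }
            row.attempts++;
            sender.accept(row.payload);
        }
    }

    Status statusOf(String messageId) {
        Row row = rows.get(messageId);
        return row == null ? null : row.status;
    }
}
```

A message that is never acknowledged is resent `maxAttempts` times and then parked as `FAILED` for manual inspection instead of being retried forever.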
8.4 Delayed Consumption
Use a delayed queue (e.g., RocketMQ) to automatically cancel unpaid orders after a timeout (e.g., 15 minutes). The consumer checks the order status; if still pending, it marks the order as cancelled.
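The consumer-side check can be illustrated with the JDK's in-process `java.util.concurrent.DelayQueue`, used here only as a stand-in for a broker's delayed messages. The class name, the status values, and the in-memory order map are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

/** Cancels orders that are still unpaid when their delayed check fires. */
class OrderTimeouts {
    enum Status { PENDING, PAID, CANCELLED }

    /** A delayed "check this order" message. */
    static final class Check implements Delayed {
        final String orderId;
        final long dueNanos;

        Check(String orderId, long delay, TimeUnit unit) {
            this.orderId = orderId;
            this.dueNanos = System.nanoTime() + unit.toNanos(delay);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(dueNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    final Map<String, Status> orders = new ConcurrentHashMap<>();
    private final DelayQueue<Check> queue = new DelayQueue<>();

    void placeOrder(String orderId, long delay, TimeUnit unit) {
        orders.put(orderId, Status.PENDING);
        queue.put(new Check(orderId, delay, unit));
    }

    void pay(String orderId) {
        orders.replace(orderId, Status.PENDING, Status.PAID);
    }

    /** Handles every check that is already due; returns how many orders were cancelled. */
    int cancelExpired() {
        int cancelled = 0;
        Check due;
        while ((due = queue.poll()) != null) { // poll() returns only expired elements
            // Cancel only if the order is still pending; paid orders are left alone.
            if (orders.replace(due.orderId, Status.PENDING, Status.CANCELLED)) {
                cancelled++;
            }
        }
        return cancelled;
    }
}
```

The status check before cancelling matters because a payment can arrive at any point between placing the order and the delayed message being consumed.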
9. Rate Limiting
To prevent bots from hammering the flash‑sale API, we can limit requests by user, IP, or interface, and optionally add captchas or raise business thresholds (e.g., member‑only sales).
9.1 User‑Based Limiting
Allow only a certain number of requests per user per minute.
9.2 IP‑Based Limiting
Limit the number of requests per IP per minute, though this may affect users behind the same NAT.
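Per-user and per-IP limits share the same shape: count requests per key within a time window. The sketch below is a fixed-window counter for a single JVM, with the clock injected so it can be tested; the class name and parameters are assumptions, and a multi-node deployment would typically keep the counters in Redis instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Fixed-window request counter keyed by user ID or IP address. */
class FixedWindowLimiter {
    private static final class Window {
        final long start;
        int count;
        Window(long start) { this.start = start; }
    }

    private final Map<String, Window> windows = new ConcurrentHashMap<>();
    private final int limit;
    private final long windowMillis;
    private final LongSupplier clockMillis;

    FixedWindowLimiter(int limit, long windowMillis, LongSupplier clockMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clockMillis = clockMillis;
    }

    /** Returns true if the request is within the key's limit for the current window. */
    boolean tryAcquire(String key) {
        long now = clockMillis.getAsLong();
        boolean[] allowed = new boolean[1];
        // compute() runs atomically per key, so the counter update is race-free.
        windows.compute(key, (k, w) -> {
            if (w == null || now - w.start >= windowMillis) {
                w = new Window(now); // start a fresh window
            }
            if (w.count < limit) {
                w.count++;
                allowed[0] = true;
            }
            return w;
        });
        return allowed[0];
    }
}
```

For example, `new FixedWindowLimiter(5, 60_000, System::currentTimeMillis)` allows five requests per key per minute; the key can be a user ID or an IP address.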
9.3 Interface‑Based Limiting
Set a global request cap for the flash‑sale endpoint; excessive traffic may affect normal users.
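One common way to implement a global cap is a token bucket, which allows short bursts up to a fixed capacity and then admits requests at a steady refill rate. The article does not name a specific algorithm, so this is a swapped-in illustration; the class name, parameters, and injected clock are assumptions.

```java
import java.util.function.LongSupplier;

/** Token bucket for a single endpoint: bursts up to capacity, then a steady refill rate. */
class TokenBucket {
    private final long capacity;
    private final double refillPerMilli;
    private final LongSupplier clockMillis;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, long refillPerSecond, LongSupplier clockMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.clockMillis = clockMillis;
        this.tokens = capacity; // start full
        this.lastRefill = clockMillis.getAsLong();
    }

    /** Returns true and consumes one token if any are available. */
    synchronized boolean tryAcquire() {
        long now = clockMillis.getAsLong();
        // Add the tokens earned since the last call, never exceeding capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMilli);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Requests rejected by `tryAcquire` can be answered immediately with a "busy, try again" response, which keeps the excess load away from the stock check.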
9.4 Captcha
Require a captcha (including sliding‑puzzle captchas) before allowing the request, ensuring each attempt is human‑verified.
9.5 Business Thresholds
Raise participation requirements (e.g., members only, higher user level) to reduce malicious traffic without heavy technical controls.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
