Designing a High-Concurrency Flash Sale (秒杀) System: Key Techniques and Best Practices

This article explains how to design a flash‑sale (秒杀) system that can handle sudden spikes of traffic by using static pages, CDN acceleration, caching strategies, distributed locks, message‑queue asynchronous processing, rate limiting and other backend techniques to ensure reliability and prevent overselling.

Wukong Talks Architecture

Preface

How to design a flash‑sale system under high concurrency is a frequent interview question; although it looks simple, it involves deep knowledge from front‑end to back‑end.

Flash‑sale (秒杀) is a promotional activity where a limited number of items (e.g., 10 phones) are sold at a very low price (e.g., 0.1 CNY). Only a few users succeed, and the activity is mainly for marketing.

Despite being a promotion, the technical requirements are high. Below are nine essential details for designing such a system.

1. Instantaneous High Concurrency

Traffic spikes sharply a few minutes before the sale time and peaks at the exact moment, then drops quickly because most users receive a "sold out" message and leave.

This short‑lived peak makes traditional systems struggle, so we need to address it via:

Page staticization

CDN acceleration

Caching

MQ asynchronous processing

Rate limiting

Distributed locks

2. Page Staticization

The activity page is the first entry point and receives the highest traffic. Direct server requests would overwhelm the backend.

Since most page content (product name, description, images) is static, we should serve a static version and only allow server access when the user clicks the flash‑sale button at the exact time.

Because users are geographically distributed, we need a CDN (Content Delivery Network) to deliver the static page from the nearest node, reducing latency and network congestion.

3. Flash‑Sale Button

Before the sale starts, the button is greyed out and non‑clickable. At the sale moment it becomes active, but users may repeatedly refresh the page to catch the moment.

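We control the button state through a JavaScript file that is cached on the CDN. The file carries a flag that says whether the sale has started, and it is referenced with a random parameter. When the sale starts, a new JS file is generated with the flag switched on and a new random parameter, and it is pushed to the CDN; the changed parameter ensures that browsers fetch the latest version rather than a stale cached copy.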

A client‑side timer can also limit each user to one request within a short interval (e.g., 10 seconds).

4. Read‑Heavy Write‑Light Pattern

During a flash sale, the system first checks inventory; if insufficient it returns "sold out". Because most requests will find insufficient stock, the pattern is classic "read‑many, write‑few".

Relying solely on a relational database (e.g., MySQL) can exhaust connections, so we should use a cache such as Redis, deployed with multiple nodes.

5. Cache Issues

Product information (ID, name, specs, stock) is stored in Redis and also persisted in the database.

When a request arrives, we first look up the product in the cache; if missing, we fetch from the database, populate the cache, and proceed. If the product does not exist, we fail fast.
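The lookup order described above can be sketched in plain Java. This is a minimal illustration, not the article's implementation: a ConcurrentHashMap stands in for Redis, and the class name ProductCacheAside and the dbLoader function are assumptions made for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside lookup. The map stands in for Redis and dbLoader for the
// database query; both are illustrative assumptions.
class ProductCacheAside {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final Function<Long, String> dbLoader; // returns null for an unknown product

    ProductCacheAside(Function<Long, String> dbLoader) {
        this.dbLoader = dbLoader;
    }

    // Returns the product, or Optional.empty() so the caller can fail fast.
    Optional<String> find(long productId) {
        String cached = cache.get(productId);
        if (cached != null) {
            return Optional.of(cached);            // cache hit
        }
        String fromDb = dbLoader.apply(productId); // cache miss: ask the database
        if (fromDb == null) {
            return Optional.empty();               // product does not exist
        }
        cache.put(productId, fromDb);              // populate the cache for later requests
        return Optional.of(fromDb);
    }
}
```

Note that nothing here stops many concurrent misses from all reaching the database; that gap is what the following subsections address.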

5.1 Cache Stampede (Cache Miss Storm)

When many concurrent requests miss the cache for a product that exists only in the database, the database can be overwhelmed. The solution is to use a distributed lock to serialize the cache‑miss loading.

Pre‑warming the cache at application startup (loading all products into Redis) can also avoid this problem.
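A rough sketch of lock-serialized loading follows. It assumes a single JVM, so an in-process ReentrantLock per product stands in for the distributed lock; LockedCacheLoader and dbLoader are hypothetical names. The important detail is the second cache check after the lock is acquired.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Only one caller per product reloads from the database; the rest wait and
// then read the freshly cached value. The in-process lock is a stand-in for
// a distributed lock (an assumption made for this sketch).
class LockedCacheLoader {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final Function<Long, String> dbLoader;

    LockedCacheLoader(Function<Long, String> dbLoader) {
        this.dbLoader = dbLoader;
    }

    String get(long productId) {
        String value = cache.get(productId);
        if (value != null) {
            return value;                          // fast path: no locking on a hit
        }
        ReentrantLock lock = locks.computeIfAbsent(productId, id -> new ReentrantLock());
        lock.lock();
        try {
            value = cache.get(productId);          // re-check: another caller may have loaded it
            if (value == null) {
                value = dbLoader.apply(productId); // single database hit per miss
                if (value != null) {
                    cache.put(productId, value);
                }
            }
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```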

5.2 Cache Penetration (Non‑existent Keys)

Requests for non‑existent product IDs would repeatedly hit the database. Using a Bloom filter to quickly reject invalid IDs helps, but the filter must stay consistent with the cache.
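To show what the filter does, here is a deliberately small Bloom filter. The bit-array size, the number of hash functions, and the two hash functions themselves are illustrative assumptions; a production system would more likely rely on an existing library or a Redis-side implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter sketch (not thread-safe).
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    // false: the key was definitely never added.
    // true: the key was probably added (false positives are possible).
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th position from two base hashes.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the key.
    private static int fnv1a(String key) {
        int hash = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```

A request whose product ID fails mightContain can be rejected before it touches the cache or the database. IDs added to the database later must also be added to the filter, which is the consistency concern mentioned above.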

If the filter cannot stay up‑to‑date, we can cache a special marker for "non‑existent" keys with a short TTL.

5.3 Cache Breakdown (Hot Key)

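When a hot product's cache entry expires, a massive burst of requests can hit the database at once. Taking a lock around the reload, so that only one request rebuilds the entry, mitigates this; so does keeping hot keys from expiring in the middle of the sale, for example by refreshing them before the TTL runs out.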

6. Inventory Management

Simple stock deduction is not enough because orders may not be paid immediately; we need a pre‑deduction (pre‑lock) and a rollback mechanism.

6.1 Database Stock Deduction

update product set stock = stock - 1 where id = 123;

To avoid overselling, stock must be checked before it is decremented. A separate query followed by the update is not atomic, though: two requests can both read stock = 1 and then both decrement it.
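One common way to close that gap is to move the stock check into the UPDATE's WHERE clause and use the affected-row count as the result. The JDBC sketch below assumes the product table from the statement above; the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Conditional update: the database evaluates the stock check and the
// decrement in one statement, so no second request can slip in between.
class StockDao {
    static final String DEDUCT_SQL =
        "update product set stock = stock - 1 where id = ? and stock > 0";

    // Returns true if one unit was deducted, false if the product is
    // sold out or does not exist (no row matched).
    static boolean deductOne(Connection conn, long productId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DEDUCT_SQL)) {
            ps.setLong(1, productId);
            return ps.executeUpdate() == 1;
        }
    }
}
```

This keeps the data correct, but every deduction still competes for the same database row, which is one reason the article moves the deduction into Redis next.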

6.2 Redis Stock Deduction

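// redisClient is a pseudocode client here, not a specific library API.
boolean exist = redisClient.query(productId, userId);
if (exist) { return -1; }              // this user has already bought the product
int stock = redisClient.queryStock(productId);
if (stock <= 0) { return 0; }          // sold out
redisClient.incrby(productId, -1);     // not atomic with the stock check above
redisClient.add(productId, userId);    // remember the buyer to block repeat purchases
return 1;                              // success

The checks and the decrement are separate Redis calls, so under concurrency several requests can pass the stock check before any of them decrements, and the item is oversold. Wrapping the method in synchronized serializes it within a single JVM only (it does not coordinate multiple service instances) and hurts throughput.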

6.3 Optimized Redis Deduction

if (redisClient.incrby(productId, -1) < 0) { return 0; }
redisClient.add(productId, userId);
return 1;

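This removes the separate check, but the counter itself can go negative under heavy concurrency: every request that arrives after stock reaches zero still decrements it. That does not oversell, yet it makes the stock value inaccurate if stock later has to be returned, for example when an unpaid order is rolled back.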

6.4 Lua Script for Atomic Deduction

StringBuilder lua = new StringBuilder();
lua.append("if (redis.call('exists', KEYS[1]) == 1) then");
lua.append("    local stock = tonumber(redis.call('get', KEYS[1]));");
lua.append("    if (stock == -1) then return 1; end;");
lua.append("    if (stock > 0) then redis.call('incrby', KEYS[1], -1); return stock; end;");
lua.append("    return 0; end; return -1;");

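The script runs atomically inside Redis. It returns -1 if the stock key does not exist, 1 if the stock is unlimited (stored as -1), the stock value before the decrement if a unit was deducted, and 0 if the product is sold out.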
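When writing the calling code, it can help to have the script's branches in a form that runs without Redis. The class below is only a plain-Java model of the same decision logic, useful for unit-testing callers; StockScriptModel and its in-memory map are assumptions, and the synchronized keyword merely mimics the fact that Redis runs a script as one uninterrupted unit.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the Lua script's branches; the map stands in for Redis.
class StockScriptModel {
    private final Map<String, Long> store = new HashMap<>();

    synchronized void setStock(String key, long stock) {
        store.put(key, stock);
    }

    synchronized Long stockOf(String key) {
        return store.get(key);
    }

    synchronized long deduct(String key) {
        Long stock = store.get(key);
        if (stock == null) {
            return -1;                 // key does not exist
        }
        if (stock == -1) {
            return 1;                  // -1 marks unlimited stock: always succeeds
        }
        if (stock > 0) {
            store.put(key, stock - 1);
            return stock;              // stock value before the decrement
        }
        return 0;                      // sold out
    }
}
```

A caller can therefore treat any positive return value as success, 0 as sold out, and -1 as a missing key.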

7. Distributed Locks

When many requests miss the cache at the same time, they all fall through to the database and can bring it down. A Redis-based distributed lock lets only one of them reload the data.

7.1 setNx Lock

if (jedis.setnx(lockKey, val) == 1) { jedis.expire(lockKey, timeout); }

Because SETNX and EXPIRE are two separate commands, the pair is not atomic: if the process fails after SETNX succeeds but before EXPIRE runs, the lock never expires.

7.2 set Command Lock

String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
if ("OK".equals(result)) { return true; }
return false;

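Because NX (set only if absent) and PX (expiry in milliseconds) are options of a single SET command, acquiring the lock and setting its expiry happen atomically. The requestId value identifies the owner, so that only the holder can release the lock.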

7.3 Unlock

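// Compare from requestId so that a missing key (get returns null) cannot throw.
if (requestId.equals(jedis.get(lockKey))) { jedis.del(lockKey); return true; }
return false;

The get and the del are still two separate commands, so the lock can expire and be acquired by another client between them, and the del would then remove that client's lock. Running the comparison and the deletion in a single Lua script makes the release atomic.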
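A sketch of such a release is shown below. The Lua text is the usual compare-then-delete pattern; the ScriptRunner interface is a hypothetical abstraction introduced so that the example compiles without a Redis client on the classpath. With Jedis, its eval(String, List<String>, List<String>) method should fit that shape, but treat that wiring as an assumption to verify against the client version in use.

```java
import java.util.Collections;
import java.util.List;

class RedisUnlock {
    // Deletes the lock only if it still holds the caller's requestId.
    static final String UNLOCK_SCRIPT =
        "if redis.call('get', KEYS[1]) == ARGV[1] then "
      + "return redis.call('del', KEYS[1]) "
      + "else return 0 end";

    // Hypothetical abstraction over the client's EVAL call.
    interface ScriptRunner {
        Object eval(String script, List<String> keys, List<String> args);
    }

    // Returns true only if this caller's lock was found and deleted.
    static boolean unlock(ScriptRunner redis, String lockKey, String requestId) {
        Object result = redis.eval(
            UNLOCK_SCRIPT,
            Collections.singletonList(lockKey),
            Collections.singletonList(requestId));
        return Long.valueOf(1L).equals(result); // DEL reports the number of keys removed
    }
}
```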

7.4 Spin Lock

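long start = System.currentTimeMillis();
try {
  while (true) {
    String result = jedis.set(lockKey, requestId, "NX", "PX", expireTime);
    if ("OK".equals(result)) {
      // Lock acquired: run the protected work here (placeholder), then report success.
      return true;
    }
    // Stop retrying once the overall wait budget is spent.
    if (System.currentTimeMillis() - start >= timeout) { return false; }
    Thread.sleep(50); // back off briefly; the enclosing method must handle InterruptedException
  }
} finally {
  // Runs on every exit path. unlock() only deletes the key if it still holds requestId,
  // so this is a no-op when the lock was never acquired.
  unlock(lockKey, requestId);
}

Instead of failing on the first unsuccessful attempt, the loop retries every 50 ms until the timeout elapses, so fewer requests fail merely because the lock was briefly held. Because the finally block releases the lock, the protected work has to run inside the try block, before the method returns.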

7.5 Redisson

Redisson provides a higher‑level API that handles lock competition, auto‑renewal, re‑entrancy, and multi‑node coordination.

8. MQ Asynchronous Processing

A flash sale has three core steps: the sale request itself (which includes the stock check), order creation, and payment. Only the first needs to sustain ultra‑high concurrency; order creation and payment can be processed asynchronously via a message queue.

8.1 Message Loss

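To avoid losing order messages, write each message to a "send table" with status "pending" before sending it to the MQ. After the consumer has processed it successfully, update the status to "processed". A scheduled job can then re‑send messages that stay pending, which covers sends that failed or were never acknowledged.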

8.2 Duplicate Consumption

Maintain a "processing table"; before handling a message, check if it already exists. If so, skip; otherwise, process and insert the record in the same transaction.
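The check can be sketched as follows, with an in-memory set standing in for the processing table. IdempotentConsumer and the order counter are hypothetical; in a database-backed version, a unique key on the message ID plus a single transaction around the insert and the business logic would play the role of the atomic add used here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Idempotent consumer: each message ID is handled at most once.
class IdempotentConsumer {
    // Stands in for the "processing table" (an assumption for this sketch).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final AtomicInteger ordersCreated = new AtomicInteger();

    // Returns true if the message was processed, false if it was a duplicate.
    boolean handle(String messageId) {
        if (!processedIds.add(messageId)) {
            return false;                    // already recorded: skip the duplicate
        }
        ordersCreated.incrementAndGet();     // placeholder for the real order creation
        return true;
    }

    int ordersCreated() {
        return ordersCreated.get();
    }
}
```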

8.3 Garbage Messages

Limit the retry count in the send table; when the maximum is reached, stop retrying to prevent endless garbage messages.
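The send table and its retry cap might look like the sketch below. It keeps rows in memory purely for illustration (not thread-safe); the class, the GIVEN_UP status, and the mqSender callback are assumptions, and in practice the rows would live in a database table scanned by a scheduled job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory stand-in for the "send table" with a bounded number of re-sends.
class SendTable {
    enum Status { PENDING, PROCESSED, GIVEN_UP }

    static final class Row {
        final String messageId;
        final String payload;
        Status status = Status.PENDING;
        int attempts = 0;

        Row(String messageId, String payload) {
            this.messageId = messageId;
            this.payload = payload;
        }
    }

    private final Map<String, Row> rows = new LinkedHashMap<>();
    private final int maxAttempts;

    SendTable(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Record the message as pending before it is handed to the MQ.
    void record(String messageId, String payload) {
        rows.put(messageId, new Row(messageId, payload));
    }

    // Called once the consumer reports success.
    void markProcessed(String messageId) {
        Row row = rows.get(messageId);
        if (row != null) {
            row.status = Status.PROCESSED;
        }
    }

    // One run of the scheduled job: re-send pending rows, give up at the cap.
    List<String> resendPending(Consumer<Row> mqSender) {
        List<String> sent = new ArrayList<>();
        for (Row row : rows.values()) {
            if (row.status != Status.PENDING) {
                continue;
            }
            if (row.attempts >= maxAttempts) {
                row.status = Status.GIVEN_UP;   // stop producing garbage messages
                continue;
            }
            row.attempts++;
            mqSender.accept(row);
            sent.add(row.messageId);
        }
        return sent;
    }

    Status statusOf(String messageId) {
        return rows.get(messageId).status;
    }
}
```

Rows that end in GIVEN_UP are natural candidates for an alert or manual inspection.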

8.4 Delayed Consumption

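Use delayed messages (e.g., in RocketMQ) to cancel unpaid orders automatically after a timeout (e.g., 15 minutes). When the delayed message is delivered, the consumer checks the order status; if the order is still pending payment, it marks the order as cancelled, at which point the pre‑deducted stock from section 6 can be rolled back.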
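The consumer-side logic can be tried out locally with java.util.concurrent.DelayQueue standing in for the broker's delayed messages. This is only an in-process approximation under that assumption; OrderTimeoutChecker and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class OrderTimeoutChecker {
    enum OrderStatus { PENDING, PAID, CANCELLED }

    // A "delayed message": becomes available once the payment timeout has passed.
    static final class TimeoutCheck implements Delayed {
        final String orderId;
        final long dueAtNanos;

        TimeoutCheck(String orderId, long delayMillis) {
            this.orderId = orderId;
            this.dueAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(dueAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    private final Map<String, OrderStatus> orders = new ConcurrentHashMap<>();
    private final DelayQueue<TimeoutCheck> queue = new DelayQueue<>();

    void createOrder(String orderId, long payTimeoutMillis) {
        orders.put(orderId, OrderStatus.PENDING);
        queue.put(new TimeoutCheck(orderId, payTimeoutMillis));
    }

    void pay(String orderId) {
        orders.replace(orderId, OrderStatus.PENDING, OrderStatus.PAID);
    }

    // Handle every check that is already due; cancel only orders still pending.
    void processDueChecks() {
        TimeoutCheck check;
        while ((check = queue.poll()) != null) {
            orders.replace(check.orderId, OrderStatus.PENDING, OrderStatus.CANCELLED);
        }
    }

    OrderStatus statusOf(String orderId) {
        return orders.get(orderId);
    }
}
```

The conditional replace is what keeps a paid order from being cancelled when its timeout check arrives late.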

9. Rate Limiting

To prevent bots from hammering the flash‑sale API, we can limit requests by user, IP, or interface, and optionally add captchas or raise business thresholds (e.g., member‑only sales).

9.1 User‑Based Limiting

Allow only a certain number of requests per user per minute.
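A fixed-window counter is one simple way to express this rule. The sketch below keeps counters in process memory and takes the current time as a parameter so it is easy to test; FixedWindowLimiter is a hypothetical class, stale keys are never evicted here, and a multi-instance deployment would typically keep the counters in a shared store such as Redis instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Fixed-window limiter: at most `limit` requests per key in each window.
class FixedWindowLimiter {
    private static final class Window {
        final long id;
        final int count;

        Window(long id, int count) {
            this.id = id;
            this.count = count;
        }
    }

    private final int limit;
    private final long windowMillis;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    // key is typically a user ID; nowMillis is the current time in milliseconds.
    boolean tryAcquire(String key, long nowMillis) {
        long currentId = nowMillis / windowMillis;
        // compute() updates the counter for this key atomically.
        Window w = windows.compute(key, (k, old) ->
            (old == null || old.id != currentId)
                ? new Window(currentId, 1)              // first request in a new window
                : new Window(currentId, old.count + 1)  // another request in the same window
        );
        return w.count <= limit;
    }
}
```

For example, new FixedWindowLimiter(5, 60_000) allows five requests per key per minute. Keying by client IP, or by one constant key for the whole endpoint, gives the IP-based and interface-based variants described next.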

9.2 IP‑Based Limiting

Limit the number of requests per IP per minute, though this may affect users behind the same NAT.

9.3 Interface‑Based Limiting

Set a global request cap for the flash‑sale endpoint. The drawback is that when malicious traffic uses up the quota, normal users are rejected as well.

9.4 Captcha

Require a captcha (including sliding‑puzzle captchas) before allowing the request, ensuring each attempt is human‑verified.

9.5 Business Thresholds

Raise participation requirements (e.g., members only, higher user level) to reduce malicious traffic without heavy technical controls.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
