Technical Challenges and Solutions for High‑Concurrency Flash Sale (秒杀) Systems

The article examines the technical challenges of high‑concurrency flash‑sale events such as Double 11, and presents a backend‑centric architecture employing rate limiting, cache (Redis), message‑queue peak‑shaving, and asynchronous processing to ensure scalability and prevent system avalanches.

Mike Chen's Internet Architecture

Flash sale (秒杀) is a signature event in e‑commerce that creates extreme high‑concurrency scenarios, raising serious technical challenges that must be addressed.

Flash-sale scenario: During Taobao's Double 11 flash sale, millions of users converge in a short period, generating massive traffic (e.g., 10 million users competing for 100 items). This puts strict performance demands on backend databases and cache services.

Technical challenges behind flash sales

1. Sudden server and network demand: During Double 11, the required server capacity can reach 3 to 5 times the normal level, and the required network bandwidth can be many times higher.

2. High-concurrency load: System throughput is measured in QPS (queries per second). With 20 web servers, each holding at most 500 connections, and a 100 ms response time, the theoretical peak is 20 × 500 / 0.1 s = 100 k QPS. Under real flash-sale load, however, response times rise, so each connection completes fewer requests per second, actual throughput falls below this figure, and pressure on database connections grows.
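The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the 250 ms value is a hypothetical degraded response time, not a number from the article.

```python
def theoretical_peak_qps(servers: int, max_connections: int, response_time_ms: int) -> float:
    """Upper bound on QPS if every connection is busy all the time."""
    requests_per_connection_per_second = 1000 / response_time_ms
    return servers * max_connections * requests_per_connection_per_second


# Figures from the text: 20 servers, 500 connections each, 100 ms per request.
baseline = theoretical_peak_qps(20, 500, 100)

# Hypothetical: the same fleet when response time degrades to 250 ms under load.
degraded = theoretical_peak_qps(20, 500, 250)

print(f"{baseline:,.0f} QPS at 100 ms")
print(f"{degraded:,.0f} QPS at 250 ms")
```

The same fleet delivers less throughput as soon as responses slow down, which is why the theoretical peak is rarely reached during a real flash sale.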

3. High coupling leading to avalanche: If a single service slows or fails, user retries increase load on remaining servers, potentially causing a cascading failure that brings down the entire system.

How to solve flash‑sale bottlenecks

Architecture design ideas:

Intercept requests upstream to reduce downstream pressure; only a tiny fraction of requests succeed, so early filtering prevents database lock contention and timeouts.

Leverage caching (Redis) to dramatically speed up read/write operations.

Use message middleware (ActiveMQ, Kafka, etc.) to smooth peaks: queue massive concurrent requests and let backend workers pull messages at a manageable rate.
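The peak-shaving idea can be sketched in a single process. The example below is an assumption-laden illustration, not the article's implementation: Python's standard `queue.Queue` stands in for real message middleware such as ActiveMQ or Kafka, and the function name, burst size, and queue capacity are invented for the demo. A burst is offered to a bounded queue, overflow is rejected immediately, and a few workers drain the queue at their own pace.

```python
import queue
import threading
from typing import List, Tuple


def run_peak_shaving_demo(burst_size: int = 1000,
                          queue_capacity: int = 100,
                          workers: int = 4) -> Tuple[int, int]:
    """Offer a burst to a bounded queue; return (processed, rejected) counts."""
    buffer: queue.Queue = queue.Queue(maxsize=queue_capacity)
    processed: List[int] = []
    processed_lock = threading.Lock()
    rejected = 0

    def worker() -> None:
        while True:
            item = buffer.get()
            if item is None:          # sentinel: no more work
                buffer.task_done()
                return
            with processed_lock:
                processed.append(item)  # stand-in for real order handling
            buffer.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    # The burst: requests arrive far faster than the backend wants to see them.
    for request_id in range(burst_size):
        try:
            buffer.put_nowait(request_id)
        except queue.Full:
            rejected += 1             # fail fast instead of piling up downstream

    buffer.join()                     # wait until the queue has been drained
    for _ in threads:
        buffer.put(None)
    for t in threads:
        t.join()
    return len(processed), rejected


if __name__ == "__main__":
    done, dropped = run_peak_shaving_demo()
    print(f"processed={done} rejected={dropped}")
```

How many requests are rejected depends on thread timing, but every request is either processed or rejected, and the workers never see more than the queue can hold at once.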

Frontend design:

Page staticization: render static elements and serve via CDN to absorb spikes.

Disable duplicate submissions: gray out the button after a user clicks.

User rate limiting: allow only one request per user within a time window, for example by throttling per IP address.
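A one-request-per-window rule like this can be sketched as a fixed-window counter. The class below is a hypothetical illustration: the name `FixedWindowLimiter`, its parameters, and the in-process dictionary are assumptions. In a multi-server deployment the counters would have to live in a shared store instead. The key can be an IP address or a user ID.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each `window_s`-second window."""

    def __init__(self, limit: int = 1, window_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the demo can control time
        self._lock = threading.Lock()
        self._windows: Dict[str, Tuple[int, int]] = {}  # key -> (window index, count)

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_s)
        with self._lock:
            seen_window, count = self._windows.get(key, (window, 0))
            if seen_window != window:
                count = 0             # a new window started: reset the counter
            if count >= self.limit:
                return False          # over the limit for this window
            self._windows[key] = (window, count + 1)
            return True


# Usage with a fake clock so the result does not depend on real time.
fake_now = [0.0]
limiter = FixedWindowLimiter(limit=1, window_s=1.0, clock=lambda: fake_now[0])
print(limiter.allow("uid-42"))  # True: first request in the window
print(limiter.allow("uid-42"))  # False: second request in the same window
fake_now[0] = 1.2
print(limiter.allow("uid-42"))  # True: a new window has begun
```

A fixed window is the simplest choice; it permits short bursts at window boundaries, which is usually acceptable when the goal is only to shed obvious repeat clicks.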

Backend design:

Gateway layer: limit request frequency per UID to block malicious or excessive access.

Service layer: even after upstream filtering, millions of requests may still reach this layer; use a message queue to buffer them instead of sending every request straight to the database.

Cache for read-heavy traffic: a flash sale is a read-heavy, write-light scenario (most users only view the page and few succeed in buying), so caching can offload most reads from the database.

Cache for write‑heavy traffic: move inventory data to Redis, perform stock deduction in memory, then synchronize to the database via background workers.

Database layer: keep the database protected by handling most traffic upstream, so that it only processes requests within its capacity.
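The in-memory stock deduction with later synchronization, described above for write-heavy traffic, might look roughly like the sketch below. It is an illustration under stated assumptions: `InMemoryStock` is a hypothetical in-process stand-in for the inventory held in Redis (where an atomic command such as DECR or a Lua script would typically do the deduction), and `database_sold` is a plain variable standing in for the database.

```python
import threading
from typing import List, Tuple


class InMemoryStock:
    """Hypothetical stand-in for inventory kept in a cache instead of the database."""

    def __init__(self, initial: int) -> None:
        self._remaining = initial
        self._pending_sync = 0        # deductions not yet written to the database
        self._lock = threading.Lock()

    def try_deduct(self) -> bool:
        """Atomically take one unit; return False once stock is exhausted."""
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            self._pending_sync += 1
            return True

    def drain_pending(self) -> int:
        """Return and reset the number of deductions awaiting synchronization."""
        with self._lock:
            pending, self._pending_sync = self._pending_sync, 0
            return pending


def simulate(stock: int = 100, buyers: int = 1000) -> Tuple[int, int]:
    """Run concurrent buyers; return (successful purchases, units synced to 'database')."""
    inventory = InMemoryStock(stock)
    successes: List[int] = []

    def buyer(buyer_id: int) -> None:
        if inventory.try_deduct():
            successes.append(buyer_id)

    threads = [threading.Thread(target=buyer, args=(i,)) for i in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # A background worker would do this periodically; here it runs once at the end.
    database_sold = inventory.drain_pending()
    return len(successes), database_sold


if __name__ == "__main__":
    print(simulate())
```

Because the check and the decrement happen under one lock, the stock can never go negative, and the database receives a single batched update rather than one write per buyer.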

Example: Using Redis and message middleware for a simple flash‑sale system

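Redis, an in-memory key-value store often used as a distributed cache, can implement a flash-sale system with a simple key-value structure. An atomic counter (e.g., an AtomicInteger in Java) is initialized to the stock limit; each accepted user request is appended to a Redis list with RPUSH key value. Once the number of pushes reaches the limit, further inserts are refused and the remaining users are told the sale is over.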

Multiple worker threads pop successful user IDs using LPOP key and then create orders in the database. The same pattern can be realized with ActiveMQ, Kafka, or a combination of cache and message middleware.
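The flow described above can be sketched end to end without a running Redis server. Everything named here is an assumption made for illustration: `FakeRedisList` is a tiny in-memory stand-in that mimics the RPUSH and LPOP semantics for a single list, `FlashSale` and `run_sale` are hypothetical names, and appending to a Python list stands in for creating an order in the database.

```python
import threading
from collections import deque
from typing import Deque, List, Optional


class FakeRedisList:
    """In-memory stand-in for one Redis list (append at tail, pop from head)."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()
        self._lock = threading.Lock()

    def rpush(self, value: str) -> int:
        with self._lock:
            self._items.append(value)
            return len(self._items)   # like RPUSH, report the new list length

    def lpop(self) -> Optional[str]:
        with self._lock:
            return self._items.popleft() if self._items else None


class FlashSale:
    def __init__(self, stock_limit: int) -> None:
        self.stock_limit = stock_limit
        self._accepted = 0            # plays the role of the atomic counter
        self._lock = threading.Lock()
        self.winners = FakeRedisList()

    def try_enter(self, user_id: str) -> bool:
        with self._lock:
            if self._accepted >= self.stock_limit:
                return False          # limit reached: stop inserting
            self._accepted += 1
        self.winners.rpush(user_id)
        return True


def run_sale(stock_limit: int = 100, users: int = 500, workers: int = 4) -> List[str]:
    sale = FlashSale(stock_limit)
    requests = [threading.Thread(target=sale.try_enter, args=(f"user-{i}",))
                for i in range(users)]
    for t in requests:
        t.start()
    for t in requests:
        t.join()

    orders: List[str] = []

    def worker() -> None:
        while True:
            user_id = sale.winners.lpop()
            if user_id is None:       # list drained
                return
            orders.append(user_id)    # stand-in for creating the order row

    pool = [threading.Thread(target=worker) for _ in range(workers)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return orders


if __name__ == "__main__":
    print(len(run_sale()))
```

In this sketch the workers start only after all requests have arrived, which keeps the demo deterministic; in a live system they would run continuously and poll or block on the list.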

Flash‑sale architecture summary

Rate limiting: restrict most traffic, allowing only a small portion to reach the backend.

Peak shaving: transform sudden spikes into steady traffic using cache and message queues.

Asynchronous processing: handle requests asynchronously to boost concurrency.

In‑memory cache: move read/write bottlenecks from disk‑based databases to fast memory caches.

Scalability: design the system to elastically add machines during peak events like Double 11.
