Designing a Robust Flash‑Sale (秒杀) System: Architecture, High Concurrency Handling, and Rate‑Limiting Strategies
This article examines the challenges of building a flash‑sale system—such as overselling, massive concurrent requests, URL exposure, and database pressure—and presents a comprehensive backend design that includes dedicated databases, dynamic URLs, static pages, Redis clustering, Nginx load balancing, SQL optimization, token‑bucket rate limiting, asynchronous order processing, and service degradation techniques.
Flash‑sale (秒杀) systems, like those used by JD, Taobao, or Xiaomi, attract huge traffic in a very short time, creating problems such as overselling, high concurrency, request flooding, URL leakage, and database overload. This article explores these issues and proposes a robust backend design.
Key Problems to Consider
Overselling: Limited stock (e.g., 100 items) can be sold out multiple times if not controlled.
High Concurrency: Millions of users may send requests within a few minutes, risking cache stampede and database collapse.
Interface Abuse: Automated scripts can send hundreds of requests per second; safeguards are needed.
URL Exposure: Users can discover the sale URL via browser dev tools and bypass front‑end controls.
Database Impact: A flash‑sale sharing the same DB with other services can cause cascading failures.
Massive Request Volume: Even a powerful Redis instance (~40k QPS) may be insufficient for spikes of hundreds of thousands of QPS.
Design and Technical Solutions
Separate Flash‑Sale Database
Use an isolated database with at least two tables: miaosha_order for orders and miaosha_goods for goods. Additional tables for product details and user information are recommended.
Dynamic Flash‑Sale URL
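Generate the sale URL dynamically, for example by embedding an MD5 hash of a random string in the path. The front end requests the URL from the back end shortly before purchase, and the back end validates the hash on the purchase request before allowing it to proceed, so the URL cannot be guessed or hard-coded in advance.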
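A minimal sketch of issuing and validating such a path token, assuming one token per user and item. The class name SalePathService, its methods, and the in-memory map are hypothetical; the map stands in for Redis, where the token would normally be stored with a short expiry.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: issues and validates per-user, per-item sale path tokens. */
class SalePathService {
    // Stand-in for Redis; a real deployment would store the token with a short TTL.
    private final Map<String, String> issued = new ConcurrentHashMap<>();

    /** Called shortly before purchase; returns the path segment the client must use. */
    String createPath(long userId, long goodsId) {
        String token = md5Hex(UUID.randomUUID().toString());
        issued.put(key(userId, goodsId), token);
        return token;
    }

    /** Called by the purchase endpoint; a token is accepted at most once. */
    boolean validatePath(long userId, long goodsId, String token) {
        if (token == null) {
            return false;
        }
        // remove(key, value) succeeds only when the stored token matches.
        return issued.remove(key(userId, goodsId), token);
    }

    private static String key(long userId, long goodsId) {
        return userId + ":" + goodsId;
    }

    /** Lower-case hexadecimal MD5 digest of the input. */
    static String md5Hex(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Making the token single-use is a design choice of this sketch rather than something the approach requires; it means a leaked URL cannot be replayed by a script.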
Static Page Rendering
Render product description, parameters, transaction records, images, and reviews into a static HTML page (e.g., via FreeMarker) so that user requests bypass the application server and database, reducing load.
Redis Cluster
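Deploy Redis as multiple nodes rather than a single instance. Redis Sentinel provides monitoring and automatic failover for a master-replica setup, while Redis Cluster additionally shards keys across nodes. Either improves availability, and sharding raises the request capacity available during cache-stampede scenarios.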
Nginx Front‑End
Place Nginx in front of Tomcat clusters; Nginx can handle tens of thousands of concurrent connections, forwarding them to the back‑end pool.
SQL Optimization
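Combine the stock check and the decrement into a single UPDATE statement, optionally guarded by optimistic locking (a version field), so that two requests cannot both read the same stock value with a separate SELECT and then both decrement it.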
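One way this could look with plain JDBC is sketched below. The table name miaosha_goods comes from the design above, but the column names (stock, version, goods_id) and the StockDao class are assumptions made for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical DAO showing a single-statement, optimistically locked decrement. */
class StockDao {
    // Column names are assumed; adapt them to the real miaosha_goods schema.
    static final String DECREMENT_SQL =
            "UPDATE miaosha_goods SET stock = stock - 1, version = version + 1 "
          + "WHERE goods_id = ? AND version = ? AND stock > 0";

    /**
     * Returns true only if exactly one row was updated. Zero rows means the
     * item is sold out or another request changed the version first.
     */
    static boolean decrementStock(Connection conn, long goodsId, int expectedVersion)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DECREMENT_SQL)) {
            ps.setLong(1, goodsId);
            ps.setInt(2, expectedVersion);
            return ps.executeUpdate() == 1;
        }
    }
}
```

The `stock > 0` predicate alone is enough to keep stock from going negative; the version check additionally rejects updates based on a stale read, at the cost of more failed attempts under heavy contention.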
Redis Pre‑Decrement
Initialize stock in Redis (e.g., redis.set(goodsId, 100)) and atomically decrement using Lua scripts to ensure consistency, while handling cancellations by incrementing back.
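The check-then-decrement logic can be sketched as follows. This is an in-memory stand-in, not a Redis client: an AtomicInteger plays the role of the Redis key so the example runs on its own, and the RESERVE_LUA constant shows a script with the same semantics that could be sent to Redis through a client's EVAL call. The class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for a Redis stock key with atomic check-and-decrement. */
class StockCounter {
    // Lua script with the same semantics: decrement only while stock is positive.
    static final String RESERVE_LUA =
            "local s = tonumber(redis.call('GET', KEYS[1])) "
          + "if s and s > 0 then return redis.call('DECR', KEYS[1]) end "
          + "return -1";

    private final AtomicInteger stock;

    StockCounter(int initialStock) {
        this.stock = new AtomicInteger(initialStock);
    }

    /** Reserves one unit; returns the remaining stock, or -1 if sold out. */
    int reserve() {
        while (true) {
            int current = stock.get();
            if (current <= 0) {
                return -1;
            }
            // Retry if another thread changed the value between get and set.
            if (stock.compareAndSet(current, current - 1)) {
                return current - 1;
            }
        }
    }

    /** Returns one unit to stock, for example after an order is cancelled. */
    int release() {
        return stock.incrementAndGet();
    }

    int remaining() {
        return stock.get();
    }
}
```

The point of both forms is that the check and the decrement happen as one atomic step, so stock can never drop below zero regardless of how many requests arrive at once.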
Rate Limiting
Implement multiple layers:
Front‑end throttling: Disable the purchase button for a few seconds after a click.
Per‑user repeat‑request limit: Use Redis key expiration (e.g., 10 s) to reject rapid duplicate requests.
Token‑bucket algorithm: Use Guava's RateLimiter to generate tokens at a controlled rate.
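The per-user repeat-request limit can be sketched as below. In Redis this is commonly done with a single `SET key value NX EX 10` command, which succeeds only if the key does not exist yet; here a ConcurrentHashMap stands in for Redis so the example is self-contained. The class name RepeatRequestGuard and the explicit clock parameter are assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a per-user Redis key with expiration. */
class RepeatRequestGuard {
    private final long windowMillis;
    // userId -> time (ms) at which the user's block expires.
    // Unlike Redis keys, entries here are never evicted; this is a sketch only.
    private final Map<Long, Long> blockedUntil = new ConcurrentHashMap<>();

    RepeatRequestGuard(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request may proceed, false if it repeats too soon. */
    boolean tryEnter(long userId, long nowMillis) {
        boolean[] allowed = {false};
        // compute() runs atomically per key, so concurrent duplicates cannot both pass.
        blockedUntil.compute(userId, (id, until) -> {
            if (until == null || nowMillis >= until) {
                allowed[0] = true;
                return nowMillis + windowMillis;
            }
            return until;
        });
        return allowed[0];
    }
}
```

A caller would pass `System.currentTimeMillis()` as the clock; taking it as a parameter simply makes the behaviour easy to check without sleeping.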
Example of a simple token‑bucket limiter:
    import com.google.common.util.concurrent.RateLimiter;

    public class TestRateLimiter {
        public static void main(String[] args) {
            // Generate 1 token per second.
            final RateLimiter rateLimiter = RateLimiter.create(1);
            for (int i = 0; i < 10; i++) {
                // acquire() blocks until a token is available and
                // returns the time spent waiting, in seconds.
                double waitTime = rateLimiter.acquire();
                System.out.println("Task " + i + " wait time " + waitTime);
            }
            System.out.println("Execution finished");
        }
    }

Running this shows the first task proceeds immediately, while each subsequent task waits roughly one second for the next token to be generated.
A stricter version using tryAcquire with a timeout rejects tasks that cannot obtain a token quickly:
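    import com.google.common.util.concurrent.RateLimiter;
    import java.util.concurrent.TimeUnit;

    public class TestRateLimiter2 {
        public static void main(String[] args) {
            // Generate 1 token per second.
            final RateLimiter rateLimiter = RateLimiter.create(1);
            for (int i = 0; i < 10; i++) {
                // Wait at most 0.5 s for a token. (Casting 0.5 to long would
                // truncate it to 0, so the timeout is given in milliseconds.)
                long timeoutMillis = 500;
                boolean isValid = rateLimiter.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
                System.out.println("Task " + i + " valid: " + isValid);
                if (!isValid) continue;
                System.out.println("Task " + i + " executing");
            }
            System.out.println("Finished");
        }
    }

Because the loop issues all ten requests almost at once and tokens arrive only once per second, only the first request obtains a token. The rest are rejected immediately instead of being queued, illustrating how token-bucket throttling sheds excess load under extreme traffic.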
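To make the mechanism behind the limiter more concrete, here is a simplified token bucket written with the standard library only. It is a sketch of the general algorithm, not Guava's actual implementation; the TokenBucket class and its explicit clock parameter are assumptions made for illustration.

```java
/** Simplified token bucket: tokens refill continuously up to a fixed capacity. */
class TokenBucket {
    private final double capacity;
    private final double tokensPerSecond;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(double capacity, double tokensPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.tokensPerSecond = tokensPerSecond;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nowNanos;
    }

    /** Takes one token if available; never blocks. */
    synchronized boolean tryAcquire(long nowNanos) {
        // Add the tokens earned since the last call, capped at capacity.
        long elapsedNanos = Math.max(0L, nowNanos - lastRefillNanos);
        double earned = (elapsedNanos / 1_000_000_000.0) * tokensPerSecond;
        tokens = Math.min(capacity, tokens + earned);
        lastRefillNanos = Math.max(lastRefillNanos, nowNanos);

        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production code the clock argument would be `System.nanoTime()`. The capacity controls how large a burst is tolerated, while the refill rate sets the sustained throughput.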
Asynchronous Order Processing
After rate limiting and stock verification, push valid orders to a message queue (e.g., RabbitMQ) for asynchronous processing, which decouples order creation from the request thread and provides peak‑shaving.
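The decoupling can be sketched with a bounded in-process queue standing in for the message broker. This is not RabbitMQ client code; OrderPipeline, OrderRequest, and the method names are hypothetical, and the "order creation" step is reduced to appending to a list.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

/** In-process stand-in for "publish to a queue, create the order later". */
class OrderPipeline {
    /** Hypothetical message payload. */
    static final class OrderRequest {
        final long userId;
        final long goodsId;

        OrderRequest(long userId, long goodsId) {
            this.userId = userId;
            this.goodsId = goodsId;
        }
    }

    private final BlockingQueue<OrderRequest> queue;
    private final List<OrderRequest> created = new CopyOnWriteArrayList<>();

    OrderPipeline(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Request thread: enqueue and return at once; false means the buffer is full. */
    boolean submit(OrderRequest request) {
        return queue.offer(request);
    }

    /** Worker: handle one message; returns false if none arrived within the timeout. */
    boolean processOne(long timeoutMillis) {
        try {
            OrderRequest request = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (request == null) {
                return false;
            }
            created.add(request); // real code would insert into miaosha_order here
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int createdCount() {
        return created.size();
    }
}
```

A worker thread would call `processOne` in a loop at whatever pace the database can sustain, which is where the peak-shaving effect comes from: the request thread returns as soon as `submit` does.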
Service Degradation
Use circuit‑breaker tools such as Hystrix to provide fallback responses when a service instance fails, ensuring a graceful user experience instead of hard crashes.
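The idea behind a circuit breaker with a fallback can be sketched in a few lines. This is a deliberately simplified illustration and does not use or reproduce the Hystrix API; the SimpleCircuitBreaker class, its thresholds, and the explicit clock parameter are all assumptions.

```java
import java.util.function.Supplier;

/** Simplified circuit breaker: opens after repeated failures, then serves the fallback. */
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAtMillis = -1; // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /**
     * Runs the primary call unless the circuit is open; on failure or while
     * open, returns the fallback. Holding the lock during the call keeps the
     * sketch short but would serialize requests in real code.
     */
    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback, long nowMillis) {
        if (openedAtMillis >= 0 && nowMillis - openedAtMillis < openMillis) {
            return fallback.get(); // open: skip the failing service entirely
        }
        try {
            T result = primary.get(); // closed, or a trial call after the open period
            consecutiveFailures = 0;
            openedAtMillis = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAtMillis = nowMillis; // (re)open the circuit
            }
            return fallback.get();
        }
    }
}
```

In a flash-sale context the fallback might be a static "busy, please retry" response, so users see a graceful message while the failing dependency is given time to recover.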
Conclusion
The presented architecture (isolated flash-sale DB, dynamic URLs, static page rendering, Redis cluster, Nginx load balancing, optimized SQL, token-bucket rate limiting, asynchronous queuing, and graceful degradation) is intended to handle hundreds of thousands of concurrent requests. For larger scales (tens of millions), further measures such as database sharding, Kafka queues, and larger Redis clusters would be required.
IT Architects Alliance