How to Build a High‑Performance Flash Sale System: Architecture, Strategies & Code
This article examines the main challenges in designing a robust flash‑sale backend: preventing oversell, handling massive concurrent requests, securing URLs, and isolating databases. It then covers the techniques used to address them, including Redis clustering, Nginx load balancing, rate limiting, asynchronous processing, and service degradation, so that the system stays stable and responsive during a sales spike.
Key Issues for Flash‑Sale Systems
Overselling : Limited inventory (e.g., 100 items) must never be sold beyond its quantity, otherwise financial loss occurs.
High Concurrency : A flash‑sale lasts only minutes and can generate millions of requests, risking cache breakdown and DB overload.
Interface Abuse : Automated scripts may issue hundreds of requests per second; the system must filter repeated or invalid calls.
URL Exposure : Users can discover the sale endpoint via browser dev tools; the URL must be unpredictable.
Database Coupling : Running flash‑sale traffic on the same DB as other services can cascade failures; isolation is required.
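Massive Request Volume : A single Redis node handles on the order of 40k QPS (the exact figure depends on hardware and command mix), far below the hundreds of thousands of requests per second a popular sale can generate, so the cache tier itself must scale out.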
Architecture and Technical Solutions
Separate Flash‑Sale Database
Two core tables are created in a dedicated database:
miaosha_goods : stores product ID, stock, price, flash price, version (for optimistic locking), and other product metadata.
miaosha_order : records successful orders, linking user_id and goods_id.
Additional tables (product details, user profile) can be linked via goods_id and user_id respectively.
Dynamic Sale URL
The sale endpoint is generated at runtime by MD5‑hashing a random string. The front‑end requests the URL from the back‑end just before the sale starts, preventing pre‑knowledge of the endpoint.
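One way this could look is sketched below. It is a minimal illustration, not the article's implementation: the class `DynamicSaleUrl`, its method names, and the idea of binding each token to a user and product are assumptions, and the in-memory map stands in for a shared store such as Redis with a short expiry.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Issues and checks unpredictable sale paths. All names here are hypothetical. */
final class DynamicSaleUrl {
    // Stand-in for a shared store (for example Redis with a short TTL).
    private final Map<String, String> issued = new ConcurrentHashMap<>();

    /** Creates a fresh token for this user and product, to be used as a path segment. */
    String issue(long userId, long goodsId) {
        String token = md5(UUID.randomUUID().toString());
        issued.put(userId + ":" + goodsId, token);
        return token;
    }

    /** True only if the token is the one issued to this user for this product. */
    boolean verify(long userId, long goodsId, String token) {
        String expected = issued.get(userId + ":" + goodsId);
        if (expected == null || token == null) {
            return false;
        }
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                token.getBytes(StandardCharsets.UTF_8));
    }

    /** Hex-encoded MD5 of the input, left-padded to 32 characters. */
    static String md5(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The back-end would call `issue` when the front-end asks for the URL and `verify` when the purchase request arrives. The unpredictability comes from the random input, not from MD5 itself, which here only normalises the token to a fixed-length string.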
Static Page Rendering
Product details (description, parameters, images, reviews) are rendered into a static HTML page using a template engine such as FreeMarker. Clients receive the static page directly, bypassing the application server and database, which reduces load dramatically.
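The idea of rendering once and serving the file afterwards can be sketched without a template engine. The example below is an assumption-laden stand-in for FreeMarker: the class `StaticPageWriter`, the `${name}` and `${description}` placeholders, and the `goods-<id>.html` file naming are all hypothetical, and plain string replacement takes the place of real template processing.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Renders a product page once so that a web server or CDN can serve the file directly. */
final class StaticPageWriter {
    // Hypothetical template; a real system would load a FreeMarker template instead.
    private static final String TEMPLATE =
            "<html><head><title>${name}</title></head>"
          + "<body><h1>${name}</h1><p>${description}</p></body></html>";

    /** Fills the template with HTML-escaped values. */
    static String render(String name, String description) {
        return TEMPLATE
                .replace("${name}", escape(name))
                .replace("${description}", escape(description));
    }

    /** Minimal HTML escaping for text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Writes the rendered page to dir/goods-<id>.html and returns its path. */
    static Path publish(Path dir, long goodsId, String name, String description) throws IOException {
        Path file = dir.resolve("goods-" + goodsId + ".html");
        Files.write(file, render(name, description).getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```

`publish` would run when product data changes, not on each request, which is what keeps the application server and database out of the read path.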
Redis Cluster & Pre‑Decrement
Redis is deployed in Sentinel mode for automatic failover or in Redis Cluster mode for sharding across nodes, giving the cache tier high availability and, with Cluster, horizontal scalability. Stock is pre‑loaded into Redis (e.g., redis.set(goodsId, 100)) and decremented atomically on each request. A Lua script keeps the check‑and‑decrement atomic, and the same approach restores stock when an order is cancelled.
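The semantics the pre-decrement needs (never go below zero, return stock on cancellation) can be modelled in-process as follows. This is a sketch under assumptions: `StockCounter` and its methods are hypothetical names, the map of atomic counters stands in for Redis, and the Lua string shows one plausible script that is included for reference only and is not executed here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory model of pre-loaded stock with atomic decrement and rollback. */
final class StockCounter {
    /** One possible Lua script for Redis EVAL; shown for reference, not run by this sketch. */
    static final String DECR_IF_POSITIVE_LUA =
            "local stock = tonumber(redis.call('GET', KEYS[1]))\n"
          + "if not stock or stock <= 0 then return -1 end\n"
          + "return redis.call('DECR', KEYS[1])";

    private final Map<Long, AtomicInteger> stock = new ConcurrentHashMap<>();

    /** Loads the sale quantity before the sale starts. */
    void preload(long goodsId, int quantity) {
        stock.put(goodsId, new AtomicInteger(quantity));
    }

    /** Returns the remaining stock after the decrement, or -1 if sold out or unknown. */
    int tryDecrement(long goodsId) {
        AtomicInteger counter = stock.get(goodsId);
        if (counter == null) {
            return -1;
        }
        while (true) {
            int current = counter.get();
            if (current <= 0) {
                return -1; // never drop below zero
            }
            if (counter.compareAndSet(current, current - 1)) {
                return current - 1;
            }
            // another thread changed the value first; re-read and retry
        }
    }

    /** Returns one unit to stock, e.g. when an order is cancelled; returns the new value. */
    int rollback(long goodsId) {
        AtomicInteger counter = stock.get(goodsId);
        if (counter == null) {
            throw new IllegalArgumentException("unknown goodsId " + goodsId);
        }
        return counter.incrementAndGet();
    }
}
```

A bare decrement would not be enough on its own, because concurrent requests can push the counter negative; the check and the decrement have to happen as one step, which is the role the Lua script plays in Redis and the compare-and-set loop plays here.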
Nginx Reverse Proxy
Nginx sits in front of a Tomcat (or Spring Boot) cluster and distributes incoming traffic across multiple application instances. As a rough guide, a single Tomcat instance handles only a few hundred concurrent requests, whereas a cluster behind Nginx can serve tens of thousands of requests per second.
SQL Optimisation with Optimistic Locking
Stock check and decrement are combined into a single UPDATE statement that uses a version column as an optimistic lock. This eliminates the read‑modify‑write race and prevents oversell.
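On the caller's side, a conditional update can match zero rows either because the stock is gone or because another transaction bumped the version first, so the application usually re-reads and retries a bounded number of times. The sketch below models that behaviour in memory; `OptimisticStock`, `Row`, and the retry limit are assumptions for illustration, and the `AtomicReference` stands in for the database row.

```java
import java.util.concurrent.atomic.AtomicReference;

/** In-memory model of a version-checked stock update with caller-side retry. */
final class OptimisticStock {
    /** Immutable snapshot of the two columns the update touches. */
    static final class Row {
        final int stock;
        final int version;

        Row(int stock, int version) {
            this.stock = stock;
            this.version = version;
        }
    }

    private final AtomicReference<Row> row;

    OptimisticStock(int initialStock) {
        this.row = new AtomicReference<>(new Row(initialStock, 0));
    }

    Row read() {
        return row.get();
    }

    /** Models the conditional UPDATE; returns the number of affected rows (1 or 0). */
    int update(int expectedVersion) {
        Row current = row.get();
        if (current.version != expectedVersion || current.stock <= 0) {
            return 0; // stale version or sold out: nothing is changed
        }
        Row next = new Row(current.stock - 1, current.version + 1);
        return row.compareAndSet(current, next) ? 1 : 0;
    }

    /** Re-reads and retries on conflict; false means sold out or too many conflicts. */
    boolean buy(int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            Row seen = read();
            if (seen.stock <= 0) {
                return false;
            }
            if (update(seen.version) == 1) {
                return true;
            }
        }
        return false;
    }
}
```

Whatever the retry policy, stock can only decrease through a successful conditional update, which is what rules out oversell. The SQL statement that this sketch models is: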
update miaosha_goods
set stock = stock - 1,
    version = version + 1
where goods_id = #{goods_id}
  and version = #{version}
  and stock > 0;
Rate Limiting
Front‑End Throttling : Disable the “Buy” button for a few seconds after a click to avoid duplicate submissions.
Per‑User Request Window : Store a Redis key per user with a short TTL (e.g., 10 s). Subsequent requests within the window are rejected.
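// Assumes `redis` is a Jedis 3+ client (import redis.clients.jedis.params.SetParams).
// SET with NX and EX tests and sets the key in one command, so two concurrent
// requests from the same user cannot both pass, as they could with a separate get and setex.
String reply = redis.set(userId, "1", SetParams.setParams().nx().ex(10));
if ("OK".equals(reply)) {
    // first request in the 10 s window: allow it
} else {
    // key already exists: reject the duplicate request
}
Token‑Bucket Algorithm (Guava RateLimiter) : Generate tokens at a fixed rate; only requests that acquire a token are processed.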
import com.google.common.util.concurrent.RateLimiter;

public class TestRateLimiter {
    public static void main(String[] args) {
        RateLimiter limiter = RateLimiter.create(1); // 1 token per second
        for (int i = 0; i < 10; i++) {
            // Blocks until a token is available and returns the seconds spent waiting.
            double wait = limiter.acquire();
            System.out.println("Task " + i + " wait=" + wait);
        }
    }
}
A non‑blocking variant uses tryAcquire with a timeout (e.g., 0.5 s) and discards requests that cannot obtain a token within that time.
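import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class TestRateLimiter2 {
    public static void main(String[] args) {
        RateLimiter limiter = RateLimiter.create(1); // 1 token per second
        for (int i = 0; i < 10; i++) {
            // The timeout parameter is a long, so 0.5 s is written as 500 ms.
            if (limiter.tryAcquire(500, TimeUnit.MILLISECONDS)) {
                System.out.println("Task " + i + " executed");
            } else {
                System.out.println("Task " + i + " dropped");
            }
        }
    }
}
Asynchronous Order Processing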
After passing rate limiting and stock verification, the order request is placed onto a message queue (e.g., RabbitMQ). Consumers process the order asynchronously, which decouples the front‑end from the DB write path, smooths traffic spikes, and enables retry/compensation logic for failures.
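The decoupling can be illustrated with an in-process queue. This is only a sketch: `OrderPipeline`, `OrderRequest`, and the in-memory `persisted` list are hypothetical, and a bounded `LinkedBlockingQueue` stands in for a broker such as RabbitMQ, which would additionally provide durability and acknowledgements.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Request path enqueues; a consumer writes orders later at its own pace. */
final class OrderPipeline {
    static final class OrderRequest {
        final long userId;
        final long goodsId;

        OrderRequest(long userId, long goodsId) {
            this.userId = userId;
            this.goodsId = goodsId;
        }
    }

    private final BlockingQueue<OrderRequest> queue;
    // Stand-in for the miaosha_order table.
    private final List<OrderRequest> persisted = new CopyOnWriteArrayList<>();

    OrderPipeline(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Returns at once; false means the queue is full and the request should be rejected. */
    boolean submit(long userId, long goodsId) {
        return queue.offer(new OrderRequest(userId, goodsId));
    }

    /** Handles one queued order if present; returns false when the queue is empty. */
    boolean consumeOne() {
        OrderRequest next = queue.poll();
        if (next == null) {
            return false;
        }
        persisted.add(next); // a real consumer would insert into the database here
        return true;
    }

    /** Starts a daemon thread that consumes orders until it is interrupted. */
    Thread startConsumer() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    persisted.add(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "order-consumer");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    List<OrderRequest> persisted() {
        return persisted;
    }
}
```

The bounded capacity is the part that smooths spikes: when consumers fall behind, `submit` fails fast instead of letting pending writes pile up against the database.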
Service Degradation
When a node crashes or a downstream service becomes unavailable, a fallback (e.g., Hystrix circuit‑breaker) returns a friendly message instead of a hard error, preserving the overall user experience.
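The fallback behaviour can be sketched with a hand-rolled breaker. This is not Hystrix's API: `SimpleBreaker`, its thresholds, and the explicit `nowMillis` parameter (passed in so that the behaviour is easy to test) are assumptions made for illustration.

```java
import java.util.function.Supplier;

/** Minimal circuit breaker: after repeated failures, skip the dependency for a while. */
final class SimpleBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures;
    private long openedAt = -1; // -1 means the circuit is closed

    SimpleBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /**
     * Runs primary unless the circuit is open; on failure or open circuit returns the fallback.
     * Holding the lock during the call keeps the sketch simple but serialises callers.
     */
    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback, long nowMillis) {
        if (openedAt >= 0 && nowMillis - openedAt < openMillis) {
            return fallback.get(); // open: do not touch the failing dependency
        }
        try {
            T result = primary.get(); // closed, or a trial call after the open period
            consecutiveFailures = 0;
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = nowMillis; // open (or re-open) the circuit
            }
            return fallback.get();
        }
    }
}
```

A caller might use it as `breaker.call(() -> orderService.create(req), () -> "The sale is busy, please retry", System.currentTimeMillis())`, where `orderService` and `req` are placeholders for the real downstream call.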
Scalability Considerations
For traffic beyond a few hundred thousand QPS, consider database sharding, Kafka instead of RabbitMQ, and larger Redis clusters.
Horizontal scaling of the application tier (multiple Tomcat/Spring Boot instances) behind Nginx further raises capacity.
Fine‑grained token‑bucket rates can be tuned per product to balance fairness and throughput.
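Per-product rates could be kept as one bucket per goods_id. The sketch below is a hand-written token bucket, not Guava's RateLimiter; the class names, the capacity and refill parameters, and the explicit `nowNanos` argument (callers would pass `System.nanoTime()`) are assumptions chosen to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Token bucket that refills continuously up to a fixed capacity. */
final class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastNanos;

    TokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastNanos = nowNanos;
    }

    /** Takes one token if available; never blocks. */
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSeconds = Math.max(0, nowNanos - lastNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastNanos = Math.max(lastNanos, nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}

/** One bucket per product, so a hot item can be throttled harder than the rest. */
final class PerProductLimiter {
    private final Map<Long, TokenBucket> buckets = new ConcurrentHashMap<>();

    void configure(long goodsId, double capacity, double refillPerSecond, long nowNanos) {
        buckets.put(goodsId, new TokenBucket(capacity, refillPerSecond, nowNanos));
    }

    /** Requests for products without a configured bucket are rejected. */
    boolean tryAcquire(long goodsId, long nowNanos) {
        TokenBucket bucket = buckets.get(goodsId);
        return bucket != null && bucket.tryAcquire(nowNanos);
    }
}
```

Capacity bounds the burst a product accepts at the opening second, while the refill rate bounds its sustained throughput, so the two can be tuned separately for each item.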
Senior Brother's Insights
A public account focused on workplace, career growth, team management, and self-improvement. The author is the writer of books including 'SpringBoot Technology Insider' and 'Drools 8 Rule Engine: Core Technology and Practice'.