How to Build a High‑Performance Flash‑Sale System: Architecture, Caching, and Scaling Strategies
This article explains how to design a high‑concurrency flash‑sale (秒杀) system by optimizing concurrent reads and writes, applying API and architecture principles, separating dynamic and static data, handling hot items, shaping traffic, improving performance, managing inventory reduction, and implementing robust fallback mechanisms for high availability.
1. Overview
1.1 Concurrent Read/Write
The main challenge of flash‑sale systems is concurrent reads and writes. Optimizing reads means reducing the amount of data a user must fetch from the server; optimizing writes means isolating flash‑sale writes in a dedicated database where they can be handled specially. Protection mechanisms and fallback plans are also required.
1.2 API Design Principles
For ultra‑high‑traffic, high‑performance, highly available systems, the user request path should follow four "minimize" rules and one prohibition: as little data as possible, as few requests as possible, as short a path as possible, as few dependencies as possible, and no single points of failure.
1.3 Flash‑Sale Architecture Principles
High Availability: The architecture must remain stable both under expected load and during traffic spikes.
Consistency: Data must be consistent; the total number of successful transactions must match the configured quantity.
Performance: The system must sustain massive traffic by optimizing every link in the request chain.
2. Architecture Principles
2.1 Minimize Data
Both request payloads and responses should be as small as possible to reduce network latency and CPU usage for compression and encoding. System‑level data dependencies should also be minimized to avoid excessive database interaction.
2.2 Minimize Request Count
Extra requests such as CSS, JavaScript, images, and Ajax calls should be reduced. For example, merge multiple JS files into a single request using a URL that the server resolves into combined content.
2.3 Shorten Path
Each intermediate node (proxy, additional socket, etc.) adds latency and reduces overall availability. Shorter paths improve both reliability and performance.
2.4 Reduce Dependencies
Classify system components into levels (0‑level, 1‑level, …). Critical services (e.g., payment) should have minimal strong dependencies on lower‑level services to avoid cascading failures.
2.5 Eliminate Single Points
Stateless services and dynamic configuration (via a config center) remove machine‑specific bindings, while data that must be persisted is replicated to avoid single‑point failures.
3. Architecture Cases for Different Scenarios
A simple implementation adds a “timed‑release” button to the product page. As traffic grows from 10k/s to 100k/s, the architecture evolves:
Separate the flash‑sale system into its own service.
Deploy an independent machine cluster for flash‑sale traffic.
Cache hot data (e.g., inventory) in a dedicated cache.
Add a quiz to deter automated bots.
Further scaling includes full static‑dynamic separation, local caching of product details, and adding rate‑limiting protection.
4. Dynamic/Static Separation Solution
4.1 What Is Dynamic vs. Static Data
Static data does not depend on URL, user, time, region, or cookies; dynamic data does. Static data can be aggressively cached.
4.2 Caching Static Data
4.2.1 Nearest to the User
Cache in the browser, CDN, or server‑side cache.
4.2.2 Cache the HTTP Connection Directly
Web proxies can return the stored HTTP response (headers + body) without re‑parsing the protocol.
4.2.3 Language‑Specific Cache Choices
Because Java is not efficient at handling massive connections, static caching is often performed at the web‑server layer (Nginx, Apache, Varnish) rather than inside Java.
4.3 Static Data Handling
URL uniquification – use the URL as the cache key.
Separate user‑related factors (login status, identity).
Separate time‑related factors.
Asynchronously fetch region‑specific data.
Strip cookies from cached responses (e.g., Varnish unset req.http.cookie).
4.4 Dynamic Data Handling
4.4.1 ESI (Edge Side Includes)
Insert dynamic fragments into a cached static page at the edge proxy.
4.4.2 CSI (Client‑Side Include)
Fetch dynamic fragments via asynchronous JavaScript requests.
4.5 Full Dynamic/Static Separation Architecture
4.5.1 Single‑Machine Deployment
Deploy Nginx + Cache + Java on a physical server, using consistent‑hash groups to balance cache hit rate and avoid hot‑spot overload.
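The consistent‑hash grouping mentioned above can be illustrated with a minimal hash ring. This is a sketch under assumptions, not the article's implementation: the class name, the node names, and the choice of SHA‑256 with 100 virtual nodes per machine are all hypothetical.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps cache keys to nodes so that removing a node only remaps that node's keys."""

    def __init__(self, nodes, replicas=100):
        if not nodes:
            raise ValueError("at least one node is required")
        ring = []
        for node in nodes:
            # Virtual nodes spread each machine around the ring for better balance.
            for i in range(replicas):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]


# Hypothetical node names; the URL is used as the cache key.
ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("/item/1")
```

Because the same URL always lands on the same node, each cached page is stored once per group, which keeps the hit rate high.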
4.5.2 Unified Cache Layer
Separate cache into its own cluster, reducing operational cost and enabling shared memory across services.
4.5.3 CDN Front‑End
Push the cache further to a CDN; use a small number of second‑level CDN caches to keep hit rates high while serving users close to the edge.
5. Hot Data Handling
5.1 What Is Hot Data
Hot data is data for items that receive massive read/write traffic. It can be static (predictable in advance) or dynamic (unpredictable).
5.2 Discovering Hot Data
5.2.1 Static Hot Data
Identify hot items via business rules (e.g., sellers register for promotions) or by calculating top‑N products from traffic logs.
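The top‑N calculation can be sketched in a few lines. The log format ("timestamp item_id") and the function name are assumptions made for illustration; a real pipeline would parse whatever format the access logs actually use.

```python
from collections import Counter


def top_n_items(log_lines, n):
    """Count item IDs in access-log lines and return the n most requested."""
    counts = Counter()
    for line in log_lines:
        # Assumed log format: "<timestamp> <item_id>"
        parts = line.split()
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return [item for item, _ in counts.most_common(n)]


# Hypothetical sample log.
sample_log = ["t1 A", "t2 B", "t3 A", "t4 C", "t5 A", "t6 B"]
hot_items = top_n_items(sample_log, 2)
```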
5.2.2 Dynamic Hot Data
Build an asynchronous pipeline that collects hotspot keys from middleware (Nginx, cache, RPC) and publishes them to downstream services for protection.
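The counting step of such a pipeline might look like the sketch below. It covers only the detection logic; collecting keys from middleware and publishing them downstream are omitted. The class name, window size, and threshold are hypothetical.

```python
from collections import Counter, deque


class HotKeyDetector:
    """Counts keys over the last `window` reported accesses and flags frequent ones."""

    def __init__(self, window: int, threshold: int):
        self.window = window
        self.threshold = threshold
        self.recent = deque()
        self.counts = Counter()

    def record(self, key: str) -> bool:
        """Record one access; return True if the key is currently hot."""
        self.recent.append(key)
        self.counts[key] += 1
        if len(self.recent) > self.window:
            # Slide the window: forget the oldest access.
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return self.counts[key] >= self.threshold

    def hot_keys(self):
        return sorted(k for k, c in self.counts.items() if c >= self.threshold)
```

Downstream services could poll `hot_keys()` or receive the keys by push and then apply caching or throttling to them.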
5.3 Processing Hot Data
Optimization: Cache hot items; static hot data can be cached long‑term.
Limiting: Use consistent‑hash bucket queues to throttle hot‑item requests.
Isolation: Separate hot‑item processing at business, system, and data layers.
6. Traffic Shaping (Peak Cutting)
6.1 Why Cut Peaks
Peak cutting keeps server resources from being overwhelmed during the flash‑sale burst.
6.2 Lossless Peak‑Cutting Methods
6.2.1 Queuing
Buffer spikes with a message queue, converting synchronous calls into asynchronous pushes.
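As a single‑process stand‑in for a message queue, the sketch below uses Python's standard `queue.Queue`. The queue size, the function names, and the `processed` list (a placeholder for the real order‑creation call) are assumptions for illustration.

```python
import queue
import threading

orders = queue.Queue(maxsize=1000)  # bounded buffer that absorbs the burst
processed = []


def submit(order_id: str) -> bool:
    """Called on the request path: enqueue and return immediately."""
    try:
        orders.put_nowait(order_id)
        return True
    except queue.Full:
        return False  # buffer exhausted; the caller should ask the user to retry


def worker() -> None:
    """Drains the queue at the pace the back end can sustain."""
    while True:
        order_id = orders.get()
        if order_id is None:  # sentinel: stop the worker
            orders.task_done()
            break
        processed.append(order_id)  # stand-in for the real order-creation call
        orders.task_done()
```

The request thread only pays the cost of an enqueue, while the worker decides how fast orders reach the database.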
6.2.2 Quiz
Introduce a short quiz to deter bots and to artificially delay requests, spreading the load over a longer time window.
6.2.3 Layered Filtering
Apply a funnel‑style filter across CDN, front‑end, back‑end, and database layers to drop invalid requests early.
7. Factors Influencing Performance
7.1 Definition of Performance
Performance is measured by QPS (queries per second) and response time (RT). Shorter RT yields higher QPS; in multi‑threaded environments, QPS = (1000 ms / RT) × thread count, with RT expressed in milliseconds.
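As a worked instance of the formula, the sketch below computes the theoretical QPS for two response times. The function name and the sample numbers are illustrative only; real throughput is also capped by CPU cores and contention.

```python
def estimated_qps(rt_ms: float, threads: int) -> float:
    """Theoretical bound: each thread completes 1000 / RT requests per second."""
    if rt_ms <= 0:
        raise ValueError("rt_ms must be positive")
    return 1000.0 / rt_ms * threads


# Hypothetical numbers: 20 threads at 100 ms per request, then at 50 ms.
qps_slow = estimated_qps(100, 20)  # 200.0
qps_fast = estimated_qps(50, 20)   # 400.0
```

Halving RT doubles the estimate, which is why trimming per‑request CPU time pays off directly.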
7.2 Finding Bottlenecks
CPU is the primary bottleneck for flash‑sale systems. Use profilers (JProfiler, YourKit) or periodic jstack sampling to locate hot functions. If CPU usage stays below about 95% when QPS peaks, the CPU is probably not the bottleneck and another resource (for example locks, I/O, or the network) is limiting throughput.
7.3 System Optimizations (Java‑Specific)
7.3.1 Reduce Encoding
Avoid unnecessary character‑to‑byte conversions; stream static data directly via resp.getOutputStream().
7.3.2 Reduce Serialization
Minimize RPC calls; merge tightly related services into a single deployment to avoid serialization overhead.
7.3.3 Java‑Specific Flash‑Sale Optimizations
Use plain Servlets instead of heavyweight MVC frameworks.
Write output directly with resp.getOutputStream() and prefer JSON over template rendering.
7.3.4 Concurrent Read Optimization
Cache product titles and descriptions locally on each flash‑sale machine; cache inventory with short‑lived passive expiration.
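Short‑lived passive expiration can be sketched as a small read‑through cache: an entry is reloaded only when a read finds it expired. The class name, the `loader` callback, and the injectable clock are assumptions for illustration.

```python
import time


class TtlCache:
    """Local cache whose entries expire passively, i.e. are reloaded on the next read."""

    def __init__(self, ttl_seconds: float, loader, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.loader = loader  # hypothetical callback that reads the source of truth
        self.clock = clock
        self._entries = {}    # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        # Missing or expired: reload and keep the value for another TTL.
        value = self.loader(key)
        self._entries[key] = (value, now + self.ttl)
        return value
```

With a TTL of a few seconds, readers may briefly see stale inventory; the final deduction still has to be checked at write time.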
7.3.5 Reducing Serialization in RPC
Deploy related services on the same JVM to bypass network serialization.
8. Inventory Reduction Logic
8.1 Reduction Methods
Order‑time reduction: Decrease inventory when an order is placed (precise, but vulnerable to malicious orders that are placed and never paid).
Payment‑time reduction: Decrease inventory after payment (prevents fake orders, but more orders may be placed than there is stock, so some buyers find the item sold out when they try to pay).
Pre‑deduction: Reserve inventory for a limited time after ordering; release it if payment does not occur in time.
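Pre‑deduction can be sketched as an in‑memory reservation table with an expiry time per order. This is a single‑process illustration under assumptions; the class and method names are hypothetical, and a production system would keep this state in a database or shared cache.

```python
import time


class ReservingInventory:
    """Holds stock for an order for a limited time and releases it if unpaid."""

    def __init__(self, stock: int, hold_seconds: float, clock=time.monotonic):
        self.stock = stock
        self.hold_seconds = hold_seconds
        self.clock = clock
        self.holds = {}  # order_id -> (qty, expires_at)

    def _release_expired(self) -> None:
        now = self.clock()
        for order_id, (qty, expires_at) in list(self.holds.items()):
            if expires_at <= now:
                self.stock += qty  # unpaid hold: return the stock
                del self.holds[order_id]

    def reserve(self, order_id: str, qty: int) -> bool:
        self._release_expired()
        if order_id in self.holds or qty > self.stock:
            return False
        self.stock -= qty
        self.holds[order_id] = (qty, self.clock() + self.hold_seconds)
        return True

    def pay(self, order_id: str) -> bool:
        """Confirm the hold; fails if it already expired or never existed."""
        self._release_expired()
        return self.holds.pop(order_id, None) is not None
```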
8.2 Problems and Mitigations
Combine the strategies above and add anti‑cheat measures (user tagging, purchase limits, rate limiting) to handle malicious orders.
8.3 Large‑Scale Flash‑Sale Inventory
Use SQL like the following to ensure inventory never goes negative:
UPDATE item SET inventory = CASE WHEN inventory >= :qty THEN inventory - :qty ELSE inventory END
For simple cases, inventory can be decremented directly in a cache (e.g., Redis). For complex cases, keep the operation in the database to leverage transactions and row‑level locking.
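The guarded update can be tried out with Python's built‑in `sqlite3` module. Note that this sketch uses a variant of the statement above: the guard moves from the CASE expression into the WHERE clause, so the affected‑row count tells the caller whether the deduction happened. The table layout, the `id` column, and the function name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, inventory INTEGER NOT NULL)")
conn.execute("INSERT INTO item (id, inventory) VALUES (1, 3)")


def deduct(conn: sqlite3.Connection, item_id: int, qty: int) -> bool:
    """Decrement stock only if enough remains; True when the deduction happened."""
    cur = conn.execute(
        "UPDATE item SET inventory = inventory - :qty "
        "WHERE id = :id AND inventory >= :qty",
        {"qty": qty, "id": item_id},
    )
    conn.commit()
    # No row matches when stock is insufficient, so nothing is updated.
    return cur.rowcount == 1


first = deduct(conn, 1, 2)   # enough stock
second = deduct(conn, 1, 2)  # only 1 left, rejected
remaining = conn.execute("SELECT inventory FROM item WHERE id = 1").fetchone()[0]
```

Because the check and the decrement are a single statement, the database serializes competing updates to the same row and inventory cannot go negative.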
9. Designing Fallback Plans
9.1 High‑Availability Foundations
Architecture: Multi‑datacenter, eliminate single points.
Code: Robust error handling, timeouts, default fallbacks.
Testing: Comprehensive test cases covering worst‑case scenarios.
Release: Quick rollback mechanisms.
Operation: Accurate monitoring and alerting.
Failure: Immediate damage control (e.g., take down erroneous product).
9.2 Degradation
When traffic exceeds a threshold, disable non‑core features (e.g., reduce displayed transaction records) via feature‑switches.
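A feature switch of this kind can be sketched as follows. The threshold, the feature names, and the class itself are hypothetical; in practice the switch state would usually come from a config center rather than being computed locally.

```python
class DegradationSwitch:
    """Turns non-core features off while observed QPS exceeds a threshold."""

    def __init__(self, qps_threshold: int, non_core_features):
        self.qps_threshold = qps_threshold
        self.non_core = set(non_core_features)
        self.degraded = False

    def observe(self, current_qps: int) -> None:
        # Re-evaluated on every observation, so features return when load drops.
        self.degraded = current_qps > self.qps_threshold

    def is_enabled(self, feature: str) -> bool:
        return not (self.degraded and feature in self.non_core)


# Hypothetical feature names.
switch = DegradationSwitch(1000, ["transaction_records"])
switch.observe(5000)
show_records = switch.is_enabled("transaction_records")  # False while degraded
can_order = switch.is_enabled("place_order")             # core path stays on
```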
9.3 Rate Limiting
Apply client‑side or server‑side limits based on QPS or thread count; use token‑bucket or leaky‑bucket algorithms to protect the system.
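A token bucket can be sketched in a few lines. The class name and parameters are illustrative, and the injectable clock exists only to make the behavior easy to test; this single‑threaded version would need a lock before being shared across threads.

```python
import time


class TokenBucket:
    """Tokens refill at a fixed rate up to `capacity`; each request spends one."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The capacity bounds the burst that is let through at once, while the rate bounds sustained QPS.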
9.4 Reject‑Service
When CPU or load crosses critical values, return HTTP 503 immediately to prevent total collapse; the system can recover automatically when load drops.
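The admission decision can be reduced to a small pure function. The per‑core limit of 1.5 and the function name are assumptions; on Unix the inputs could be taken from `os.getloadavg()` and `os.cpu_count()`.

```python
def admission_status(load_1m: float, cores: int, max_load_per_core: float = 1.5) -> int:
    """Return 503 when the 1-minute load average exceeds the per-core limit, else 200."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return 503 if load_1m > cores * max_load_per_core else 200


# Hypothetical readings on a 4-core machine.
normal = admission_status(2.0, 4)      # 200
overloaded = admission_status(9.0, 4)  # 503
```

Because the check is re‑evaluated per request, service resumes on its own once the load average falls back under the limit.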
MaGe Linux Operations
