How to Build a High‑Performance, Consistent, and Highly Available Flash‑Sale System
This article explores the architectural design of a flash‑sale (秒杀) system, covering high‑performance techniques, consistency guarantees, high‑availability strategies, hotspot optimization, and practical implementation details such as static‑dynamic separation, caching, data integration, and database tuning.
Introduction
Flash sales (秒杀) have become familiar since they first appeared in 2011, in events such as Double‑Eleven shopping and 12306 ticket booking. In simple terms, a flash sale is a process in which a massive number of requests compete to purchase the same product at the same moment.
From an architectural perspective, a flash‑sale system is essentially a "three‑high" system: high performance, high consistency, and high availability. This article discusses the key concerns when building and maintaining a large‑scale flash‑sale system.
Overall Considerations
At a high level, flash‑sale solves two core problems: massive concurrent reads and massive concurrent writes, which translate into requirements for high availability, consistency, and high performance. The design discussion proceeds in three layers:
High Performance: Support for high read/write concurrency, minimizing I/O, and data sharding. The article covers static‑dynamic separation, hotspot optimization, and server‑side performance tuning.
Consistency: Accurate inventory deduction under concurrent requests. Various inventory‑reduction schemes are examined.
High Availability: Handling traffic spikes, unstable dependencies, application bottlenecks, and hardware failures. The article explores architectural measures to keep the system stable under complex conditions.
High Performance
1. Static‑Dynamic Separation
During a flash‑sale, the page does not need to be fully refreshed; only the timer updates. This is achieved by static‑dynamic separation, which consists of three steps: data splitting, static caching, and data integration.
1.1 Data Splitting
The goal is to extract dynamic data so that the remaining page can be cached. Two main dimensions are considered:
User: Identity information, login status, and user profile are fetched via dynamic requests; recommendations can also be loaded asynchronously.
Time: The flash‑sale start time is controlled by the server and obtained via a dynamic request.
1.2 Static Caching
After separating static data, the next step is to cache it appropriately.
1.2.1 Caching Strategy
Static caching typically stores the entire HTTP response rather than just the static assets, so a proxy can return the cached body for a URL directly. The cache key is usually the product ID, which uniquely identifies the URL.
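As a rough illustration of caching whole responses by product ID, the sketch below keeps complete response bodies in a dictionary. The class name `PageCache`, the `render_item` function, and the in-memory dictionary are assumptions made for the example; a real deployment would use a proxy or CDN cache rather than application memory.

```python
from typing import Callable, Dict


class PageCache:
    """Caches complete HTTP response bodies, keyed by product ID."""

    def __init__(self, render: Callable[[int], bytes]) -> None:
        self._render = render              # hypothetical origin renderer
        self._pages: Dict[int, bytes] = {}
        self.origin_hits = 0               # how often the origin had to render

    def get(self, item_id: int) -> bytes:
        page = self._pages.get(item_id)
        if page is None:
            # Cache miss: render once at the origin and keep the full body.
            self.origin_hits += 1
            page = self._render(item_id)
            self._pages[item_id] = page
        return page

    def invalidate(self, item_id: int) -> None:
        """Drop a page so that the next request re-renders it."""
        self._pages.pop(item_id, None)


def render_item(item_id: int) -> bytes:
    # Stand-in for the real page renderer (an assumption for this sketch).
    return f"<html><body>item {item_id}</body></html>".encode("utf-8")
```

Repeated calls to `get` for the same product reach the origin only once until `invalidate` is called for that product.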
1.2.2 Cache Location
Static data can be cached in three places: the browser, a CDN, or the server. Browser caching is limited because the server cannot force the client to refresh stale content. Server‑side caching is a poor fit because application servers, Java ones in particular, spend memory on every connection and add latency when serving static content. CDN caching is therefore preferred: it serves content close to users and keeps static traffic away from the application tier, provided that cached pages can be invalidated within seconds.
Key CDN challenges include:
Cache invalidation within seconds across all nodes.
Maintaining a high hit rate despite geographic distribution.
Practical deployment selects a subset of CDN nodes that are near traffic hotspots, far from the origin, and have good network quality. The architecture diagram is shown below:
1.3 Data Integration
After separating static data, the front‑end must assemble the final page. Two common approaches are ESI (Edge Side Includes) and CSI (Client‑Side Include).
ESI: The proxy server fetches dynamic data and injects it into the static page, delivering a complete page to the user. This puts higher load on the server but offers better user experience.
CSI: The proxy returns only the static page; the browser makes an asynchronous request for dynamic data. This reduces server load at the cost of slightly poorer UX.
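The ESI approach can be sketched as a small tag substitution step. The `<esi:include src="..."/>` tag comes from the ESI language itself; the `assemble` and `fake_fetch` functions, the fragment URLs, and the sample page are assumptions for illustration, and a real proxy would fetch fragments over HTTP.

```python
import re
from typing import Callable

# Matches self-closing ESI include tags and captures the src attribute.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble(static_page: str, fetch_fragment: Callable[[str], str]) -> str:
    """Replace each <esi:include src="..."/> tag with the fetched fragment."""
    return ESI_INCLUDE.sub(lambda m: fetch_fragment(m.group(1)), static_page)


def fake_fetch(src: str) -> str:
    # Hypothetical dynamic endpoints; real ones would be HTTP calls.
    fragments = {"/user/greeting": "Hello, Alice", "/sale/countdown": "00:05:00"}
    return fragments[src]


STATIC_PAGE = (
    '<h1><esi:include src="/user/greeting"/></h1>'
    '<p>Starts in <esi:include src="/sale/countdown"/></p>'
)
```

With CSI, the same static page would be returned without the substitution step, and the browser would request the two fragments itself.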
1.4 Summary
Static‑dynamic separation improves performance by reducing unnecessary requests and shortening request paths. The concrete methods follow this high‑level direction.
2. Hotspot Optimization
Hotspots are divided into hotspot operations and hotspot data.
2.1 Hotspot Operations
Hotspot operations such as refreshing the page, placing an order, or adding to cart at the instant the sale opens (for example, at midnight) are user behaviors that cannot be changed, but they can be constrained by rate limiting or by showing a prompt.
2.2 Hotspot Data
Hotspot data handling follows three steps: identification, isolation, and optimization.
2.2.1 Hotspot Identification
Hotspot data can be static (predictable) or dynamic (unpredictable). Static hotspots are identified before a promotion by analyzing product attributes or seller registration. Dynamic hotspots arise from real‑time events such as live‑stream sales, causing sudden traffic spikes that can bypass cache and hit the database.
Typical identification workflow:
Asynchronously collect hotspot keys from Nginx logs or agent‑based hotspot logs.
Aggregate and analyze the data; once a rule is satisfied, publish the hotspot information to downstream systems for caching or rate‑limiting.
Best practices include asynchronous collection and near‑real‑time detection.
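One possible shape for the aggregation step is a sliding-window counter over collected keys. The class `HotKeyDetector`, the window length, and the threshold rule are assumptions for this sketch; it also assumes events arrive in timestamp order, which log collection does not guarantee in practice.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class HotKeyDetector:
    """Counts key accesses in a sliding time window and reports hot keys."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window = window_seconds
        self.threshold = threshold
        self._events: Deque[Tuple[float, str]] = deque()
        self._counts: Counter = Counter()

    def record(self, key: str, now: float) -> None:
        self._events.append((now, key))
        self._counts[key] += 1
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window:
            _, old_key = self._events.popleft()
            self._counts[old_key] -= 1
            if self._counts[old_key] == 0:
                del self._counts[old_key]

    def hot_keys(self, now: float) -> List[str]:
        """Keys seen at least `threshold` times within the current window."""
        self._evict(now)
        return sorted(k for k, n in self._counts.items() if n >= self.threshold)
```

The list returned by `hot_keys` is what would be published to downstream systems for caching or rate limiting.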
2.2.2 Hotspot Isolation
After identification, isolate hotspot traffic so that the 1% of hot requests cannot affect the other 99%:
Business isolation: Require sellers to register items for the flash sale, so that hot items are known in advance and their caches can be pre‑warmed.
System isolation: Deploy a dedicated cluster or domain for flash‑sale traffic.
Data isolation: Use a dedicated cache cluster or database shard for hotspot data.
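Data isolation can be pictured as a routing decision made before any cache lookup. In the sketch below, the class `CacheRouter` and both cluster names are placeholders invented for the example, not real hosts or a real client API.

```python
from typing import Set


class CacheRouter:
    """Routes keys for known hot items to a dedicated cache cluster."""

    def __init__(self, hot_items: Set[int]) -> None:
        self.hot_items = set(hot_items)

    def cluster_for(self, item_id: int) -> str:
        # Cluster names are placeholders for this sketch.
        if item_id in self.hot_items:
            return "flashsale-cache"
        return "default-cache"
```

Because hot items never touch the default cluster, a surge on one product cannot evict or slow down data for ordinary products.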
2.2.3 Hotspot Optimization
Two main techniques are applied:
Cache hotspot data; if static‑dynamic separation is in place, the static portion can be cached for a long period.
Rate‑limit hotspot requests to protect downstream services.
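Rate limiting for a hot key is often done with a token bucket. The sketch below is one minimal version under stated assumptions: the class name `TokenBucket` is illustrative, and the caller passes the current time explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Admits requests at a sustained `rate` per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Requests for which `allow` returns `False` would be rejected or shown a retry prompt before they reach downstream services.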
2.2.4 Summary
Hotspot optimization differs from static‑dynamic separation; it follows the 80/20 principle to handle a small fraction of data with targeted strategies, offering insights for other high‑performance distributed systems.
3. System Optimization
Performance can be improved through hardware upgrades, JVM tuning, and especially code‑level optimizations:
Reduce serialization: Minimize RPC calls by co‑locating tightly related services.
Directly output byte streams: Pre‑encode static strings and avoid costly character‑to‑byte conversions; avoid reflective toString implementations.
Trim log stack traces: Limit exception stack depth in high‑traffic environments.
Remove heavyweight frameworks: In extreme cases, replace MVC frameworks with raw Servlets to cut processing overhead.
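The byte-stream idea from the list above carries over to other runtimes. As a Python analogue (the source discusses Java), the sketch below encodes the static fragments once at import time and converts only the dynamic value per request. The names `HEAD`, `TAIL`, and `render_stock_page` and the page content are assumptions for the example.

```python
# Static fragments are encoded once at import time, not on every request.
HEAD = "<html><body><h1>Flash sale</h1><p>Stock: ".encode("utf-8")
TAIL = "</p></body></html>".encode("utf-8")


def render_stock_page(stock: int) -> bytes:
    """Build the response body; only the dynamic number is encoded per call."""
    return b"".join((HEAD, str(stock).encode("ascii"), TAIL))
```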
4. Summary
Establish baselines for performance, cost, and call chains, and monitor them continuously to drive incremental improvements at the code, business, and architecture layers.
Consistency
Inventory is the critical data in a flash‑sale. Accurate deduction is essential to avoid overselling.
1. Inventory Reduction Methods
Typical e‑commerce purchase flow consists of two steps: order placement and payment. Inventory can be reduced at different stages:
Reduce on order: Immediate deduction when the order is placed. Provides the most precise control.
Reduce on payment: Deduction occurs after payment, which can lead to orders that cannot be fulfilled if stock runs out.
Pre‑reserve (pre‑lock) inventory: Reserve stock for a limited time (e.g., 15 minutes) after order placement, releasing it if payment does not occur.
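The pre‑reserve scheme can be sketched as a stock counter plus a table of time-limited holds. The class `ReservableStock` and its method names are assumptions for this example; it is single-process and not thread-safe, and the caller supplies the current time in seconds.

```python
from typing import Dict, Tuple


class ReservableStock:
    """Holds stock for unpaid orders and releases it when the hold expires."""

    def __init__(self, available: int, hold_seconds: float) -> None:
        self.available = available
        self.hold_seconds = hold_seconds
        # order_id -> (quantity, deadline)
        self._holds: Dict[str, Tuple[int, float]] = {}

    def reserve(self, order_id: str, qty: int, now: float) -> bool:
        self.release_expired(now)
        if qty <= 0 or order_id in self._holds or qty > self.available:
            return False
        self.available -= qty
        self._holds[order_id] = (qty, now + self.hold_seconds)
        return True

    def pay(self, order_id: str, now: float) -> bool:
        self.release_expired(now)
        # Payment succeeds only while the hold is still active.
        return self._holds.pop(order_id, None) is not None

    def release_expired(self, now: float) -> None:
        # Return stock from holds whose deadline has passed.
        for order_id, (qty, deadline) in list(self._holds.items()):
            if now >= deadline:
                self.available += qty
                del self._holds[order_id]
```

With `hold_seconds=900`, an order that is not paid within 15 minutes loses its hold and the stock becomes available to the next buyer.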
2. Problems with Inventory Reduction
2.1 Order‑time Reduction
Advantages: Best user experience and precise control via DB transactions.
Disadvantages: Vulnerable to malicious orders that reserve stock without paying, causing loss of sales.
2.2 Payment‑time Reduction
Advantages: Guarantees actual sales.
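Disadvantages: Poor user experience. Because stock is not held at order time, more orders can be accepted than there is inventory, and some buyers discover at payment time that the item has sold out.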
2.3 Pre‑reserve
Advantages: Balances the two previous methods.
Disadvantages: Still susceptible to malicious orders, because a buyer can simply place a new order once the reservation window expires.
3. Practical Implementation
Industry‑standard solutions often use pre‑reserve combined with anti‑fraud measures (e.g., marking frequent non‑paying users, limiting per‑user purchase quantity).
To prevent overselling, technical safeguards include:
Transactional checks that roll back if inventory would become negative.
Unsigned integer columns, so that a deduction that would take the value below zero fails with an SQL error.
SQL CASE WHEN logic, such as:
UPDATE item SET inventory = CASE WHEN inventory >= xxx THEN inventory - xxx ELSE inventory END
4. Consistency Performance Optimization
Inventory is both a hotspot and a high‑read/high‑write challenge.
4.1 High‑Concurrency Reads
Use layered validation: perform lightweight checks (eligibility, product status, request legality) during the read path, deferring strict consistency checks to the write path. This allows the use of distributed caches or local caches, tolerating some stale reads.
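Layered validation can be sketched as two functions with different strictness. Everything named here (`FlashSale`, `precheck`, `place_order`, the cached copy of stock) is an assumption for illustration; the sketch is single-threaded and deliberately never refreshes the cache, so the stale-read case is visible.

```python
from typing import Dict, Set


class FlashSale:
    def __init__(self, stock: Dict[int, int], eligible_users: Set[str]) -> None:
        self._stock = dict(stock)         # authoritative stock (write path)
        self.cached_stock = dict(stock)   # possibly stale copy (read path)
        self.eligible_users = eligible_users

    def precheck(self, user: str, item_id: int) -> bool:
        """Read path: cheap checks against cached data; stale reads tolerated."""
        return user in self.eligible_users and self.cached_stock.get(item_id, 0) > 0

    def place_order(self, user: str, item_id: int) -> bool:
        """Write path: re-check against authoritative stock before deducting."""
        if not self.precheck(user, item_id):
            return False
        if self._stock.get(item_id, 0) <= 0:
            return False                  # cache was stale; the strict check wins
        self._stock[item_id] -= 1
        return True
```

A stale cache can let a request through `precheck` after stock is gone, but the write path still rejects it, so the cost of staleness is a wasted request rather than an oversold item.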
4.2 High‑Concurrency Writes
Two approaches:
Change the database choice: For simple inventory deduction, a persistent cache like Redis can be used.
Optimize the database: Reduce row‑level lock contention in MySQL, employ application‑level distributed locks, or use database‑level queuing patches (e.g., Alibaba's AliSQL) to serialize access to hot rows.
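Whichever store takes the writes, the deduction is safest as a single conditional statement, so that concurrent requests cannot drive stock negative. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL and a `WHERE inventory >= ?` guard, which is a variant of the `CASE WHEN` statement from the overselling safeguards rather than the same statement. The `item` table and `inventory` column follow that statement; the `id` column and the `deduct` function are assumptions.

```python
import sqlite3


def deduct(conn: sqlite3.Connection, item_id: int, qty: int) -> bool:
    """Deduct stock only if enough remains; False means it would oversell."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE item SET inventory = inventory - ? "
            "WHERE id = ? AND inventory >= ?",
            (qty, item_id, qty),
        )
        # rowcount is 1 when the guard matched and the row was updated.
        return cur.rowcount == 1


# In-memory database with one item holding 2 units (sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, inventory INTEGER NOT NULL)")
conn.execute("INSERT INTO item VALUES (1, 2)")
conn.commit()
```

Checking the affected row count tells the application whether the purchase succeeded, which the `CASE WHEN` form alone does not reveal.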
4.3 Summary
Read‑side optimizations have more headroom, while write‑side bottlenecks are bound by storage. Balancing CAP trade‑offs is essential.
5. Summary
Additional challenges include inventory rollback after pre‑reserve timeout and ensuring consistency between payment gateways and inventory updates.
High Availability
Flash‑sale traffic forms a sharp spike at a specific moment, creating a massive instantaneous load.
1. Traffic Shaping
Since the number of successful purchases is fixed, the system can limit the effective request volume. Techniques include:
Answer‑the‑question challenges to delay requests and filter bots.
Queueing mechanisms (message queues, thread‑pool locks, local memory buffering) to smooth bursts.
Filtering at multiple layers: read rate‑limiting, read caching, write rate‑limiting, and write validation.
1.1 Answer‑the‑Question
Adding a quiz before the final purchase step deters automated bots and spreads the request window from sub‑second to several seconds, reducing peak pressure on the backend.
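A quiz step might be implemented with a signed token so that the server does not need to store each question. The sketch below is only one possible design: the arithmetic question, the `SECRET` value, the function names, and the minimum and maximum answer times are all assumptions, and the minimum-time check in particular is an added idea for filtering scripted submissions.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # placeholder; a real key would come from configuration


def issue_challenge(a: int, b: int, issued_at: int) -> dict:
    """Return a question plus a token binding the answer to the issue time."""
    msg = f"{a + b}:{issued_at}".encode("ascii")
    token = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return {"question": f"{a} + {b} = ?", "issued_at": issued_at, "token": token}


def verify(answer: int, issued_at: int, token: str, now: int,
           min_seconds: int = 2, max_seconds: int = 60) -> bool:
    """Accept only correct answers submitted neither too fast nor too late."""
    if not (min_seconds <= now - issued_at <= max_seconds):
        return False
    msg = f"{answer}:{issued_at}".encode("ascii")
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(expected, token)
```

Because users take different amounts of time to answer, purchase requests that would have arrived in the same instant arrive spread over several seconds.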
1.2 Queueing
Common approaches include message queues, thread‑pool locking, local memory buffering, and file‑based serialization. Drawbacks are request backlog and degraded user experience due to out‑of‑order processing.
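Local memory buffering can be sketched with a bounded queue that rejects work once it is full, which limits the backlog mentioned above. The class `OrderBuffer` and its methods are assumptions for the example; `queue.Queue` is the standard-library thread-safe FIFO.

```python
import queue


class OrderBuffer:
    """Bounded FIFO buffer: accepts bursts up to `capacity`, rejects the rest."""

    def __init__(self, capacity: int) -> None:
        self._q: "queue.Queue[str]" = queue.Queue(maxsize=capacity)

    def submit(self, order_id: str) -> bool:
        try:
            self._q.put_nowait(order_id)
            return True
        except queue.Full:
            return False   # shed load instead of building an unbounded backlog

    def drain(self, batch_size: int) -> list:
        """Take up to `batch_size` orders, in arrival order, for processing."""
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch
```

A worker would call `drain` at a steady pace, so the backend sees a smooth flow regardless of how bursty `submit` calls are.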
1.3 Filtering
Layered filtering removes invalid requests early, preserving I/O for genuine traffic.
1.4 Summary
Combining answer‑the‑question, queueing, and filtering balances commercial goals with architectural performance.
2. Plan B
When sustained high traffic overwhelms the system, a fallback Plan B is required. High‑availability is a lifecycle effort covering architecture, coding, testing, deployment, operation, and incident response.
Architecture stage: Design for scalability and fault tolerance, e.g., multi‑region deployment.
Coding stage: Implement robust code with proper timeouts and error handling.
Testing stage: Ensure CI coverage and static analysis quality.
Release stage: Use checklists, upstream/downstream notifications, and rollback mechanisms.
Operation stage: Real‑time monitoring, accurate alerting, and detailed diagnostics.
Incident stage: Rapid damage control, root‑cause analysis, and service restoration.
Operational measures include regular pressure testing, degradation, flow control, and circuit‑breaker protection, performance baselines, alert systems, and rapid recovery tools.
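Circuit‑breaker protection with a degraded fallback could look roughly like the sketch below. The class `CircuitBreaker`, its thresholds, and the half-open rule (one trial call after the cool-down) are assumptions chosen for brevity; production libraries offer richer policies.

```python
class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; retries after `reset_after` seconds."""

    def __init__(self, max_failures: int, reset_after: float) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None   # time the breaker opened, or None if closed

    def call(self, func, fallback, now: float):
        if self.opened_at is not None:
            if now - self.opened_at < self.reset_after:
                return fallback()            # open: degrade without calling func
            # Half-open: allow one trial call; a single failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = now
            return fallback()
        self.failures = 0
        return result
```

While the breaker is open, the unstable dependency receives no traffic at all, which gives it room to recover while users see the degraded response.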
3. Summary
High availability is essentially stability; it is often deprioritized until a failure occurs. Organizational commitment—such as tying stability metrics to performance evaluations and forming dedicated reliability teams—helps embed reliability into the development lifecycle.
Personal Summary
A flash‑sale system can be built incrementally, from simple to complex architectures, based on traffic volume and business requirements. The key is to make trade‑offs consciously and keep the main design goal in focus.
Source: https://segmentfault.com/a/1190000020970562
