How to Build a High‑Performance, Consistent, and Highly Available Flash‑Sale System

This article explores the architectural design of a flash‑sale (秒杀) system, covering high‑performance techniques, consistency guarantees, high‑availability strategies, hotspot optimization, and practical implementation details such as static‑dynamic separation, caching, data integration, and database tuning.

Efficient Ops

Nov 26, 2019

Introduction

Flash sales (秒杀, literally "second kill") have been a familiar sight since they first appeared in 2011, in events such as the Double-Eleven shopping festival and ticket booking on 12306. Put simply, a flash sale is a burst in which a massive number of requests compete to buy the same product at the same moment.

From an architectural perspective, a flash‑sale system is essentially a "three‑high" system: high performance, high consistency, and high availability. This article discusses the key concerns when building and maintaining a large‑scale flash‑sale system.

Overall Considerations

At a high level, flash‑sale solves two core problems: massive concurrent reads and massive concurrent writes, which translate into requirements for high availability, consistency, and high performance. The design discussion proceeds in three layers:

High Performance: Support for high read/write concurrency, minimizing I/O, and data sharding. The article covers static-dynamic separation, hotspot optimization, and server-side performance tuning.

Consistency: Accurate inventory deduction under concurrent requests. Various inventory-reduction schemes are examined.

High Availability: Handling traffic spikes, unstable dependencies, application bottlenecks, and hardware failures. The article explores architectural measures to keep the system stable under complex conditions.

High Performance

1. Static‑Dynamic Separation

During a flash‑sale, the page does not need to be fully refreshed; only the timer updates. This is achieved by static‑dynamic separation, which consists of three steps: data splitting, static caching, and data integration.

1.1 Data Splitting

The goal is to extract dynamic data so that the remaining page can be cached. Two main dimensions are considered:

User: Identity information, login status, and user profile are fetched via dynamic requests; recommendations can also be loaded asynchronously.

Time: The flash-sale start time is controlled by the server and obtained via a dynamic request.

1.2 Static Caching

After separating static data, the next step is to cache it appropriately.

1.2.1 Caching Strategy

Static‑generation typically caches the entire HTTP response rather than just static assets. The cache key is usually the product ID, which uniquely identifies the URL.
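As a rough illustration of caching the whole response under the product ID, the sketch below keeps rendered response bytes in a small in-memory map with a time-to-live. The names `ResponseCache` and `serve`, the TTL value, and the injectable clock are assumptions made for this example; a real deployment would place this cache at a CDN or proxy rather than in application memory.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ResponseCache:
    """Hypothetical cache of fully rendered responses, keyed by product ID."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # product_id -> (expires_at, response bytes)
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, product_id: str) -> Optional[bytes]:
        entry = self._entries.get(product_id)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped so the next request re-renders.
            del self._entries[product_id]
            return None
        return response

    def put(self, product_id: str, response: bytes) -> None:
        self._entries[product_id] = (self._clock() + self._ttl, response)


def serve(cache: ResponseCache, product_id: str,
          render: Callable[[str], bytes]) -> bytes:
    """Return the cached response if present, otherwise render and cache it."""
    cached = cache.get(product_id)
    if cached is not None:
        return cached
    response = render(product_id)
    cache.put(product_id, response)
    return response
```

Because the key is the product ID alone, every user requesting the same product shares one entry, which is only safe once user-specific and time-specific data have been split out of the page.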

1.2.2 Cache Location

Static data can be cached in three places: the browser, a CDN, or the server. Browser caching is hard to control because the server cannot force the client to refresh, so stale pages may linger. Server-side caching ties up memory for every connection and still sends each request all the way to the origin, which adds latency. CDN caching is therefore preferred: content is served close to users, it can be invalidated within seconds, and static traffic bypasses the Java application tier, which is not well suited to holding large numbers of connections.

Key CDN challenges include:

Cache invalidation within seconds across all nodes.

Maintaining a high hit rate despite geographic distribution.

Practical deployment selects a subset of CDN nodes that are near traffic hotspots, far from the origin, and have good network quality.

[Figure: CDN deployment architecture]

1.3 Data Integration

After separating static data, the front‑end must assemble the final page. Two common approaches are ESI (Edge Side Includes) and CSI (Client‑Side Include).

ESI: The proxy server fetches dynamic data and injects it into the static page, delivering a complete page to the user. This puts higher load on the server but offers better user experience.

CSI: The proxy returns only the static page; the browser makes an asynchronous request for dynamic data. This reduces server load at the cost of slightly poorer UX.
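The ESI approach can be sketched as a proxy-side substitution step. The example below is a minimal illustration, not a full ESI processor: it handles only self-closing `<esi:include src="..."/>` tags, and the function name `assemble_page` and the `fetch_fragment` callback are assumptions introduced here.

```python
import re
from typing import Callable

# Matches self-closing ESI include tags and captures the src attribute.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def assemble_page(static_html: str,
                  fetch_fragment: Callable[[str], str]) -> str:
    """Replace each include tag with the dynamic fragment fetched for its src."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)),
                           static_html)
```

With CSI, by contrast, the proxy would return the static HTML untouched and the browser would request the same fragment URLs itself after the page loads.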

1.4 Summary

Static‑dynamic separation improves performance by reducing unnecessary requests and shortening request paths. The concrete methods follow this high‑level direction.

2. Hotspot Optimization

Hotspots are divided into hotspot operations and hotspot data.

2.1 Hotspot Operations

Hotspot operations are user actions that pile up at second zero of the sale: refreshing the page, placing an order, and adding to cart. These behaviors cannot be changed, but the system can be protected from them by rate-limiting or by showing the user a prompt.

2.2 Hotspot Data

Hotspot data handling follows three steps: identification, isolation, and optimization.

2.2.1 Hotspot Identification

Hotspot data can be static (predictable) or dynamic (unpredictable). Static hotspots are identified before a promotion by analyzing product attributes or seller registration. Dynamic hotspots arise from real‑time events such as live‑stream sales, causing sudden traffic spikes that can bypass cache and hit the database.

Typical identification workflow:

Asynchronously collect hotspot keys from Nginx logs or agent‑based hotspot logs.

Aggregate and analyze the data; once a rule is satisfied, publish the hotspot information to downstream systems for caching or rate‑limiting.

Best practices include asynchronous collection and near‑real‑time detection.
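The aggregation step of this workflow can be illustrated with a sliding-window counter. This is a simplified single-process sketch; the class name `HotspotDetector`, the window length, and the threshold rule are assumptions, and a production pipeline would consume log events asynchronously and publish detected keys to downstream systems.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class HotspotDetector:
    """Flags a key as hot once it is accessed `threshold` times per window."""

    def __init__(self, window_seconds: float, threshold: int):
        self._window = window_seconds
        self._threshold = threshold
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, key: str, timestamp: float) -> bool:
        """Record one access and report whether the key is currently hot."""
        hits = self._hits[key]
        hits.append(timestamp)
        # Evict accesses that have fallen out of the sliding window.
        while hits and hits[0] <= timestamp - self._window:
            hits.popleft()
        return len(hits) >= self._threshold
```

Feeding it timestamps parsed from access logs keeps detection off the request path, which matches the asynchronous collection recommended above.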

2.2.2 Hotspot Isolation

After identification, isolate the hotspot traffic so that roughly 1% of requests cannot degrade the remaining 99%:

Business isolation: Require sellers to register products for the flash sale, so hotspots are known in advance and caches can be pre-warmed.

System isolation: Deploy a dedicated cluster or domain for flash‑sale traffic.

Data isolation: Use a dedicated cache cluster or database shard for hotspot data.

2.2.3 Hotspot Optimization

Two main techniques are applied:

Cache hotspot data for a longer period when static‑dynamic separation is in place.

Rate‑limit hotspot requests to protect downstream services.
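One common way to rate-limit hotspot requests is a token bucket. The sketch below is an illustrative single-threaded version; the class name, parameters, and injectable clock are assumptions, and a shared limiter would need locking or a central store.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate_per_second
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A separate bucket per hotspot key keeps one hot product from consuming the allowance of the others.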

2.2.4 Summary

Hotspot optimization differs from static-dynamic separation: instead of changing how the whole page is served, it applies the 80/20 principle, treating the small fraction of data that attracts most of the traffic with targeted strategies. The same idea carries over to other high-performance distributed systems.

3. System Optimization

Performance can be improved through hardware upgrades, JVM tuning, and especially code‑level optimizations:

Reduce serialization: Most serialization happens during RPC, so cut the number of RPC calls, for example by co-locating tightly related services.

Directly output byte streams: Pre‑encode static strings and avoid costly character‑to‑byte conversions; avoid reflective toString implementations.

Trim log stack traces: Limit exception stack depth in high‑traffic environments.

Remove heavyweight frameworks: In extreme cases, replace MVC frameworks with raw Servlets to cut processing overhead.
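The byte-stream point can be illustrated in a few lines. The article is describing Java services, so the Python below is only an analogy for the idea: the static parts of a response are encoded once at import time, and only the dynamic value is encoded per request. The constant and function names are hypothetical.

```python
# Static fragments are encoded once, not on every request.
PAGE_HEAD = "<html><body><h1>Flash sale</h1><p>Stock: ".encode("utf-8")
PAGE_TAIL = "</p></body></html>".encode("utf-8")


def render_stock_page(stock: int) -> bytes:
    """Build the response body, encoding only the dynamic stock value."""
    return b"".join((PAGE_HEAD, str(stock).encode("ascii"), PAGE_TAIL))
```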

4. Summary

Establish performance baselines (performance, cost, and link baselines) and continuously monitor them to drive incremental improvements at the code, business, and architecture layers.

Consistency

Inventory is the critical data in a flash‑sale. Accurate deduction is essential to avoid overselling.

1. Inventory Reduction Methods

Typical e‑commerce purchase flow consists of two steps: order placement and payment. Inventory can be reduced at different stages:

Reduce on order: Immediate deduction when the order is placed. Provides the most precise control.

Reduce on payment: Deduction occurs after payment, which can lead to orders that cannot be fulfilled if stock runs out.

Pre‑reserve (pre‑lock) inventory: Reserve stock for a limited time (e.g., 15 minutes) after order placement, releasing it if payment does not occur.

2. Problems with Inventory Reduction

2.1 Order‑time Reduction

Advantages: Best user experience and precise control via DB transactions.

Disadvantages: Vulnerable to malicious orders that reserve stock without paying, causing loss of sales.

2.2 Payment‑time Reduction

Advantages: Guarantees actual sales.

Disadvantages: Poor user experience. Because stock is not deducted at order time, more orders can be placed than there is stock, and some buyers find the item sold out when they try to pay.

2.3 Pre‑reserve

Advantages: Balances the two previous methods.

Disadvantages: Still susceptible to malicious orders, since a buyer can simply place a new order once the reservation window expires.

3. Practical Implementation

Industry‑standard solutions often use pre‑reserve combined with anti‑fraud measures (e.g., marking frequent non‑paying users, limiting per‑user purchase quantity).
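A pre-reserve scheme with a per-user limit might look like the following in-memory sketch. It is an illustration under stated assumptions, not a production design: the class `ReservationBook`, its method names, and the rule that the per-user limit counts both held and paid units are all choices made for this example, and real systems would keep this state in a database or cache.

```python
import time
from typing import Callable, Dict, Tuple


class ReservationBook:
    """Holds stock for unpaid orders and releases it when the hold expires."""

    def __init__(self, stock: int, hold_seconds: float, per_user_limit: int,
                 clock: Callable[[], float] = time.monotonic):
        self._available = stock
        self._hold_seconds = hold_seconds
        self._per_user_limit = per_user_limit
        self._clock = clock
        # order_id -> (user_id, quantity, expires_at)
        self._holds: Dict[str, Tuple[str, int, float]] = {}
        # user_id -> units currently held or already paid for
        self._claimed: Dict[str, int] = {}

    def _release_expired(self) -> None:
        now = self._clock()
        for order_id, (user_id, quantity, expires_at) in list(self._holds.items()):
            if now >= expires_at:
                del self._holds[order_id]
                self._available += quantity
                self._claimed[user_id] -= quantity

    def available(self) -> int:
        self._release_expired()
        return self._available

    def reserve(self, order_id: str, user_id: str, quantity: int) -> bool:
        """Hold stock for an order; fails on limit, duplicate, or no stock."""
        self._release_expired()
        claimed = self._claimed.get(user_id, 0)
        if (order_id in self._holds or quantity <= 0
                or claimed + quantity > self._per_user_limit
                or quantity > self._available):
            return False
        self._available -= quantity
        self._claimed[user_id] = claimed + quantity
        self._holds[order_id] = (user_id, quantity,
                                 self._clock() + self._hold_seconds)
        return True

    def confirm_payment(self, order_id: str) -> bool:
        """Turn a live hold into a sale; fails if the hold already expired."""
        self._release_expired()
        return self._holds.pop(order_id, None) is not None
```

With `hold_seconds=900` this corresponds to the 15-minute window mentioned earlier: unpaid stock returns to the pool automatically, while paid units keep counting against the buyer's limit.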

To prevent overselling, technical safeguards include:

Transactional checks that roll back if inventory would become negative.

Using unsigned integer columns to cause SQL errors on negative values.

SQL CASE WHEN logic, such as:

UPDATE item SET inventory = CASE WHEN inventory >= xxx THEN inventory-xxx ELSE inventory END
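The statement above has no WHERE clause, so as written it would touch every row in the table. The sketch below runs the same CASE WHEN logic against an in-memory SQLite database with an added `WHERE id = ?`; the `id` column, the table layout, and the helper names are assumptions for the demo. It also shows a closely related variant (not the form used in the text) that moves the stock check into the WHERE clause, so the affected-row count tells the caller whether the deduction actually happened.

```python
import sqlite3


def open_demo_db(stock: int) -> sqlite3.Connection:
    """Create an in-memory table with a single item (id 1) for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, inventory INTEGER NOT NULL)")
    conn.execute("INSERT INTO item (id, inventory) VALUES (1, ?)", (stock,))
    conn.commit()
    return conn


def deduct_case_when(conn: sqlite3.Connection, item_id: int, qty: int) -> int:
    """CASE WHEN form: never goes negative; returns the remaining stock."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "UPDATE item SET inventory = CASE WHEN inventory >= ? "
            "THEN inventory - ? ELSE inventory END WHERE id = ?",
            (qty, qty, item_id))
    row = conn.execute(
        "SELECT inventory FROM item WHERE id = ?", (item_id,)).fetchone()
    return row[0]


def deduct_where_guard(conn: sqlite3.Connection, item_id: int, qty: int) -> bool:
    """Variant: guard in WHERE, so rowcount reports success or failure."""
    with conn:
        cursor = conn.execute(
            "UPDATE item SET inventory = inventory - ? "
            "WHERE id = ? AND inventory >= ?",
            (qty, item_id, qty))
        return cursor.rowcount == 1
```

Both forms keep the check and the write in a single statement, which is what prevents two concurrent buyers from both passing a separate "is there stock?" read.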

4. Consistency Performance Optimization

Inventory is both a hotspot and a high‑read/high‑write challenge.

4.1 High‑Concurrency Reads

Use layered validation: perform lightweight checks (eligibility, product status, request legality) during the read path, deferring strict consistency checks to the write path. This allows the use of distributed caches or local caches, tolerating some stale reads.
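Layered validation can be sketched as two functions with different strictness. In this illustration the read path consults a possibly stale cached stock figure, and only the write path checks the authoritative value. All names here are hypothetical, plain dictionaries stand in for the cache and the database, and a real write path would perform the check and deduction atomically in the store.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PurchaseRequest:
    user_id: str
    item_id: str
    quantity: int


def read_path_check(request: PurchaseRequest, cached_stock: Dict[str, int],
                    sale_open: bool) -> bool:
    """Cheap filter: may pass requests that later fail, never blocks on the DB."""
    if not sale_open or request.quantity <= 0:
        return False
    # Stale data is tolerated here; it only needs to weed out obvious failures.
    return cached_stock.get(request.item_id, 0) > 0


def write_path_deduct(request: PurchaseRequest,
                      authoritative_stock: Dict[str, int]) -> bool:
    """Strict check against the source of truth before deducting."""
    remaining = authoritative_stock.get(request.item_id, 0)
    if remaining < request.quantity:
        return False
    authoritative_stock[request.item_id] = remaining - request.quantity
    return True
```

A stale read can only let through a request that the write path then rejects, so correctness rests entirely on the final check.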

4.2 High‑Concurrency Writes

Two approaches:

Change the database choice: For simple inventory deduction, a persistent cache like Redis can be used.

Optimize the database: Reduce row‑level lock contention in MySQL, employ application‑level distributed locks, or use database‑level queuing patches (e.g., Alibaba's AliSQL) to serialize access to hot rows.
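The idea of serializing access to a hot row at the application level can be shown with one lock per item. This is a single-process stand-in: `threading.Lock` plays the role that a distributed lock would play across application servers, and the class name and in-memory stock map are assumptions for the example.

```python
import threading
from collections import defaultdict
from typing import Dict


class HotRowSerializer:
    """Serializes deductions per item so only one writer touches a row at a time."""

    def __init__(self, stock: Dict[str, int]):
        self._stock = dict(stock)
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, item_id: str) -> threading.Lock:
        # Guard the lock table itself so two threads never create two locks.
        with self._locks_guard:
            return self._locks[item_id]

    def deduct(self, item_id: str, quantity: int = 1) -> bool:
        with self._lock_for(item_id):
            remaining = self._stock.get(item_id, 0)
            if remaining < quantity:
                return False
            self._stock[item_id] = remaining - quantity
            return True

    def remaining(self, item_id: str) -> int:
        with self._lock_for(item_id):
            return self._stock.get(item_id, 0)
```

Locking per item rather than globally means contention on one hot product does not slow writes to the rest of the catalog.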

4.3 Summary

Read‑side optimizations have more headroom, while write‑side bottlenecks are bound by storage. Balancing CAP trade‑offs is essential.

5. Summary

Additional challenges include inventory rollback after pre‑reserve timeout and ensuring consistency between payment gateways and inventory updates.

High Availability

Flash‑sale traffic forms a sharp spike at a specific moment, creating a massive instantaneous load.

1. Traffic Shaping

Since the number of successful purchases is fixed, the system can limit the effective request volume. Techniques include:

Answer‑the‑question challenges to delay requests and filter bots.

Queueing mechanisms (message queues, thread‑pool locks, local memory buffering) to smooth bursts.

Filtering at multiple layers: read rate‑limiting, read caching, write rate‑limiting, and write validation.

1.1 Answer‑the‑Question

Adding a quiz before the final purchase step deters automated bots and spreads the request window from sub‑second to several seconds, reducing peak pressure on the backend.

1.2 Queueing

Common approaches include message queues, thread‑pool locking, local memory buffering, and file‑based serialization. Drawbacks are request backlog and degraded user experience due to out‑of‑order processing.
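Local memory buffering, the simplest of these, can be sketched with a bounded queue: requests beyond the buffer's capacity are rejected immediately, and a worker drains the rest in batches at whatever pace the backend can sustain. The class name `OrderBuffer` and its methods are hypothetical.

```python
import queue
from typing import List


class OrderBuffer:
    """Bounded in-memory buffer that smooths a burst into steady batches."""

    def __init__(self, capacity: int):
        self._queue: "queue.Queue[str]" = queue.Queue(maxsize=capacity)

    def submit(self, order_id: str) -> bool:
        """Enqueue without blocking; a full buffer rejects the request."""
        try:
            self._queue.put_nowait(order_id)
            return True
        except queue.Full:
            return False

    def drain(self, max_batch: int) -> List[str]:
        """Take up to `max_batch` orders in arrival order for processing."""
        batch: List[str] = []
        while len(batch) < max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

Bounding the queue caps the backlog, which limits the first drawback noted above at the cost of turning some requests away early.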

1.3 Filtering

Layered filtering removes invalid requests early, preserving I/O for genuine traffic.

1.4 Summary

Combining answer‑the‑question, queueing, and filtering balances commercial goals with architectural performance.

2. Plan B

When sustained high traffic overwhelms the system, a fallback Plan B is required. High‑availability is a lifecycle effort covering architecture, coding, testing, deployment, operation, and incident response.

Architecture stage: Design for scalability and fault tolerance, e.g., multi‑region deployment.

Coding stage: Implement robust code with proper timeouts and error handling.

Testing stage: Ensure CI coverage and static analysis quality.

Release stage: Use checklists, upstream/downstream notifications, and rollback mechanisms.

Operation stage: Real‑time monitoring, accurate alerting, and detailed diagnostics.

Incident stage: Rapid damage control, root‑cause analysis, and service restoration.

Operational measures include regular pressure testing, degradation, flow-control, and circuit-breaker protection, performance baselines, alert systems, and rapid recovery tools.
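Circuit-breaker protection with a degraded fallback can be sketched as follows. This is a minimal single-threaded illustration; the class name, the consecutive-failure threshold, and the single-trial recovery rule are assumptions, and production systems would normally rely on an established resilience library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after repeated failures and serves a fallback until it resets."""

    def __init__(self, failure_threshold: int, reset_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._reset_seconds = reset_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        return self._opened_at is not None

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_seconds:
                # Open: skip the dependency entirely and degrade.
                return fallback()
            # Reset period elapsed: allow one trial call; one failure reopens.
            self._opened_at = None
            self._failures = self._threshold - 1
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0
        return result
```

The fallback is where degradation lives: for example, returning a cached page or a "sold out, try again" response instead of calling an unstable dependency.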

3. Summary

High availability is essentially stability, and it is often deprioritized until a failure occurs. Organizational commitment, such as tying stability metrics to performance evaluations and forming dedicated reliability teams, helps embed reliability into the development lifecycle.

Personal Summary

A flash‑sale system can be built incrementally, from simple to complex architectures, based on traffic volume and business requirements. The key is to make trade‑offs consciously and keep the main design goal in focus.

Source: https://segmentfault.com/a/1190000020970562
