How to Build a High‑Performance Flash‑Sale System: Architecture, Caching, and Scaling Strategies

This article explains how to design a high‑concurrency flash‑sale (秒杀) system by optimizing concurrent reads and writes, applying API and architecture principles, separating dynamic and static data, handling hot items, shaping traffic, improving performance, managing inventory reduction, and implementing robust fallback mechanisms for high availability.

MaGe Linux Operations

Jul 1, 2022

1. Overview

1.1 Concurrent Read/Write

The main challenge of a flash-sale system is concurrent reads and writes. Optimizing reads means reducing the amount of data a user must fetch from the server; optimizing writes means isolating the hot write path in a dedicated database so that it can be handled specially. Protection mechanisms and fallback plans are also required.

1.2 API Design Principles

For ultra-high-traffic, high-performance, highly available systems, the user request path should follow five rules: as little data as possible, as few requests as possible, as short a path as possible, as few dependencies as possible, and no single points of failure.

1.3 Flash‑Sale Architecture Principles

High Availability : The architecture must remain stable under both expected load and traffic spikes.

Consistency : Data must be consistent; the total number of successful transactions must match the configured quantity.

Performance : The system must sustain massive traffic by optimizing every link in the request chain.

2. Architecture Principles

2.1 Minimize Data

Both request payloads and responses should be as small as possible to reduce network latency and CPU usage for compression and encoding. System‑level data dependencies should also be minimized to avoid excessive database interaction.

2.2 Minimize Request Count

Extra requests such as CSS, JavaScript, images, and Ajax calls should be reduced. For example, merge multiple JS files into a single request using a URL that the server resolves into combined content.
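
As a minimal sketch of such a combined request, assume a hypothetical /combo?files=a.js,b.js URL convention and an in-memory asset table (neither comes from the original article). The server side could resolve the URL roughly like this:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory asset store; a real server would read from disk or a cache.
ASSETS = {
    "a.js": "var a = 1;",
    "b.js": "var b = 2;",
}

def resolve_combo(url: str) -> str:
    """Concatenate the content of every file named in the `files` query parameter."""
    query = parse_qs(urlparse(url).query)
    names = query.get("files", [""])[0].split(",")
    # Unknown names are skipped instead of failing the whole request.
    return "\n".join(ASSETS[name] for name in names if name in ASSETS)
```

One request now returns what would otherwise take one round trip per file.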

2.3 Shorten Path

Each intermediate node (proxy, additional socket, etc.) adds latency and reduces overall availability. Shorter paths improve both reliability and performance.

2.4 Reduce Dependencies

Classify system components into levels (0‑level, 1‑level, …). Critical services (e.g., payment) should have minimal strong dependencies on lower‑level services to avoid cascading failures.

2.5 Eliminate Single Points

Stateless services and dynamic configuration (via a config center) remove machine‑specific bindings, while data that must be persisted is replicated to avoid single‑point failures.

3. Architecture Cases for Different Scenarios

A simple implementation adds a “timed‑release” button to the product page. As traffic grows from 10k/s to 100k/s, the architecture evolves:

Separate the flash‑sale system into its own service.

Deploy an independent machine cluster for flash‑sale traffic.

Cache hot data (e.g., inventory) in a dedicated cache.

Add a quiz to deter automated bots.

Further scaling includes full static‑dynamic separation, local caching of product details, and adding rate‑limiting protection.

4. Dynamic/Static Separation Solution

4.1 What Is Dynamic vs. Static Data

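Static data is page output that does not vary with the URL, the viewer, the time, the region, or private data such as cookies; dynamic data varies with at least one of them. Because static data is the same for every request, it can be cached aggressively.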

4.2 Caching Static Data

4.2.1 Nearest to the User

Cache in the browser, CDN, or server‑side cache.

4.2.2 Cache the HTTP Response Directly

A web proxy can look up the stored HTTP response (headers and body) by request URL and return it as-is, without re-assembling the response.

4.2.3 Language‑Specific Cache Choices

Java application servers are comparatively inefficient at handling very large numbers of connections, so static caching is usually done at the web-server or proxy layer (Nginx, Apache, Varnish) rather than inside the Java application.

4.3 Static Data Handling

URL uniqueness – give each item a unique URL and use that URL as the cache key.

Separate user‑related factors (login status, identity).

Separate time‑related factors.

Asynchronously fetch region‑specific data.

Strip cookies so that they do not prevent caching (e.g., in Varnish VCL: unset req.http.cookie).
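
The first few steps above amount to building a cache key that depends only on the item. A minimal sketch, assuming a hypothetical id query parameter identifies the item and every other parameter (user, timestamp, tracking) is irrelevant to the cached page:

```python
from urllib.parse import urlparse, parse_qsl, urlencode

def cache_key(url: str, keep_params=("id",)) -> str:
    """Build a cache key from the path plus only the parameters that identify the item."""
    parts = urlparse(url)
    # Drop user- and time-related parameters; sort the rest so ordering does not matter.
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in keep_params)
    return parts.path + ("?" + urlencode(kept) if kept else "")
```

Two requests for the same item then map to one cache entry regardless of who sent them or when.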

4.4 Dynamic Data Handling

4.4.1 ESI (Edge Side Includes)

Insert dynamic fragments into a cached static page at the edge proxy.

4.4.2 CSI (Client‑Side Include)

Fetch dynamic fragments via asynchronous JavaScript requests.

4.5 Full Dynamic/Static Separation Architecture

4.5.1 Single‑Machine Deployment

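Deploy Nginx, the cache, and the Java application together on one physical server, and split the cache machines into consistent-hash groups: hashing raises the cache hit rate, while grouping keeps a hot key from overloading a single machine.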
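
Consistent hashing can be sketched as a ring of virtual nodes. The node names, replica count, and choice of SHA-256 below are illustrative assumptions, not details from the original article:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map keys to nodes so that removing a node only remaps that node's keys."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring `replicas` times to even out the load.
        ring = [(self._hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)]
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._nodes = [n for _, n in ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]
```

A request for a given URL always lands on the same cache node, so each page is cached once per group rather than once per machine.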

4.5.2 Unified Cache Layer

Move the cache into its own cluster shared by all applications; this lowers operational cost and lets services share cached data and memory.

4.5.3 CDN Front‑End

Push the cache further to a CDN; use a small number of second‑level CDN caches to keep hit rates high while serving users close to the edge.

5. Hot Data Handling

5.1 What Is Hot Data

Hot data is data that receives massive read/write traffic. It can be static (predictable in advance) or dynamic (unpredictable, emerging at runtime).

5.2 Discovering Hot Data

5.2.1 Static Hot Data

Identify hot items via business rules (e.g., sellers register for promotions) or by calculating top‑N products from traffic logs.
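
The top-N calculation can be sketched as a simple count over access logs. The log format here (a timestamp followed by an item ID, separated by whitespace) is an assumption made for illustration:

```python
from collections import Counter

def top_n_items(log_lines, n):
    """Count accesses per item and return the n most frequently requested items."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 2:  # skip malformed lines
            counts[fields[1]] += 1
    return counts.most_common(n)
```

In practice this would run as an offline job over the previous day's logs, and the result would be pushed to the cache layer before the sale starts.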

5.2.2 Dynamic Hot Data

Build an asynchronous pipeline that collects hotspot keys from middleware (Nginx, cache, RPC) and publishes them to downstream services for protection.

5.3 Processing Hot Data

Optimization : Cache hot items; static hot data can be cached long‑term.

Limiting : Use consistent‑hash bucket queues to throttle hot‑item requests.

Isolation : Separate hot‑item processing at business, system, and data layers.
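
The limiting step can be sketched as bounded per-bucket queues keyed by a stable hash of the item ID. For brevity, plain modulo hashing stands in for a consistent-hash ring here, and the bucket count and capacity are arbitrary assumptions:

```python
import zlib
from collections import deque

class BucketThrottle:
    """Route each item to a fixed bucket and reject requests once that bucket is full."""

    def __init__(self, buckets=4, capacity=2):
        self._queues = [deque() for _ in range(buckets)]
        self._capacity = capacity

    def _queue_for(self, item_id: str) -> deque:
        # crc32 is stable across processes, unlike Python's built-in hash() for str.
        return self._queues[zlib.crc32(item_id.encode("utf-8")) % len(self._queues)]

    def offer(self, item_id: str, request) -> bool:
        q = self._queue_for(item_id)
        if len(q) >= self._capacity:
            return False  # bucket full: shed the request instead of queuing it
        q.append(request)
        return True

    def poll(self, item_id: str):
        q = self._queue_for(item_id)
        return q.popleft() if q else None
```

A hot item can then saturate only its own bucket, leaving requests for items in other buckets unaffected.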

6. Traffic Shaping (Peak Cutting)

6.1 Why Cut Peaks

To keep server resources from being overwhelmed during the flash‑sale burst.

6.2 Lossless Peak‑Cutting Methods

6.2.1 Queuing

Buffer spikes with a message queue, converting synchronous calls into asynchronous pushes.
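
A minimal in-process sketch of this idea uses a bounded queue drained by a fixed number of workers. A production system would use a real message broker; the worker count and queue size are assumptions:

```python
import queue
import threading

def process_with_queue(requests, handler, workers=2, max_pending=100):
    """Buffer requests in a bounded queue and drain them at the workers' own pace."""
    pending = queue.Queue(maxsize=max_pending)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            req = pending.get()
            if req is None:  # sentinel: no more work
                break
            out = handler(req)
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for req in requests:
        pending.put(req)  # blocks when the queue is full, applying back pressure
    for _ in threads:
        pending.put(None)
    for t in threads:
        t.join()
    return results
```

The downstream handler never sees more than `workers` requests at once, however fast they arrive. Results come back in completion order, not arrival order.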

6.2.2 Quiz

Introduce a short quiz to deter bots and to artificially delay requests, spreading the load over a longer time window.

6.2.3 Layered Filtering

Apply a funnel‑style filter across CDN, front‑end, back‑end, and database layers to drop invalid requests early.
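
The funnel can be sketched as an ordered list of checks, with the cheapest first. The layer names follow the text above, but the individual checks and request fields are hypothetical:

```python
def first_rejecting_layer(request, layers):
    """Return the name of the first layer that rejects the request, or None if all pass."""
    for name, check in layers:
        if not check(request):
            return name
    return None

# Hypothetical checks, ordered from the cheapest to the most expensive.
LAYERS = [
    ("front-end", lambda r: r.get("sale_open", False)),            # sale has started
    ("back-end", lambda r: r.get("qty", 0) <= 2),                  # per-user purchase limit
    ("database", lambda r: r.get("stock", 0) >= r.get("qty", 0)),  # enough inventory
]
```

Only requests that pass every earlier layer reach the database, which is the scarcest resource.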

7. Factors Influencing Performance

7.1 Definition of Performance

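Performance is measured by QPS and response time (RT). Shorter RT yields higher QPS: with RT in milliseconds, a single thread handles 1000 / RT requests per second, so in a multi-threaded environment total QPS = (1000 ms / RT) × thread count.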
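
As a worked example of this formula, with illustrative numbers that are not measurements from any real system:

```python
def max_qps(rt_ms: float, threads: int) -> float:
    """Theoretical QPS ceiling when every thread is always busy serving requests."""
    return 1000.0 / rt_ms * threads

# 20 threads at 100 ms per request give 200 QPS; halving RT to 50 ms doubles it.
baseline = max_qps(100, 20)
optimized = max_qps(50, 20)
```

This is an upper bound: it ignores contention, so adding threads stops helping once the CPU is saturated.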

7.2 Finding Bottlenecks

CPU is the primary bottleneck for flash‑sale systems. Use profilers (JProfiler, YourKit) or periodic jstack sampling to locate hot functions. If CPU usage stays below ~95 % at peak QPS, other resources may be limiting.

7.3 System Optimizations (Java‑Specific)

7.3.1 Reduce Encoding

Avoid unnecessary character‑to‑byte conversions; stream static data directly via resp.getOutputStream().

7.3.2 Reduce Serialization

Minimize RPC calls; merge tightly related services into a single deployment to avoid serialization overhead.

7.3.3 Java‑Specific Flash‑Sale Optimizations

Use plain Servlets instead of heavyweight MVC frameworks.

Write output directly with resp.getOutputStream() and prefer JSON over template rendering.

7.3.4 Concurrent Read Optimization

Cache product titles and descriptions locally on each flash‑sale machine; cache inventory with short‑lived passive expiration.
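
Passive expiration means an entry is refreshed only when it is read after its time-to-live has passed. A minimal sketch, in which the loader callback and the injectable clock are assumptions made to keep the example testable:

```python
import time

class TtlCache:
    """Local cache whose entries are reloaded lazily once they are older than the TTL."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: serve from local memory
        value = loader(key)  # expired or missing: fetch from the source of truth
        self._entries[key] = (value, now + self._ttl)
        return value
```

A slightly stale inventory count is acceptable on the read path, because the authoritative check happens when the order is written.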

7.3.5 Reducing Serialization in RPC

Deploy related services on the same JVM to bypass network serialization.

8. Inventory Reduction Logic

8.1 Reduction Methods

Order‑time reduction : Decrease inventory when an order is placed (precise but vulnerable to fake orders).

Payment‑time reduction : Decrease inventory after payment (prevents fake orders but may cause oversell).

Pre‑deduction : Reserve inventory for a limited time after ordering; release if payment does not occur.
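
Pre-deduction can be sketched as a reservation with an expiry time. This is an in-memory illustration only; the class name, the hold duration, and the injectable clock are assumptions, and a real system would persist the holds:

```python
import time

class ReservedInventory:
    """Hold stock when an order is placed; return it if payment does not arrive in time."""

    def __init__(self, stock: int, hold_seconds: float, clock=time.monotonic):
        self._available = stock
        self._hold = hold_seconds
        self._clock = clock
        self._holds = {}  # order_id -> (qty, expires_at)

    def _release_expired(self):
        now = self._clock()
        expired = [o for o, (_, exp) in self._holds.items() if exp <= now]
        for order_id in expired:
            qty, _ = self._holds.pop(order_id)
            self._available += qty  # unpaid stock goes back on sale

    def reserve(self, order_id: str, qty: int) -> bool:
        self._release_expired()
        if order_id in self._holds or qty > self._available:
            return False
        self._available -= qty
        self._holds[order_id] = (qty, self._clock() + self._hold)
        return True

    def confirm_payment(self, order_id: str) -> bool:
        self._release_expired()
        # Paying removes the hold permanently; the stock stays deducted.
        return self._holds.pop(order_id, None) is not None

    def available(self) -> int:
        self._release_expired()
        return self._available
```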

8.2 Problems and Mitigations

Combine strategies, add anti‑cheat measures (user tagging, purchase limits, rate limiting) to handle malicious orders.

8.3 Large‑Scale Flash‑Sale Inventory

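Use SQL like the following to ensure inventory never goes negative. The WHERE clause is added here for illustration and assumes the table's key column is named id; without a row filter the statement would rewrite every row in the table:

UPDATE item SET inventory = CASE WHEN inventory >= :qty THEN inventory - :qty ELSE inventory END WHERE id = :item_id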

For simple cases, inventory can be decremented directly in a cache (e.g., Redis). For complex cases, keep the operation in the database to leverage transactions and row‑level locking.
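
A common variant of the database approach moves the stock check into the WHERE clause, so the affected-row count tells the caller whether the deduction happened. The sketch below uses SQLite for self-containment and assumes a hypothetical item(id, inventory) table; it is not the article's exact statement:

```python
import sqlite3

def deduct(conn: sqlite3.Connection, item_id: int, qty: int) -> bool:
    """Atomically deduct qty from inventory; return False if stock is insufficient."""
    cur = conn.execute(
        "UPDATE item SET inventory = inventory - ? WHERE id = ? AND inventory >= ?",
        (qty, item_id, qty),
    )
    conn.commit()
    # Zero affected rows means the guard failed and nothing was changed.
    return cur.rowcount == 1
```

Because the check and the write are one statement, two concurrent buyers cannot both take the last unit.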

9. Designing Fallback Plans

9.1 High‑Availability Foundations

Architecture: Multi‑datacenter, eliminate single points.

Code: Robust error handling, timeouts, default fallbacks.

Testing: Comprehensive test cases covering worst‑case scenarios.

Release: Quick rollback mechanisms.

Operation: Accurate monitoring and alerting.

Failure: Immediate damage control (e.g., take the erroneous product offline).

9.2 Degradation

When traffic exceeds a threshold, disable non‑core features (e.g., reduce displayed transaction records) via feature‑switches.
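
A feature switch of this kind can be sketched as a per-feature QPS threshold. The feature names and threshold values are hypothetical, and a real deployment would load them from a config center so they can change at runtime:

```python
class FeatureSwitches:
    """Turn off non-core features once traffic reaches their configured threshold."""

    def __init__(self, thresholds):
        # feature name -> QPS at or above which the feature is disabled
        self._thresholds = dict(thresholds)

    def enabled(self, feature: str, current_qps: float) -> bool:
        limit = self._thresholds.get(feature)
        # Features without a threshold are core and never degraded.
        return limit is None or current_qps < limit
```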

9.3 Rate Limiting

Apply client‑side or server‑side limits based on QPS or thread count; use token‑bucket or leaky‑bucket algorithms to protect the system.
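
A token bucket can be sketched in a few lines. This version is single-threaded for clarity (a shared instance would need a lock), and the injectable clock is an assumption made for testability:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` while enforcing an average of `rate` requests/second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never exceeding the capacity.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Unlike a leaky bucket, which smooths output to a constant rate, the token bucket tolerates short bursts as long as tokens have accumulated.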

9.4 Reject‑Service

When CPU or load crosses critical values, return HTTP 503 immediately to prevent total collapse; the system can recover automatically when load drops.
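
The admission decision can be sketched as a pure function of the current load. The threshold of 2.0 per CPU core is an arbitrary assumption; on Unix the load value could come from os.getloadavg():

```python
def admission_status(load_avg: float, cpu_count: int, max_load_per_cpu: float = 2.0) -> int:
    """Return the HTTP status to answer with: 503 when overloaded, otherwise 200."""
    if load_avg > cpu_count * max_load_per_cpu:
        return 503  # shed the request before doing any expensive work
    return 200
```

Because the check reads only the current load, service resumes without intervention as soon as the load falls back under the threshold.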

