How to Build a Scalable Flash‑Sale System: Architecture, Challenges & Solutions
This article dissects the technical challenges of high‑traffic flash‑sale (seckill) systems, outlining typical e‑commerce flow, unique seckill characteristics, and detailed solutions for isolation, load handling, bandwidth, request throttling, order processing, database design, caching, concurrency control, and anti‑cheat mechanisms.
1. Flash‑sale (seckill) business characteristics
A flash‑sale is a short‑lived promotion where a single product is offered at a very low price. Typical traits are:
Scheduled start time and a very short selling window.
Massive concurrent traffic (tens of thousands of users may refresh the page every second).
Only a small number of orders (bounded by the available stock) can succeed; all other requests must be rejected quickly.
2. Core technical challenges and practical solutions
Impact on existing services – Co‑hosting the flash‑sale with the normal site can exhaust shared resources and cause a full outage. Solution: Deploy the flash‑sale as an isolated system, optionally on a separate domain, and route traffic through a dedicated load balancer.
High read traffic before the sale starts – Users continuously refresh the product page, creating a huge number of read requests that would hit the application servers and database. Solution: Serve a fully static HTML page (HTML, CSS, JS, images) from a CDN; the page contains no dynamic server‑side rendering.
Sudden bandwidth demand – Example: a 200 KB page fetched once per second by 10 000 concurrent users requires ~2 GB/s (≈16 Gbit/s) of outbound bandwidth. Solution: Lease additional ISP bandwidth for the event and cache the static page on a CDN to off‑load the origin.
Exposure of the order URL – If the order endpoint is known before the start time, users can bypass the timing control. Solution: Append a server‑generated random token to the order URL; the token is only revealed when the sale begins.
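One way to implement such a token is sketched below. This is an illustrative example, not the article's production code: the class name, the separator format, and the use of a plain SHA‑256 digest are all assumptions (a real deployment would prefer an HMAC with a rotated secret and a constant‑time comparison).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical sketch: derive an unguessable order-URL token from a
// server-side secret, the item id, and the sale's start timestamp.
// The token is only published once the sale begins, so clients cannot
// precompute the order endpoint.
class OrderToken {
    static String generate(String secret, String itemId, long saleStartEpoch) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest((secret + "|" + itemId + "|" + saleStartEpoch)
                    .getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    // The order service re-derives the token and rejects mismatches.
    static boolean verify(String token, String secret, String itemId, long saleStartEpoch) {
        return generate(secret, itemId, saleStartEpoch).equals(token);
    }
}
```

Because the secret never leaves the server, knowing the URL pattern alone is not enough to construct a valid order request before the start time.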
Enabling the “Buy” button at the exact start time – Because the page is cached, the client cannot rely on server‑side rendering. Solution: Include a tiny JavaScript file that holds a saleStarted flag. When the sale starts, replace the file (same name, different content) with the flag set to true and the dynamic order URL. Use a version query string (e.g., sale.js?v=12345) to force browsers and CDN nodes to fetch the new file.
Ensuring only the first successful requests proceed – At most as many orders as there is stock should be accepted; everyone else is turned away. Solution: Limit each order‑processing node to a small concurrency (e.g., 10 simultaneous requests). Use a Redis‑based token or a signed cookie to enforce the limit and employ a least‑connections load‑balancing algorithm.
Pre‑order validation – Before forwarding a request to the order subsystem, verify:
Total number of submitted orders does not exceed the product stock.
Per‑node request count is within the configured cap.
Stock is still available.
If any check fails, return a “sale ended” response immediately.
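The three checks above can be sketched as a small admission gate. This is an in‑process illustration with invented names; in a real multi‑node deployment the two counters would live in Redis so that every node sees the same numbers.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Admission gate for the pre-order validation stage: reserve-then-check
// with atomic counters so concurrent requests cannot both pass a limit.
class PreOrderGate {
    final AtomicInteger acceptedOrders = new AtomicInteger(0); // orders let through so far
    final AtomicInteger nodeInFlight   = new AtomicInteger(0); // requests active on this node
    final int stock;
    final int perNodeCap;

    PreOrderGate(int stock, int perNodeCap) {
        this.stock = stock;
        this.perNodeCap = perNodeCap;
    }

    /** Returns true if the request may proceed; false means "sale ended". */
    boolean admit() {
        if (nodeInFlight.incrementAndGet() > perNodeCap) { // per-node cap
            nodeInFlight.decrementAndGet();
            return false;
        }
        if (acceptedOrders.incrementAndGet() > stock) {    // stock cap
            acceptedOrders.decrementAndGet();
            nodeInFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Call when the request finishes (order placed or failed downstream). */
    void release() {
        nodeInFlight.decrementAndGet();
    }
}
```

Note the increment‑then‑check pattern: reserving the slot before testing the limit avoids the read‑then‑write race that a plain `if (count < cap) count++` would have.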
Scheduled launch handling – Hide the “Buy Now” button until the configured start timestamp and protect the URL with a server‑generated random parameter.
Inventory deduction strategy – Two common approaches: deduct inventory when the order is placed ("deduct on order") or when payment completes ("deduct on payment"). For flash‑sales the former provides a better user experience because the stock is reserved instantly.
Overselling prevention – Use optimistic locking when updating the inventory table:
UPDATE auction_auctions
SET quantity = :inQuantity
WHERE auction_id = :itemId AND quantity = :dbQuantity;
If the WHERE clause does not match, the update affects zero rows and the request is rejected.
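The compare‑and‑swap principle behind that SQL statement can be exercised in process with `AtomicInteger`; the sketch below is an analogue for illustration, not the article's production code. The failed `compareAndSet` plays the role of the WHERE clause matching zero rows.

```java
import java.util.concurrent.atomic.AtomicInteger;

// In-process analogue of the optimistic UPDATE: read the current
// quantity, then attempt a compare-and-set; if another thread changed
// the value in between, re-read and retry (or give up when sold out).
class Inventory {
    final AtomicInteger quantity;

    Inventory(int initial) {
        quantity = new AtomicInteger(initial);
    }

    /** Try to buy one unit; returns true only if stock was still available. */
    boolean tryDeductOne() {
        while (true) {
            int current = quantity.get();      // SELECT quantity
            if (current <= 0) {
                return false;                  // sold out
            }
            // UPDATE ... WHERE quantity = :dbQuantity
            if (quantity.compareAndSet(current, current - 1)) {
                return true;                   // our "row" matched
            }
            // CAS failed: someone else deducted first; retry with fresh value
        }
    }
}
```

However many callers race on `tryDeductOne`, the number of successes can never exceed the initial stock, which is exactly the overselling guarantee the SQL version provides.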
Anti‑cheat mechanisms – Deploy verification codes (e.g., TV‑broadcast codes, answer‑based challenges) or custom captchas to filter automated scripts.
3. Architecture principles
Intercept traffic upstream – Route the majority of requests away from the backend data layer (e.g., via Nginx, CDN, or a dedicated gateway) to avoid overwhelming the database.
Read‑heavy / write‑light workload – Cache aggressively because reads dominate (≈99.9 % reads, 0.1 % writes).
4. Detailed architecture design
4.1 Front‑end layer
The static product page displays a countdown timer. All static assets (HTML, CSS, JS, images) are stored separately and served from CDN edge nodes. Time synchronization is performed by a lightweight endpoint that returns the current server time in JSON; the client adjusts its clock accordingly. A small JavaScript throttle disables repeated clicks and limits the request rate per user.
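The clock adjustment can be sketched as follows; the class and method names are invented, and the JSON endpoint itself is assumed rather than shown. The idea is simply to store the server/local offset once and drive the countdown from it, so a client with a skewed clock still sees the button appear at the right moment.

```java
// Client-side clock adjustment: fetch server time once, remember the
// offset, and compute the countdown from (local now + offset).
class CountdownClock {
    final long offsetMillis; // serverTime - localTime at fetch

    CountdownClock(long serverNowMillis, long localNowMillis) {
        this.offsetMillis = serverNowMillis - localNowMillis;
    }

    /** Estimated server time for a given local timestamp. */
    long serverNow(long localNowMillis) {
        return localNowMillis + offsetMillis;
    }

    /** Milliseconds until the sale opens, never negative. */
    long millisUntil(long saleStartMillis, long localNowMillis) {
        return Math.max(0, saleStartMillis - serverNow(localNowMillis));
    }
}
```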
4.2 Site layer (edge caching)
Requests that contain the same user identifier (UID) or the same item identifier are cached for a few seconds, effectively filtering out the majority of traffic before it reaches the service layer.
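A minimal sketch of that site‑layer filter, with an injected clock so the behaviour is deterministic; the TTL value and key scheme are illustrative assumptions, and a production version would also evict stale entries and return the cached response body.

```java
import java.util.concurrent.ConcurrentHashMap;

// Short-TTL deduplication keyed by UID (or item id): repeats inside the
// TTL window are answered from cache instead of reaching the services.
class RequestDeduplicator {
    final ConcurrentHashMap<String, Long> lastSeen = new ConcurrentHashMap<>();
    final long ttlMillis;

    RequestDeduplicator(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** true = forward to the service layer; false = answer from cache. */
    boolean shouldForward(String uid, long nowMillis) {
        final boolean[] forward = {false};
        lastSeen.compute(uid, (k, prev) -> {
            if (prev == null || nowMillis - prev >= ttlMillis) {
                forward[0] = true;   // window expired: forward and restamp
                return nowMillis;
            }
            return prev;             // inside TTL: keep timestamp, serve cached
        });
        return forward[0];
    }
}
```

Using `compute` keeps the check‑and‑update atomic per key, so two simultaneous refreshes from the same UID cannot both slip through.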
4.3 Service layer
The service layer is split into four logical modules:
User request distribution – Nginx/Apache load balancers distribute incoming HTTP requests across multiple front‑end machines.
User request pre‑processing – Checks stock availability; if the product is sold out, returns a failure response immediately.
package seckill;

import org.apache.http.HttpRequest;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Pre‑process stage: reject unnecessary requests, queue the needed ones. */
public class PreProcessor {
    // volatile so every worker thread sees the "sold out" flag promptly
    private static volatile boolean stockAvailable = true;

    private static void reject() { /* send failure response */ }

    public static boolean checkStock() {
        if (stockAvailable) {
            // Remote RPC to verify remaining stock (RPC is a project-specific client)
            if (!RPC.checkStock()) {
                stockAvailable = false;
            }
        }
        return stockAvailable;
    }

    public static void preProcess(HttpRequest request) {
        if (checkStock()) {
            RequestQueue.queue.add(request);
        } else {
            reject();
        }
    }
}

/** Shared, thread-safe queue between the pre-process and process stages. */
class RequestQueue {
    static final ConcurrentLinkedQueue<HttpRequest> queue = new ConcurrentLinkedQueue<>();
}
User request processing – Dequeues a request, creates a BidInfo object and forwards it to the database module.
package seckill;

import org.apache.http.HttpRequest;

public class Processor {
    // offer() rather than add(): silently drop the bid if the bounded queue is full
    public static void kill(BidInfo info) { DB.bids.offer(info); }

    public static void process() {
        // Poll first and null-check the raw request before wrapping it in
        // BidInfo; a null poll() result means the queue is currently empty.
        HttpRequest request = RequestQueue.queue.poll();
        if (request != null) {
            kill(new BidInfo(request));
        }
    }
}

class BidInfo {
    BidInfo(HttpRequest request) { /* extract user/item info */ }
}
Database module – Holds a bounded ArrayBlockingQueue of potential successful bids and performs optimistic‑lock updates.
package seckill;

import java.util.concurrent.ArrayBlockingQueue;

public class DB {
    public static final int MAX_BIDS = 10;
    public static ArrayBlockingQueue<BidInfo> bids = new ArrayBlockingQueue<>(MAX_BIDS);

    // Stub: the real implementation queries the inventory table or service.
    public static boolean checkStock() { return true; }

    public static void persist() {
        BidInfo info = bids.poll();
        while (info != null) {
            // INSERT INTO bids ... (guarded by the optimistic-lock UPDATE shown earlier)
            info = bids.poll();
        }
    }
}
4.4 Database design
Key concepts used in production:
Single database instance – all data in one database; the baseline before scaling out.
Sharding – horizontal partitioning with routing strategies such as range, hash, or a dedicated router service.
Grouping / replication – master‑slave (primary‑replica) configuration for high availability.
Typical deployments combine sharding and replication to achieve both scalability and reliability.
5. Massive concurrency considerations
5.1 Interface design
The static HTML is served via CDN; the bottleneck is the backend API that must respond within a few milliseconds. In‑memory stores (Redis) are used for stock checks and token generation, while writes are performed asynchronously.
5.2 Performance metrics
Throughput is measured in QPS (queries per second). A theoretical peak can be estimated as:
QPS = (numServers * maxClientsPerServer) / avgResponseTime
// Example: 20 servers * 500 clients / 0.1 s = 100 000 QPS
In practice, CPU context switches, network latency, and lock contention reduce the achievable QPS.
5.3 Restart and overload protection
When a “snowball” effect is detected (traffic surge causing resource exhaustion), reject new traffic at the entry point (load balancer or CDN edge) before restarting services. Pre‑warm caches (e.g., warm Redis with stock keys) before bringing the system back online.
6. Cheating tactics and defenses
6.1 Single account, multiple rapid requests
Limit each account to a single active request using a Redis key with WATCH (optimistic lock). Subsequent requests are discarded.
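An in‑process analogue of that per‑account lock is shown below; in production the map would be a Redis key with a TTL (e.g., set‑if‑absent semantics) so the limit holds across nodes and stuck entries expire. Class and method names are invented for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;

// One active request per account: the first caller for a UID wins,
// concurrent duplicates are discarded until the winner releases the slot.
class AccountLimiter {
    final ConcurrentHashMap<String, Boolean> active = new ConcurrentHashMap<>();

    /** true = this request holds the account's slot; false = discard. */
    boolean tryAcquire(String uid) {
        return active.putIfAbsent(uid, Boolean.TRUE) == null;
    }

    /** Release the slot once the request completes. */
    void release(String uid) {
        active.remove(uid);
    }
}
```

`putIfAbsent` is atomic, so two rapid-fire requests from the same account cannot both observe the slot as free.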
6.2 Multiple accounts, high request rate per IP
Detect abnormal request rates per IP and either present a captcha or block the IP temporarily.
6.3 Rotating IPs and zombie accounts
When IP‑based detection fails, raise participation thresholds (e.g., require a minimum account level) or apply behavior‑based data‑mining to identify and filter out automated accounts.
7. Data safety under high concurrency
7.1 Overselling causes
Concurrent reads may all see the same remaining stock value, leading to multiple successful deductions.
7.2 Pessimistic locking
Serializes updates but dramatically increases response time and can cause request starvation.
7.3 FIFO queue approach
Queues incoming requests, but the queue can grow beyond memory limits under extreme load.
7.4 Optimistic locking (recommended)
Use a version column or Redis WATCH to allow concurrent attempts; only the transaction that matches the current version succeeds, providing a good balance of performance and safety.
8. Summary
Flash‑sale systems exemplify high‑concurrency e‑commerce scenarios. The essential techniques are:
Isolate traffic with a dedicated domain and upstream filtering.
Serve static pages via CDN and use a tiny JavaScript flag to control the sale start.
Cache aggressively; keep the backend read‑heavy and write‑light.
Employ Redis for fast stock checks and token generation.
Control concurrency with per‑node request limits, optimistic locking, and a bounded in‑memory queue.
Detect and mitigate cheating through per‑account limits, IP rate limiting, captchas, and behavior analysis.
Design the database with sharding and replication, and use a “master‑only write, replica‑only read” model or a shadow‑master setup for high availability.
These patterns are widely applicable to any scenario that requires millisecond‑level response times under tens of thousands of simultaneous users.