
Design and Architecture of High‑Concurrency Flash‑Sale (Seckill) Systems

This article presents a comprehensive analysis of flash‑sale (seckill) business characteristics, technical challenges such as traffic spikes, database overload, and order handling, and offers detailed architectural solutions spanning frontend static pages, CDN caching, request throttling, queue‑based processing, optimistic locking, database sharding, high‑availability designs, and anti‑cheat mechanisms to ensure correctness and scalability under extreme concurrency.

Top Architect

1. Flash‑Sale Business Analysis

Normal e‑commerce flow:

(1) Query product;
(2) Create order;
(3) Decrease inventory;
(4) Update order;
(5) Payment;
(6) Seller ships

Flash‑sale specific traits:

(1) Low price;
(2) Massive promotion;
(3) Instant sell‑out;
(4) Timed release;
(5) Short duration, extremely high concurrency

2. Technical Challenges

Assuming a single product attracts 10,000 users, the system must handle at least 10,000 concurrent requests.

Impact on existing services: Co‑location with normal traffic can cause a complete site outage; solution – deploy the flash‑sale system independently, optionally using a separate domain.

Application and database load under high concurrency: Continuous page refreshes generate massive requests to the app server and DB; solution – static‑ize the product page so requests bypass the application layer.

Network and bandwidth surge: A 200 KB page × 10,000 requests ≈ 2 GB of traffic; solution – lease additional bandwidth and cache the page in a CDN.

Direct order URL exposure: Prevent users from accessing the order URL before the sale starts; solution – make the URL dynamic with a server‑generated random parameter.
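A minimal sketch of the dynamic‑URL idea, assuming an in‑memory token store; `SeckillUrl`, `publish`, and `verify` are illustrative names, not part of the article:

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the server generates a random token per item once the sale starts,
// so the order URL cannot be guessed in advance. All names are illustrative.
public class SeckillUrl {
    private static final SecureRandom RANDOM = new SecureRandom();
    // itemId -> currently valid token
    private static final ConcurrentHashMap<Long, String> TOKENS = new ConcurrentHashMap<>();

    /** Called when the sale opens; returns the path the "Buy" button should hit. */
    public static String publish(long itemId) {
        byte[] buf = new byte[16];
        RANDOM.nextBytes(buf);
        String token = Base64.getUrlEncoder().withoutPadding().encodeToString(buf);
        TOKENS.put(itemId, token);
        return "/seckill/" + itemId + "/" + token + "/order";
    }

    /** The order endpoint rejects any request whose token does not match. */
    public static boolean verify(long itemId, String token) {
        return token != null && token.equals(TOKENS.get(itemId));
    }
}
```

In production the token would live in a shared cache (e.g. Redis) rather than a local map, so every app server behind the load balancer can validate it.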

Button activation control: The purchase button should be gray before the sale and become active at the start; solution – control the flag via a small JavaScript file that is refreshed only when the sale begins, using versioned URLs to avoid serving a stale copy from the CDN.

Only the first successful order should be sent to the order subsystem: Limit each server to a small number of concurrent order requests and use session‑affinity cookies or a least‑connections load‑balancing algorithm to reduce overload.

Pre‑order checks: Limit each server to 10 pending orders; if exceeded, return a “sale ended” page. Also check the global submitted order count before forwarding to the sub‑order system.
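The per‑server limit above can be sketched with an atomic counter; the limit of 10 comes from the text, while `PendingOrderLimiter` and its method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the per-server limit: at most 10 orders may be in flight on one
// app server; extra requests get the "sale ended" response. Names are illustrative.
public class PendingOrderLimiter {
    private static final int MAX_PENDING = 10;
    private final AtomicInteger pending = new AtomicInteger();

    /** Try to admit one order request; returns false when the server is full. */
    public boolean tryAcquire() {
        while (true) {
            int cur = pending.get();
            if (cur >= MAX_PENDING) return false;          // show "sale ended" page
            if (pending.compareAndSet(cur, cur + 1)) return true;
        }
    }

    /** Called when the order completes or fails, freeing a slot. */
    public void release() {
        pending.decrementAndGet();
    }
}
```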

Timed product release: Publish the product ahead of time but keep the “Buy Now” button disabled; protect the URL with a server‑generated random token.

Inventory reduction strategies: Choose between “decrease on reservation” or “decrease on payment”; the article prefers the former for better user experience.

Overselling risk: Concurrent inventory updates can cause sales beyond stock; solution – use optimistic locking.

Bot mitigation: Use special verification codes, TV‑broadcasted codes, or answer‑based challenges.

3. Architecture Principles

Intercept requests as early as possible (upstream) to avoid overwhelming the backend data layer.

Read‑heavy, write‑light workloads benefit greatly from caching.

4. Architecture Design

4.1 Frontend Layer

Display a static flash‑sale page with a countdown timer. Static assets (HTML, CSS, JS, images) should be stored separately and served via CDN to offload bandwidth.

Client‑side countdown may drift from server time; synchronize periodically with a lightweight time‑sync endpoint.

4.2 Site Layer

Limit requests per UID or per item within a short time window using Nginx/Apache rules.
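As a concrete illustration, a per‑IP throttle in Nginx might look like the following; the zone name, rate, and upstream are examples only (per‑UID limiting would additionally require extracting the user id, e.g. from a cookie, into the zone key):

```nginx
# At most 1 request per second per client IP on the flash-sale path,
# with a small burst allowance. Values are illustrative.
limit_req_zone $binary_remote_addr zone=seckill:10m rate=1r/s;

server {
    location /seckill/ {
        limit_req zone=seckill burst=5 nodelay;
        proxy_pass http://seckill_backend;
    }
}
```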

4.3 Service Layer

All business logic is hidden behind a service layer that shields the underlying DB and cache.

Request flow:

User request distribution (Nginx/Apache)

Pre‑processing (check remaining stock)

Processing (wrap request into a transaction and forward to DB)

Database interface (single RPC entry point)

Sample Java pre‑processor:

package seckill;

import org.apache.http.HttpRequest;

/**
 * Pre‑process stage – reject unnecessary requests, queue necessary ones.
 * The "reminds" flag tracks whether any stock remains; once the RPC check
 * reports the item is sold out, the flag flips to false and all further
 * requests are rejected locally without touching the backend.
 */
public class PreProcessor {
    // Whether stock remains; flips to false once and stays false.
    private static boolean reminds = true;

    private static void forbidden() {
        // Respond with a "sale ended" page. (Do something.)
    }

    public static boolean checkReminds() {
        if (reminds) {
            // RPC is the service-layer client that queries remaining stock.
            if (!RPC.checkReminds()) {
                reminds = false;
            }
        }
        return reminds;
    }

    public static void preProcess(HttpRequest request) {
        if (checkReminds()) {
            RequestQueue.queue.add(request);
        } else {
            forbidden();
        }
    }
}

Concurrent queue implementation:

package seckill;

import java.util.concurrent.ConcurrentLinkedQueue;
import org.apache.http.HttpRequest;

// Thread-safe, non-blocking queue that buffers admitted requests.
public class RequestQueue {
    public static ConcurrentLinkedQueue<HttpRequest> queue = new ConcurrentLinkedQueue<>();
}

Processor that moves requests from the queue to the DB:

package seckill;

import org.apache.http.HttpRequest;

public class Processor {
    public static void kill(BidInfo info) {
        DB.bids.add(info);
    }

    public static void process() {
        // poll() returns null when the queue is empty, so check the polled
        // request itself rather than a wrapper that is never null.
        HttpRequest request = RequestQueue.queue.poll();
        if (request != null) {
            kill(new BidInfo(request));
        }
    }
}

class BidInfo {
    BidInfo(HttpRequest request) {
        /* Extract bid fields from the request. */
    }
}

Database module using a bounded blocking queue:

package seckill;

import java.util.concurrent.ArrayBlockingQueue;

public class DB {
    // Remaining stock for the item; 10 units in this example.
    public static int count = 10;
    // Bounded queue: at most 10 pending bids ever reach the database.
    public static ArrayBlockingQueue<BidInfo> bids = new ArrayBlockingQueue<>(10);

    public static boolean checkReminds() {
        return count > 0;
    }

    public static void bid() {
        BidInfo info = bids.poll();
        while (info != null && count > 0) {
            count--;
            // insert into Bids table
            info = bids.poll();
        }
    }
}

4.4 Database Design

Concepts:

Single‑node (single‑database) deployment.

Sharding (horizontal partitioning) to handle large data volumes.

Grouping (master‑slave replication) for high availability.

Routing strategies: range, hash, or a dedicated router service.
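Hash routing from the list above can be sketched in a few lines; `ShardRouter` and the shard count of 4 are illustrative (range routing would instead compare the key against per‑shard boundaries):

```java
// Minimal sketch of hash routing: a stable function from a user id to one of
// N shards, so the same key always lands on the same database.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    /** Hash routing: deterministic, evenly spread across shards. */
    public int shardFor(long userId) {
        // Mask off the sign bit so the modulo result is never negative.
        return (Long.hashCode(userId) & 0x7fffffff) % shardCount;
    }
}
```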

High‑availability solutions:

Read‑only replicas for read‑heavy workloads.

Dual‑master with one active master and one shadow‑master for failover.

Cache layer (Redis/Memcached) to offload reads.

Consistency handling:

Middleware that routes reads to the master after a write.

Force‑read‑master to avoid stale reads.

Cache double‑eviction strategy to reduce stale data.
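The double‑eviction strategy can be sketched as follows, assuming hypothetical `Cache` and `Database` interfaces and an illustrative 500 ms delay:

```java
// Sketch of double eviction (delayed double delete): evict the cache, update
// the DB, then evict again after a short delay to remove any stale value that
// a concurrent reader re-populated in between. All names are illustrative.
public class DoubleEviction {
    public static void updateStock(Cache cache, Database db, long itemId, int newStock) {
        cache.evict(itemId);               // first delete, before the write
        db.writeStock(itemId, newStock);
        try {
            Thread.sleep(500);             // wait out in-flight stale reads
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        cache.evict(itemId);               // second delete catches re-populated stale data
    }

    interface Cache {
        void evict(long itemId);
    }

    interface Database {
        void writeStock(long itemId, int stock);
    }
}
```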

5. Challenges of Massive Concurrency

5.1 Interface Design

Separate static HTML (served by CDN) from the high‑traffic backend API. Use in‑memory stores (Redis) for ultra‑fast operations and asynchronous writes to persistent storage.

5.2 Performance Bottlenecks

QPS calculations show that increased response time dramatically reduces effective throughput; proper sizing of web server workers, connection limits, and hardware resources is essential.
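As a rough worked example of that relationship (numbers are illustrative): a pool of 100 workers each holding a request for 100 ms sustains about 1,000 QPS, while 200 ms per request halves that to about 500 QPS.

```java
// Back-of-envelope throughput model: effective QPS is roughly the number of
// workers divided by the per-request response time. Illustrative only; real
// capacity also depends on connection limits and hardware.
public class QpsEstimate {
    /** Approximate QPS for a pool of workers each busy responseTimeMs per request. */
    public static double qps(int workers, double responseTimeMs) {
        return workers * 1000.0 / responseTimeMs;
    }
}
```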

5.3 Overload Protection

When the system reaches overload, reject new requests at the entry layer, pre‑heat dependent services (e.g., Redis) before restart, and use circuit‑breaker patterns.

6. Anti‑Cheat Measures

6.1 Single‑Account Flooding

Limit each account to one concurrent request using Redis with WATCH/optimistic lock.

6.2 Zombie‑Account Mass Requests

Detect high request rates per IP and either present a captcha or block the IP.

6.3 Distributed IP Botnets

When IP‑based detection fails, raise participation thresholds (e.g., account level) and employ behavioral data mining to filter out suspicious accounts.

7. Data Safety Under High Load

7.1 Over‑selling Causes

Concurrent reads of remaining stock can all see the same value and all succeed, leading to overselling.

7.2 Pessimistic Locking

Locks serialize updates but increase latency and can cause request starvation.

7.3 FIFO Queue

Queueing requests ensures order but may exhaust memory under extreme load.

7.4 Optimistic Locking

Use version numbers or Redis WATCH to allow concurrent attempts while only committing the first successful update.
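A minimal in‑process sketch of the optimistic approach, using a compare‑and‑set in place of a version column (class and method names are illustrative); in SQL the same shape is `UPDATE stock SET count = count - 1, version = version + 1 WHERE id = ? AND version = ?`, and Redis provides it via WATCH/MULTI/EXEC:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Optimistic stock decrement: read the current value, attempt a compare-and-set,
// and retry on conflict instead of blocking. The stock can never go below zero,
// so overselling is impossible.
public class OptimisticStock {
    private final AtomicInteger stock;

    public OptimisticStock(int initial) {
        this.stock = new AtomicInteger(initial);
    }

    /** Returns true if one unit was sold; false once the item is sold out. */
    public boolean trySell() {
        while (true) {
            int cur = stock.get();
            if (cur <= 0) return false;                    // sold out
            if (stock.compareAndSet(cur, cur - 1)) return true;
            // CAS failed: another request updated stock first; retry with fresh value.
        }
    }

    public int remaining() {
        return stock.get();
    }
}
```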

8. Summary

Flash‑sale and flash‑purchase scenarios are classic high‑concurrency problems. Despite varied implementations, the core challenges are common: traffic spikes, database contention, consistency, and cheat prevention. The solutions presented here (early request interception, caching, queueing, optimistic locking, dual‑master HA, and anti‑bot tactics) provide a reusable toolbox for building robust, scalable systems.

Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.