Designing a High‑Concurrency Flash Sale (Seckill) System: Architecture, Principles, and Optimization

This article explains how to design a flash‑sale (seckill) system that handles massive concurrent reads and writes by applying principles such as minimizing data and request volume, shortening request paths, eliminating single points of failure, and employing layered caching, traffic shaping, and robust high‑availability strategies.

Selected Java Interview Questions

Overview

Concurrent Read/Write

The core problem a flash‑sale system must solve is concurrent reads and writes. For reads, reduce how often users have to hit the server and how much data each read transfers; for writes, isolate the write operations at the database level. The system also needs fallback mechanisms for unexpected situations.

API Design Principles

To build a high‑traffic, high‑availability system, the request path from browser to server should follow these rules: keep the data in each request minimal, minimize the number of requests, keep the path short, reduce dependencies, and avoid single points of failure.

Flash‑Sale Architecture Principles

High Availability

The system must remain stable under both expected and peak traffic.

Consistency

The number of units sold must match the configured inventory exactly, with no overselling.

Performance

Every component in the request chain should be optimized to be as fast as possible.

The article expands on these three principles.

Architecture Principles

A flash‑sale system is essentially a distributed system that must support high concurrency, high performance, and high availability.

Minimize Data

Both request payloads and responses should be as small as possible to reduce CPU overhead from serialization, compression, and network transfer.

Dependencies on other services and databases should also be minimized because each call adds serialization cost and latency.

Minimize Request Count

Combine static resources (CSS/JS) into a single request by using a URL that the server merges dynamically, for example:

https://g.xxx.com/tm/xx-b/4.0.94/mods/??module-preview/index.xtpl.js,module-jhs/index.xtpl.js,module-focus/index.xtpl.js
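One way a server might expand such a combined URL is sketched below. It assumes that `??` marks the start of a comma‑separated file list, as in the example URL above; the class and method names are hypothetical and are not taken from any particular server module.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands a combined URL into the individual file URLs.
class ComboUrl {
    // Splits "base/??a.js,b.js" into ["base/a.js", "base/b.js"].
    static List<String> expand(String url) {
        List<String> files = new ArrayList<>();
        int marker = url.indexOf("??");
        if (marker < 0) {           // not a combined URL: a single resource
            files.add(url);
            return files;
        }
        String base = url.substring(0, marker);
        for (String name : url.substring(marker + 2).split(",")) {
            if (!name.isEmpty()) {  // skip empty entries, e.g. from a doubled comma
                files.add(base + name);
            }
        }
        return files;
    }
}
```

The server would then read each file and concatenate the contents into one response, so the browser pays for one round trip instead of several.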

Shorten Path

Reduce the number of intermediate nodes (proxies, sockets) in the request path; each additional node lowers overall availability and adds latency.

Reduce Dependencies

Classify dependencies as strong or weak; weak dependencies (e.g., coupons) can be dropped in extreme cases to protect core services.

Avoid Single Points

Make services stateless or externalize state to configuration centers; for stateful services like storage, use redundant replicas.

Different Scenarios and Architecture Cases

Start with a simple "timed‑release" page; as traffic grows from 10k/s to 100k/s, evolve the architecture by isolating the flash‑sale service, deploying dedicated clusters, caching hot data, and adding anti‑bot measures.

Further upgrades include full page static‑dynamic separation, local caching of product data, and rate‑limiting protection.

Static‑Dynamic Separation

What Is Static Data?

Static data is content that is the same for every viewer: it does not vary with user identity, cookies, time, or location. It can be cached aggressively.

How to Cache Static Data

Closest to User

Cache as close to the user as possible: in the browser, on a CDN, or in a server‑side cache.

Cache HTTP Responses Directly

A web proxy can look up a complete cached HTTP response by URL and return it as is, without reassembling the response or re‑parsing headers.

Cache Language Considerations

Java application servers are comparatively inefficient at handling very large numbers of connections; deliver static files from a web server or cache such as Nginx or Varnish instead.

Static Data Handling Example

Use the URL as the cache key; separate out user‑specific, time‑dependent, and regional data; and strip cookies from cached responses.
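A minimal sketch of building such a cache key follows. It assumes that only an `id` query parameter determines the page content and that every other parameter is user‑ or time‑specific; the parameter name and the class are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class CacheKey {
    // Assumption: only these query parameters change the cached content.
    static final Set<String> CONTENT_PARAMS = new HashSet<>(Arrays.asList("id"));

    // Builds a key from the path plus the content parameters, ignoring everything else.
    static String of(String path, String query) {
        TreeMap<String, String> kept = new TreeMap<>(); // sorted, so parameter order does not matter
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                if (CONTENT_PARAMS.contains(name)) {
                    kept.put(name, value);
                }
            }
        }
        StringBuilder key = new StringBuilder(path);
        char sep = '?';
        for (Map.Entry<String, String> e : kept.entrySet()) {
            key.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return key.toString();
    }
}
```

With this normalization, two requests for the same product from different users map to the same cache entry, which is what keeps the hit rate high.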

Dynamic Data Handling

ESI (Edge Side Includes)

Insert dynamic fragments at the edge proxy.

CSI (Client Side Include)

Fetch dynamic fragments via asynchronous JavaScript.

Static‑Dynamic Separation Architecture

Dedicated Physical Machines

Deploy cache on physical servers for larger memory and higher hit rates, using consistent hashing to balance load.
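Consistent hashing can be sketched with a sorted map, as below. This is an illustration only: the node names, the number of virtual nodes, and the use of MD5 as the hash are assumptions, not details from the original design.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    // Places several virtual points per node on the ring to even out the load.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    // Returns the first node at or after the key's position, wrapping around; null if the ring is empty.
    String nodeFor(String key) {
        if (ring.isEmpty()) {
            return null;
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Uses the first 8 bytes of the MD5 digest as the ring position.
    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex); // MD5 is required on every Java platform
        }
    }
}
```

When a cache machine is removed, only the keys that lived on that machine move to another node; the rest keep their placement, so most cached entries stay valid.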

Unified Cache Layer

Run a separate cache cluster shared by multiple services to reduce operational cost and improve maintainability.

CDN Deployment

Place cache close to users; address cache invalidation, hit‑rate, and release‑process challenges.

Hot‑Data Handling

What Is Hot Data?

Data that receives a disproportionate amount of traffic, such as popular products.

Discovering Hot Data

Static hot data can be identified via business rules or predictive analytics; dynamic hot data is collected in real‑time via agents that report hotspot keys from middleware.
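The real‑time collection step could be approximated by a counter like the one below, which an agent would drain periodically and report upstream. The class name, the threshold, and the reporting interval are all hypothetical, and counts are approximate when `record` races with a drain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class HotKeyCounter {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Called on every access to a key, e.g. from a cache or RPC middleware hook.
    void record(String key) {
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Returns keys seen at least `threshold` times since the last drain, then starts a new window.
    List<String> drainHotKeys(long threshold) {
        List<String> hot = new ArrayList<>();
        for (Map.Entry<String, LongAdder> e : counts.entrySet()) {
            if (e.getValue().sum() >= threshold) {
                hot.add(e.getKey());
            }
        }
        counts.clear();
        return hot;
    }
}
```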

Processing Hot Data

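Cache hot items locally; limit their impact by hashing item IDs into buckets, each with its own bounded queue, so that one hot item cannot starve other requests; and apply protection mechanisms (rate limiting, isolation, dedicated databases).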
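The hashing‑based limit mentioned above could look like the following sketch: each item ID hashes to a bucket, and each bucket has a bounded queue, so a flood of requests for one item fills only its own bucket. The bucket count, capacity, and class name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BucketedLimiter {
    private final List<BlockingQueue<String>> buckets = new ArrayList<>();

    BucketedLimiter(int bucketCount, int capacityPerBucket) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayBlockingQueue<>(capacityPerBucket));
        }
    }

    // Maps an item ID to a bucket index in [0, bucketCount).
    int bucketOf(long itemId) {
        return Math.floorMod(Long.hashCode(itemId), buckets.size());
    }

    // Queues a request for the item; returns false when that item's bucket is full.
    boolean tryEnqueue(long itemId, String requestId) {
        return buckets.get(bucketOf(itemId)).offer(requestId);
    }

    // Takes the next queued request from a bucket, or null if the bucket is empty.
    String poll(int bucket) {
        return buckets.get(bucket).poll();
    }
}
```

A worker per bucket would call `poll` and process requests in order; requests rejected by `tryEnqueue` can be answered immediately with a "sold out or busy" response.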

Traffic Shaping (Peak‑Cutting)

Flash‑sale traffic spikes require queuing, anti‑bot quizzes, and layered filtering to smooth request bursts.

Queueing

Use message queues or thread pools to buffer bursts.
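As a rough illustration of buffering with a thread pool, the sketch below uses a `ThreadPoolExecutor` with a bounded queue and rejects work once the queue is full. The worker count, queue capacity, and the `OrderBuffer` name are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class OrderBuffer {
    private final ThreadPoolExecutor pool;

    OrderBuffer(int workers, int queueCapacity) {
        pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),   // bounded buffer for bursts
                new ThreadPoolExecutor.AbortPolicy());     // reject when the buffer is full
    }

    // Returns true if the task was accepted, false if the buffer is full or shut down.
    boolean submit(Runnable task) {
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    // Stops accepting work and waits for queued tasks; returns false on timeout or interruption.
    boolean drain(long timeoutMillis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The bounded queue is the important part: an unbounded queue would absorb the burst but move the overload into memory and latency instead of shedding it.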

Quiz Mechanism

Introduce a short quiz to deter automated bots and spread request timing.

Layered Filtering

Apply filters at CDN, front‑end, back‑end, and database layers to drop invalid requests early.
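Within a single service, layered filtering can be modeled as an ordered chain of checks that stops at the first failure. The sketch below is generic; the layer names used in the usage note are hypothetical examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class FilterChain<T> {
    private final List<String> names = new ArrayList<>();
    private final List<Predicate<T>> checks = new ArrayList<>();

    // Layers run in the order they are added, so add the cheapest checks first.
    FilterChain<T> layer(String name, Predicate<T> check) {
        names.add(name);
        checks.add(check);
        return this;
    }

    // Returns the name of the first layer that rejects the request, or null if all layers pass.
    String firstRejection(T request) {
        for (int i = 0; i < checks.size(); i++) {
            if (!checks.get(i).test(request)) {
                return names.get(i);
            }
        }
        return null;
    }
}
```

For example, a chain might check "is the user logged in" before "is the sale open" before "is there stock", so that the expensive inventory lookup runs only for requests that survived the cheaper checks.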

Performance Factors

Definition of Performance

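Performance is measured by QPS and response time. For a fixed number of threads, QPS rises as response time falls, and in a flash‑sale system CPU execution time is the dominant part of response time.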

Finding Bottlenecks

Monitor CPU, memory, disk, and network; use profiling tools (JProfiler, YourKit) or sampling via jstack.

System Optimizations (Java‑Specific)

Reduce Encoding

Write raw bytes via resp.getOutputStream() instead of character writers.
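The idea can be sketched without the Servlet API: encode the static text to bytes once, then copy those bytes to the output stream on every request. In a servlet, the stream would be the one returned by `resp.getOutputStream()`; the `StaticFragment` class itself is a hypothetical helper.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StaticFragment {
    private final byte[] encoded; // encoded once, reused for every response

    StaticFragment(String text) {
        this.encoded = text.getBytes(StandardCharsets.UTF_8);
    }

    // Copies the pre-encoded bytes to the stream; no per-request character encoding.
    void writeTo(OutputStream out) {
        try {
            out.write(encoded);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of bytes written per call, e.g. for a Content-Length header.
    int length() {
        return encoded.length;
    }
}
```

Writing through a character writer would repeat the character‑to‑byte conversion on every response, which is exactly the CPU cost this optimization removes.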

Reduce Serialization

Minimize RPC calls; co‑locate tightly coupled services.

Read‑Side Optimizations

Cache product details locally; treat read‑only fields as static and cache them for the duration of the sale.
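A minimal local cache for read‑only product fields might look like this. The loader stands in for a call to the product service or database and is an assumption, as is the choice to cache for the whole sale with explicit invalidation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class LocalProductCache {
    private final ConcurrentHashMap<Long, String> cache = new ConcurrentHashMap<>();
    private final Function<Long, String> loader; // hypothetical call to the backing store

    LocalProductCache(Function<Long, String> loader) {
        this.loader = loader;
    }

    // Loads a product on first access; later reads are served from local memory.
    String get(long itemId) {
        return cache.computeIfAbsent(itemId, loader);
    }

    // Drops the cached copy, e.g. when the sale ends or the product is edited.
    void invalidate(long itemId) {
        cache.remove(itemId);
    }
}
```

Because `computeIfAbsent` is atomic per key, concurrent first reads of the same product trigger one load rather than one per thread.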

Write‑Side Optimizations

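To keep inventory from going negative, check the result inside a database transaction and roll back if it is below zero, declare the inventory column as an unsigned integer so that an over‑deduction is rejected by the database, or use a conditional update such as: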

UPDATE item SET inventory = CASE WHEN inventory >= xxx THEN inventory-xxx ELSE inventory END
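The same check‑then‑decrement rule can be illustrated in application memory with a compare‑and‑set loop. This is an in‑memory analogy of the conditional update, not database code, and the `Inventory` class is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

class Inventory {
    private final AtomicInteger stock;

    Inventory(int initial) {
        stock = new AtomicInteger(initial);
    }

    // Deducts qty only if enough stock remains, mirroring "WHEN inventory >= qty THEN inventory - qty".
    boolean tryDeduct(int qty) {
        if (qty <= 0) {
            throw new IllegalArgumentException("qty must be positive");
        }
        while (true) {
            int current = stock.get();
            if (current < qty) {
                return false;                       // not enough stock: leave it unchanged
            }
            if (stock.compareAndSet(current, current - qty)) {
                return true;                        // nobody changed the value in between
            }
            // Another thread won the race; re-read and retry.
        }
    }

    int remaining() {
        return stock.get();
    }
}
```

However many threads call `tryDeduct` concurrently, the number of successful deductions never exceeds the initial stock, which is the consistency property the SQL form is meant to provide.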

Inventory Reduction Strategies

Order‑Time Reduction

Decrease inventory when an order is placed; guarantees consistency but may lock inventory for non‑paying users.

Payment‑Time Reduction

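Decrease inventory when payment completes; under high concurrency, more orders may be accepted than there is stock to sell, so some buyers will be unable to pay for an order they already placed.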

Pre‑Reservation

Reserve inventory for a limited time after order, then release if payment does not occur.
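Pre‑reservation can be sketched as a set of holds with deadlines. In the example below, time is passed in explicitly so the behavior is easy to follow; the class name, the hold duration, and the in‑memory storage are assumptions, and a real system would keep this state in a database or cache.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class ReservationBook {
    private static final class Hold {
        final int qty;
        final long expiresAt;
        Hold(int qty, long expiresAt) { this.qty = qty; this.expiresAt = expiresAt; }
    }

    private int available;
    private final long holdMillis;
    private final Map<String, Hold> holds = new HashMap<>();

    ReservationBook(int stock, long holdMillis) {
        this.available = stock;
        this.holdMillis = holdMillis;
    }

    // Holds stock for an order; fails if stock is short or the order already has a hold.
    synchronized boolean reserve(String orderId, int qty, long now) {
        releaseExpired(now);
        if (qty <= 0 || qty > available || holds.containsKey(orderId)) {
            return false;
        }
        available -= qty;
        holds.put(orderId, new Hold(qty, now + holdMillis));
        return true;
    }

    // Confirms payment so the held stock is sold for good; fails if the hold has expired.
    synchronized boolean pay(String orderId, long now) {
        releaseExpired(now);
        return holds.remove(orderId) != null;
    }

    // Returns stock from holds whose deadline has passed.
    synchronized void releaseExpired(long now) {
        Iterator<Hold> it = holds.values().iterator();
        while (it.hasNext()) {
            Hold h = it.next();
            if (h.expiresAt <= now) {
                available += h.qty;
                it.remove();
            }
        }
    }

    synchronized int available() {
        return available;
    }
}
```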

Handling Hot Items

Isolate hot products in separate databases or apply application‑level queuing to avoid row‑level lock contention.

Fallback Design

High‑Availability Construction

Consider architecture, coding, testing, release, operation, and failure stages to eliminate single points and ensure graceful degradation.

Degradation

Disable non‑core features when traffic exceeds thresholds.
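A degradation switch can be as simple as the sketch below, which returns a fallback for non‑core features while traffic is above a threshold. How the current QPS is measured and reported is left out and would come from monitoring; the names and threshold are hypothetical.

```java
import java.util.function.Supplier;

class Degrader {
    private final long qpsThreshold;
    private volatile long currentQps; // assumed to be updated by a monitoring component

    Degrader(long qpsThreshold) {
        this.qpsThreshold = qpsThreshold;
    }

    void reportQps(long qps) {
        currentQps = qps;
    }

    boolean degraded() {
        return currentQps > qpsThreshold;
    }

    // Runs a non-core feature normally, or returns the fallback while degraded.
    <T> T nonCore(Supplier<T> feature, T fallback) {
        return degraded() ? fallback : feature.get();
    }
}
```

Core paths such as placing the order do not go through `nonCore`, so they keep working while features like recommendations or coupons are switched off.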

Rate Limiting

Apply client‑side or server‑side limits based on QPS or thread count.
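A QPS‑based limit can be illustrated with a fixed‑window counter, shown below with the clock passed in as a parameter. This is one simple technique among several, and the class name and limit are assumptions.

```java
class FixedWindowLimiter {
    private final int maxPerSecond;
    private long currentWindow = Long.MIN_VALUE; // index of the one-second window being counted
    private int count;

    FixedWindowLimiter(int maxPerSecond) {
        this.maxPerSecond = maxPerSecond;
    }

    // Returns true if the request fits in the current one-second window.
    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / 1000;
        if (window != currentWindow) {  // a new second has started: reset the counter
            currentWindow = window;
            count = 0;
        }
        if (count >= maxPerSecond) {
            return false;
        }
        count++;
        return true;
    }
}
```

A fixed window permits short bursts around window boundaries; a token bucket or sliding window smooths those out at the cost of slightly more state. A thread‑count limit works the same way but counts in‑flight requests instead of requests per second.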

Reject‑All Service

When system load reaches critical levels, return HTTP 503 to protect the backend.
