Designing a High‑Concurrency Flash Sale (Seckill) System: Architecture, Principles, and Optimization
This article explains how to design a flash‑sale (seckill) system that handles massive concurrent reads and writes by applying principles such as minimizing data and request volume, shortening request paths, eliminating single points of failure, and employing layered caching, traffic shaping, and robust high‑availability strategies.
Overview
Concurrent Read/Write
Flash‑sale systems must solve the core problem of concurrent reads and writes, reducing the amount of data transferred and isolating write operations at the database level while providing fallback mechanisms for unexpected situations.
API Design Principles
To build a high‑traffic, high‑availability system, the request path from browser to server should follow these rules: keep user data minimal, minimize request count, keep the path short, reduce dependencies, and avoid single points of failure.
Flash‑Sale Architecture Principles
High Availability
The system must remain stable under both expected and peak traffic.
Consistency
The number of units sold must match the configured inventory exactly, with no overselling.
Performance
Every component in the request chain should be optimized to be as fast as possible.
The article expands on these three principles.
Architecture Principles
A flash‑sale system is essentially a distributed system that must support high concurrency, high performance, and high availability.
Minimize Data
Both request payloads and responses should be as small as possible to reduce CPU overhead from serialization, compression, and network transfer.
Dependencies on other services and databases should also be minimized because each call adds serialization cost and latency.
Minimize Request Count
Combine static resources (CSS/JS) into single files and serve them via a URL that the server can dynamically merge, e.g.,
https://g.xxx.com/tm/xx-b/4.0.94/mods/??module-preview/index.xtpl.js,module-jhs/index.xtpl.js,module-focus/index.xtpl.js.
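As a rough illustration of the merged-resource URL shown above, the sketch below builds such a URL from a shared path prefix and a list of files. The class name ComboUrl and its method are hypothetical; only the "??" and comma convention is taken from the example URL.

```java
import java.util.List;

/** Builds a merged-resource URL of the form prefix??file1,file2,... */
final class ComboUrl {
    private ComboUrl() {}

    static String build(String prefix, List<String> files) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("at least one file is required");
        }
        // "??" separates the shared path prefix from the comma-separated file list.
        return prefix + "??" + String.join(",", files);
    }
}
```

One request built this way replaces one request per file, which is the point of the rule.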
Shorten Path
Reduce the number of intermediate nodes (proxies, sockets) in the request path; each additional node lowers overall availability and adds latency.
Reduce Dependencies
Classify dependencies as strong or weak; weak dependencies (e.g., coupons) can be dropped in extreme cases to protect core services.
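One way to treat a weak dependency is to give it a short time budget and fall back to a default when it is slow or failing. The sketch below assumes Java 9 or later (for completeOnTimeout); the class name WeakDependency and the idea of passing the call as a Supplier are illustrative choices, not part of the original design.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Wraps a call to a weak dependency so that it can never block or fail the core flow. */
final class WeakDependency {
    private WeakDependency() {}

    /**
     * Runs the call asynchronously. Returns the fallback if the call throws
     * or does not finish within timeoutMillis. Note that a timed-out call
     * keeps running in the background; it is not cancelled here.
     */
    static <T> T callOrFallback(Supplier<T> call, T fallback, long timeoutMillis) {
        return CompletableFuture.supplyAsync(call)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback)
                .join();
    }
}
```

A coupon lookup, for example, could be wrapped with an empty list as the fallback, so the order page still renders when the coupon service is unavailable.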
Avoid Single Points
Make services stateless or externalize state to configuration centers; for stateful services like storage, use redundant replicas.
Different Scenarios and Architecture Cases
Start with a simple "timed‑release" page; as traffic grows from 10k/s to 100k/s, evolve the architecture by isolating the flash‑sale service, deploying dedicated clusters, caching hot data, and adding anti‑bot measures.
Further upgrades include full page static‑dynamic separation, local caching of product data, and rate‑limiting protection.
Static‑Dynamic Separation
What Is Static Data?
Static data does not depend on user‑specific factors (URL, cookies, location, etc.). It can be cached aggressively.
How to Cache Static Data
Closest to User
Cache in the browser, CDN, or server‑side cache.
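Cache Entire HTTP Responses
A web proxy can look up a complete cached HTTP response (headers and body) by request URL and return it as is, without reassembling the response or fully parsing the request headers.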
Cache Language Considerations
Java is less efficient at handling massive connections; use Nginx/Varnish for static file delivery.
Static Data Handling Example
Use URL as cache key, separate user‑specific factors, time factors, and regional data, and strip cookies from cached responses.
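A minimal sketch of deriving a cache key from a URL is shown below. The parameter names treated as user-specific (uid, token, ts) are assumptions made for the example; a real system would use its own list.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

/** Derives a cache key from a URL after dropping user-specific query parameters. */
final class CacheKey {
    // Hypothetical parameters that vary per user or per request.
    private static final Set<String> USER_SPECIFIC = Set.of("uid", "token", "ts");

    private CacheKey() {}

    static String fromUrl(String url) {
        URI uri = URI.create(url);
        String query = uri.getRawQuery();
        // Keep only shared parameters and sort them so that parameter
        // order does not produce different keys for the same page.
        String kept = query == null ? "" : Arrays.stream(query.split("&"))
                .filter(pair -> !pair.isEmpty())
                .filter(pair -> !USER_SPECIFIC.contains(pair.split("=", 2)[0]))
                .sorted()
                .collect(Collectors.joining("&"));
        String key = uri.getHost() + uri.getRawPath();
        return kept.isEmpty() ? key : key + "?" + kept;
    }
}
```

With this normalization, requests from different users for the same item map to one cache entry instead of one entry per user.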
Dynamic Data Handling
ESI (Edge Side Includes)
Insert dynamic fragments at the edge proxy.
CSI (Client Side Include)
Fetch dynamic fragments via asynchronous JavaScript.
Static‑Dynamic Separation Architecture
Dedicated Physical Machines
Deploy cache on physical servers for larger memory and higher hit rates, using consistent hashing to balance load.
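Consistent hashing can be sketched with a sorted map acting as the hash ring. The version below is illustrative only: it uses CRC32 as a simple stand-in hash and hypothetical node names, whereas a production ring would likely use a hash with better distribution.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** A small consistent-hash ring that maps cache keys to cache nodes. */
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) {
        this.virtualNodes = virtualNodes;
    }

    void addNode(String node) {
        // Several virtual points per node spread its share around the ring.
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            // Two-argument remove: only delete the point if it still belongs to this node.
            ring.remove(hash(node + "#" + i), node);
        }
    }

    String nodeFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no cache nodes registered");
        }
        // First point at or after the key's hash, wrapping around to the start.
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

The property that matters for hit rate is that removing a node only remaps the keys that node owned; keys on the other nodes stay where they were.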
Unified Cache Layer
Separate cache cluster shared by multiple services to reduce operational cost and improve maintainability.
CDN Deployment
Place cache close to users; address cache invalidation, hit‑rate, and release‑process challenges.
Hot‑Data Handling
What Is Hot Data?
Data that receives a disproportionate amount of traffic, such as popular products.
Discovering Hot Data
Static hot data can be identified via business rules or predictive analytics; dynamic hot data is collected in real‑time via agents that report hotspot keys from middleware.
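The real-time collection step can be pictured as a per-key access counter inside each agent. The sketch below is a simplified, single-process stand-in for that idea; the class name HotKeyCounter and the fixed threshold are assumptions, and the reporting to a central service is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

/** Counts accesses per key and reports keys whose count reaches a threshold. */
final class HotKeyCounter {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Called on every access, for example from a cache or RPC interceptor. */
    void record(String key) {
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Returns the keys seen at least threshold times, in sorted order. */
    List<String> hotKeys(long threshold) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue().sum() >= threshold)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

In practice the counts would be reset or windowed periodically so that a key that was hot an hour ago does not stay flagged.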
Processing Hot Data
Cache hot items locally, limit access by hashing item IDs into buckets so that each hot item is confined to one bounded request queue, and apply protection mechanisms (rate limiting, isolation, dedicated databases).
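Local caching of hot items might look like the short-lived in-process cache below. It is a sketch under stated assumptions: the class name LocalHotCache is hypothetical, the clock is injected so expiry can be exercised without waiting, and there is no size limit or eviction beyond the TTL.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** In-process cache whose entries expire after a fixed time to live. */
final class LocalHotCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAt;

        Entry(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    LocalHotCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = map.get(key);
        if (entry != null && entry.expiresAt > now) {
            return entry.value; // fast path: fresh entry, no locking
        }
        // compute is atomic per key, so concurrent callers for the same
        // expired key trigger one load instead of one load each.
        entry = map.compute(key, (k, old) ->
                (old != null && old.expiresAt > now)
                        ? old
                        : new Entry<>(loader.apply(k), now + ttlMillis));
        return entry.value;
    }
}
```

A short TTL bounds how stale a locally cached value can be, which is usually acceptable for display data but not for the final inventory check.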
Traffic Shaping (Peak‑Cutting)
Flash‑sale traffic spikes require queuing, anti‑bot quizzes, and layered filtering to smooth request bursts.
Queueing
Use message queues or thread pools to buffer bursts.
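Within a single process, the buffering idea can be sketched with a bounded queue: requests beyond the capacity are rejected immediately rather than piling up. The class name OrderBuffer and the use of plain strings as requests are placeholders for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Bounded buffer that absorbs a burst and lets workers drain it at their own pace. */
final class OrderBuffer {
    private final BlockingQueue<String> queue;

    OrderBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking enqueue; false means the buffer is full and the request should be rejected. */
    boolean submit(String orderRequest) {
        return queue.offer(orderRequest);
    }

    /** Called by a worker; returns null when nothing is waiting. */
    String next() {
        return queue.poll();
    }
}
```

The capacity is the knob that trades waiting time against rejection rate: a larger buffer accepts more of the burst but makes the last request in line wait longer.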
Quiz Mechanism
Introduce a short quiz to deter automated bots and spread request timing.
Layered Filtering
Apply filters at CDN, front‑end, back‑end, and database layers to drop invalid requests early.
Performance Factors
Definition of Performance
Measured by QPS and response time; CPU execution time is the dominant factor.
Finding Bottlenecks
Monitor CPU, memory, disk, and network; use profiling tools (JProfiler, YourKit) or sampling via jstack.
System Optimizations (Java‑Specific)
Reduce Encoding
Write raw bytes via resp.getOutputStream() instead of character writers.
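The idea can be sketched without the servlet API by encoding a static fragment once and writing the bytes to any OutputStream, which is the type resp.getOutputStream() provides. The class name StaticFragment is hypothetical, and UTF-8 is assumed as the page encoding.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Static page fragment that is encoded to bytes once and reused for every response. */
final class StaticFragment {
    private final byte[] encoded;

    StaticFragment(String html) {
        // The character-to-byte conversion happens here, once, not per request.
        this.encoded = html.getBytes(StandardCharsets.UTF_8);
    }

    void writeTo(OutputStream out) throws IOException {
        out.write(encoded);
    }

    int length() {
        return encoded.length;
    }
}
```

In a servlet, writeTo would receive resp.getOutputStream(), and length() could be used to set the Content-Length header.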
Reduce Serialization
Minimize RPC calls; co‑locate tightly coupled services.
Read‑Side Optimizations
Cache product details locally; treat read‑only fields as static and cache them for the duration of the sale.
Write‑Side Optimizations
Use database transactions, unsigned integer fields, or conditional updates such as:
UPDATE item SET inventory = CASE WHEN inventory >= xxx THEN inventory-xxx ELSE inventory END
Inventory Reduction Strategies
Order‑Time Reduction
Decrease inventory when an order is placed; guarantees consistency but may lock inventory for non‑paying users.
Payment‑Time Reduction
Decrease inventory at payment; risks overselling under high concurrency.
Pre‑Reservation
Reserve inventory for a limited time after order, then release if payment does not occur.
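Pre-reservation can be illustrated with an in-memory model that holds one unit per order and returns it to stock if the hold expires before payment. This is a sketch only: the class name ReservingInventory is hypothetical, time is passed in explicitly, and a real system would keep this state in the database rather than in process memory.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** In-memory model of inventory that is held for a limited time after an order. */
final class ReservingInventory {
    private int available;
    private final long holdMillis;
    private final Map<String, Long> holds = new HashMap<>(); // orderId -> hold expiry time

    ReservingInventory(int stock, long holdMillis) {
        this.available = stock;
        this.holdMillis = holdMillis;
    }

    /** Holds one unit for the order; false if stock is exhausted or the order already holds one. */
    synchronized boolean reserve(String orderId, long nowMillis) {
        releaseExpired(nowMillis);
        if (available <= 0 || holds.containsKey(orderId)) {
            return false; // guarded decrement: stock never goes negative
        }
        available--;
        holds.put(orderId, nowMillis + holdMillis);
        return true;
    }

    /** Confirms payment; false if the order has no hold or the hold already expired. */
    synchronized boolean pay(String orderId, long nowMillis) {
        releaseExpired(nowMillis);
        return holds.remove(orderId) != null;
    }

    synchronized int available(long nowMillis) {
        releaseExpired(nowMillis);
        return available;
    }

    // Returns units from expired holds to the available stock.
    private void releaseExpired(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = holds.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() <= nowMillis) {
                it.remove();
                available++;
            }
        }
    }
}
```

The hold duration decides how long a non-paying user can block a unit, which is the main weakness of plain order-time reduction.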
Handling Hot Items
Isolate hot products in separate databases or apply application‑level queuing to avoid row‑level lock contention.
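Application-level queuing can be sketched by routing all updates for the same item to the same single-threaded lane, so they run one after another instead of competing for the row lock. The class name PerItemQueue and the lane count are illustrative assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Serializes work per item by hashing item IDs onto single-threaded lanes. */
final class PerItemQueue {
    private final ExecutorService[] lanes;

    PerItemQueue(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Tasks for the same item always land on the same lane and run in submission order. */
    Future<?> submit(String itemId, Runnable task) {
        int lane = Math.floorMod(itemId.hashCode(), lanes.length);
        return lanes[lane].submit(task);
    }

    void shutdown() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
    }
}
```

This only serializes updates within one application instance; with several instances, the database still sees concurrent writers, just far fewer of them.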
Fallback Design
High‑Availability Construction
Consider architecture, coding, testing, release, operation, and failure stages to eliminate single points and ensure graceful degradation.
Degradation
Disable non‑core features when traffic exceeds thresholds.
Rate Limiting
Apply client‑side or server‑side limits based on QPS or thread count.
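A QPS-based limit is often implemented as a token bucket; the sketch below is one such variant, not necessarily what the original system used. The class name TokenBucket is hypothetical, and the current time is passed in explicitly (as nanoseconds) so the behavior is easy to reason about.

```java
/** Token bucket: allows ratePerSecond requests on average, with bursts up to capacity. */
final class TokenBucket {
    private final double ratePerSecond;
    private final double capacity;
    private double tokens;
    private long lastNanos;

    TokenBucket(double ratePerSecond, double capacity, long nowNanos) {
        this.ratePerSecond = ratePerSecond;
        this.capacity = capacity;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastNanos = nowNanos;
    }

    /** Returns true and consumes a token if the request may proceed. */
    synchronized boolean tryAcquire(long nowNanos) {
        // Refill in proportion to the elapsed time, capped at capacity.
        double elapsedSeconds = (nowNanos - lastNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In a server, callers would pass System.nanoTime() and reject the request when tryAcquire returns false. A limit on thread count could be sketched in a similar way with java.util.concurrent.Semaphore.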
Reject‑All Service
When system load reaches critical levels, return HTTP 503 to protect the backend.
