Designing a Scalable Flash‑Sale System for Millions of Users
This article explains how to handle massive flash‑sale traffic by combining CDN edge services, distributed rate‑limiting, load‑balancing, authentication, caching, and task queues to ensure performance, prevent overselling, and maintain data consistency under extreme load.
Flash‑Sale Scenario
In a flash‑sale, a small number of items (e.g., ten bottles of a premium liquor) is offered to a massive audience, often millions of users, and the system must never sell more units than are in stock. A countdown timer synchronizes the start of the sale.
Key Challenges
1. Synchronizing client clocks so all users see the same countdown.
2. Preventing bots and scalpers from grabbing the items.
3. Ensuring the backend can sustain the sudden surge of traffic.
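The article does not say how the countdown is synchronized. One common approach, sketched here as an assumption rather than as the author's method, is for the client to ask the server for its time, estimate the offset between the two clocks from the round trip, and drive the countdown from the corrected clock. The function names and the symmetric-delay assumption are illustrative.

```python
def estimate_clock_offset(t_send: float, server_time: float, t_recv: float) -> float:
    """Estimate (server clock - client clock) in seconds.

    t_send and t_recv are client timestamps taken just before sending the
    request and just after receiving the reply; server_time is the timestamp
    in the reply. Assumes the network delay is the same in both directions.
    """
    midpoint = (t_send + t_recv) / 2.0
    return server_time - midpoint


def seconds_until_start(sale_start: float, client_now: float, offset: float) -> float:
    """Remaining countdown, measured on the server's clock and never negative."""
    corrected_now = client_now + offset
    return max(0.0, sale_start - corrected_now)


if __name__ == "__main__":
    # Hypothetical timestamps: the client clock runs about 5 s behind the server.
    offset = estimate_clock_offset(t_send=100.0, server_time=105.2, t_recv=100.4)
    print(seconds_until_start(sale_start=200.0, client_now=150.0, offset=offset))
```

Because the countdown is only a display aid, the server should still reject any order that arrives before the official start time, whatever the client clock says.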
Design Approach
Simply scaling up servers is impractical because the required hardware would be prohibitively expensive. Instead, a distributed architecture is needed.
Technical Solution
Option 1: CDN Edge Services
Deploy small services on CDN edge nodes to serve static assets and handle initial user requests. These edge services track the number of online users and periodically report it to a central data center. When the sale starts, the data center sends each edge node a probability value, and the node forwards each incoming request to the backend with that probability; every other request is rejected at the edge with a “sale ended” response. Most of the traffic therefore never reaches the origin.
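A minimal sketch of the edge-side admission decision follows. The article does not give the formula for the probability, so the one used here (stock multiplied by an oversubscription factor, divided by the reported number of online users) is an assumption, as are the names `forward_probability`, `EdgeNode`, and the default factor of 10.

```python
import random


def forward_probability(stock: int, online_users: int, oversubscribe: float = 10.0) -> float:
    """Probability that an edge node forwards a request to the backend.

    Assumed formula: let through roughly `oversubscribe` candidates per item
    in stock, so the backend sees a small multiple of the stock, not everyone.
    """
    if stock <= 0:
        return 0.0
    if online_users <= 0:
        return 1.0
    return min(1.0, stock * oversubscribe / online_users)


class EdgeNode:
    """Hypothetical edge service holding the latest probability from the data center."""

    def __init__(self, rng=None):
        self.probability = 0.0  # reject everything until the data center says otherwise
        self.rng = rng or random.Random()

    def update_probability(self, probability: float) -> None:
        self.probability = probability

    def handle(self) -> str:
        # Forward with the configured probability; reject the rest at the edge.
        if self.rng.random() < self.probability:
            return "forward"
        return "sale ended"


if __name__ == "__main__":
    node = EdgeNode()
    node.update_probability(forward_probability(stock=10, online_users=1_000_000))
    decisions = [node.handle() for _ in range(100_000)]
    print(decisions.count("forward"), "of", len(decisions), "requests forwarded")
```

The backend still has to enforce the stock limit itself, since the edge filter only reduces the volume of requests and cannot guarantee that at most `stock` of them get through.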
Option 2: Layered Filtering and Rate Limiting
Serve static content from the CDN, then require authentication to filter out bots and unauthenticated users. Authenticated requests are distributed by LVS + Keepalived to an Nginx cluster and then pass through a gateway cluster that enforces rate limiting. If the remaining traffic still threatens the database, apply service‑level throttling and degradation, and serve hot data from a cache. Accepted orders are placed in a task queue and processed asynchronously, so that database writes happen at a controlled pace, stock stays consistent, and payment timeouts can be handled in the same pipeline.
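The last two layers of this pipeline, gateway rate limiting and queued order processing, can be sketched in a single process. This is an illustration under stated assumptions, not the article's implementation: the token bucket is one common rate-limiting algorithm (the article does not name one), and the lock-protected counter and `queue.Queue` stand in for whatever shared cache and message queue a real deployment would use. The names `TokenBucket` and `FlashSale` are hypothetical.

```python
import queue
import threading
import time


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at the bucket size.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


class FlashSale:
    """Reserves stock atomically, then queues the order for asynchronous processing."""

    def __init__(self, stock: int, limiter: TokenBucket):
        self.stock = stock
        self.limiter = limiter
        self.orders = queue.Queue()  # stand-in for a real task queue
        self.lock = threading.Lock()

    def try_order(self, user_id: str) -> str:
        if not self.limiter.allow():
            return "rate limited"
        with self.lock:
            # Check and decrement under one lock so stock can never go negative.
            if self.stock <= 0:
                return "sold out"
            self.stock -= 1
        self.orders.put(user_id)  # a worker would create the order and await payment
        return "queued"


if __name__ == "__main__":
    sale = FlashSale(stock=10, limiter=TokenBucket(rate=100.0, capacity=50.0))
    results = [sale.try_order(f"user-{i}") for i in range(200)]
    print({outcome: results.count(outcome) for outcome in set(results)})
```

Reserving stock before enqueueing is what prevents overselling here: at most `stock` orders ever reach the queue, so the workers that write to the database only need to handle a small, bounded number of items.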
Summary
Using CDN edge nodes to absorb traffic and applying multi‑layer request filtering enables a flash‑sale to run smoothly without overselling. For larger events like Double‑11, a more comprehensive high‑concurrency architecture, thorough performance testing, and horizontal scaling are required. Edge computing can also reduce latency and cost for region‑specific services such as food delivery or ride‑hailing.
ITFLY8 Architecture Home
