
Designing a Scalable Flash‑Sale System for Millions of Users

This article explains how to handle massive flash‑sale traffic by combining CDN edge services, distributed rate‑limiting, load‑balancing, authentication, caching, and task queues to ensure performance, prevent overselling, and maintain data consistency under extreme load.

ITFLY8 Architecture Home

Nov 11, 2021


Flash‑Sale Scenario

In a flash sale, a small, fixed stock (for example, ten bottles of a premium liquor) is offered to a very large audience, often millions of users. The system must never sell more units than are in stock. A countdown timer tells every user when the sale starts.

Key Challenges

1. Synchronizing client clocks so all users see the same countdown.

2. Preventing bots and scalpers from grabbing the items.

3. Ensuring the backend can sustain the sudden surge of traffic.
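
The article does not say how the countdown is synchronized. One common approach, sketched here as an assumption rather than the author's method, is for the client to estimate the offset between its own clock and the server's from a single round trip, then count down on the server's clock. The names `estimate_offset_ms`, `remaining_ms`, and the `fetch_server_time_ms` callback are hypothetical.

```python
import time
from typing import Callable


def estimate_offset_ms(
    fetch_server_time_ms: Callable[[], float],
    now_ms: Callable[[], float] = lambda: time.time() * 1000,
) -> float:
    """Estimate (server clock - local clock) from one request/response round trip."""
    t_send = now_ms()
    server_ms = fetch_server_time_ms()  # hypothetical call to a server time endpoint
    t_recv = now_ms()
    # Assumption: the server read its clock at the midpoint of the round trip.
    return server_ms - (t_send + t_recv) / 2


def remaining_ms(sale_start_ms: float, offset_ms: float, local_now_ms: float) -> float:
    """Milliseconds until the sale starts, measured on the server's clock."""
    return max(0.0, sale_start_ms - (local_now_ms + offset_ms))
```

The estimate is only as good as the midpoint assumption, so the server should still reject any order that arrives before its own clock reaches the start time.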

Design Approach

Provisioning enough servers to absorb the peak directly is impractical because the required hardware would be prohibitively expensive. Instead, a distributed architecture is needed.

Technical Solution

Option 1: CDN Edge Services

Deploy small services on CDN edge nodes to serve static assets and handle initial user requests. These edge services track how many users are online and periodically report the count to a central data center. When the sale starts, the data center sends each edge node a probability value. The edge node forwards each incoming request to the backend with that probability and otherwise rejects it with a “sale ended” response, so only a small fraction of the traffic ever reaches the backend.
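
A minimal sketch of this probabilistic admission, under stated assumptions: the article does not say how the data center derives the probability, so the formula below (stock times a hypothetical `oversubscribe` factor, divided by the reported online users) is only one plausible choice. The `EdgeNode` class and its method names are also illustrative.

```python
import random
from typing import Optional


def admission_probability(stock: int, online_users: int, oversubscribe: float = 10.0) -> float:
    """Data-center side: fraction of requests the edge nodes should forward.

    Assumption: let roughly `stock * oversubscribe` requests through in total.
    """
    if online_users <= 0:
        return 1.0
    return min(1.0, stock * oversubscribe / online_users)


class EdgeNode:
    """Edge side: forwards a request with the probability set by the data center."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.probability = 0.0  # reject everything until the data center says otherwise
        self.rng = rng or random.Random()

    def update_probability(self, probability: float) -> None:
        self.probability = probability

    def handle(self, request_id: str) -> str:
        # rng.random() is in [0.0, 1.0), so 0.0 rejects all and 1.0 forwards all.
        if self.rng.random() < self.probability:
            return "forward"
        return "sale ended"
```

With ten items and one million online users, `admission_probability(10, 1_000_000)` yields 0.0001, so on average about 100 requests in total reach the backend. The backend must still enforce the real stock limit, because the edge filter only reduces traffic.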

Option 2: Layered Filtering and Rate Limiting

Serve static content from the CDN, then require authentication to filter out bots and unauthenticated users. Authenticated requests are distributed by LVS + Keepalived to an Nginx cluster and then to a gateway cluster that enforces rate limiting. If the remaining traffic still threatens the database, apply service‑level throttling and degradation, and cache hot data. Orders are placed in a task queue and processed asynchronously, which keeps database writes consistent and gives the system a place to handle payment timeouts.
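
The article does not name a rate-limiting algorithm for the gateway. As an illustration only, a token bucket is one widely used option: tokens refill at a fixed rate, and each request spends one. The sketch below is a single-process version with a hypothetical `TokenBucket` class. It is not thread-safe, and a real gateway cluster would need the counter in shared storage.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate` requests per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so the first burst is admitted
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject or degrade the request
```

A gateway would call `allow()` once per request and return an immediate "busy" response when it yields `False`, which keeps the rejected traffic away from the services and the database.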

Summary

Using CDN edge nodes to absorb traffic and applying multi‑layer request filtering enables a flash‑sale to run smoothly without overselling. For larger events like Double‑11, a more comprehensive high‑concurrency architecture, thorough performance testing, and horizontal scaling are required. Edge computing can also reduce latency and cost for region‑specific services such as food delivery or ride‑hailing.
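
The "without overselling" guarantee, together with the task queue and payment timeouts from Option 2, can be sketched as follows. This is an in-memory illustration under stated assumptions, not the article's implementation: the `FlashSaleInventory` class is hypothetical, a lock stands in for an atomic counter in a shared cache or database, and a `deque` stands in for the task queue.

```python
import threading
from collections import deque


class FlashSaleInventory:
    """Reserves stock atomically, queues orders, and releases unpaid reservations."""

    def __init__(self, stock: int, payment_timeout_s: float) -> None:
        self._lock = threading.Lock()
        self._stock = stock
        self._timeout = payment_timeout_s
        self._pending = {}    # order_id -> payment deadline
        self.paid = set()
        self.queue = deque()  # orders waiting for asynchronous processing

    @property
    def available(self) -> int:
        with self._lock:
            return self._stock

    def reserve(self, order_id: str, now: float) -> bool:
        # Check and decrement under one lock so stock can never go below zero.
        with self._lock:
            if self._stock <= 0 or order_id in self._pending or order_id in self.paid:
                return False
            self._stock -= 1
            self._pending[order_id] = now + self._timeout
            self.queue.append(order_id)
            return True

    def confirm_payment(self, order_id: str, now: float) -> bool:
        with self._lock:
            deadline = self._pending.get(order_id)
            if deadline is None or now > deadline:
                return False
            del self._pending[order_id]
            self.paid.add(order_id)
            return True

    def expire_unpaid(self, now: float) -> int:
        """Return stock for reservations whose payment deadline has passed."""
        with self._lock:
            expired = [o for o, deadline in self._pending.items() if now > deadline]
            for order_id in expired:
                del self._pending[order_id]
                self._stock += 1
            return len(expired)
```

With a stock of two, only the first two `reserve` calls succeed no matter how many arrive. A periodic `expire_unpaid` call puts unpaid units back on sale.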



