
Traffic Peak Shaving Techniques for Flash Sale Systems: Queuing, Quiz, and Layered Filtering

This article explains why flash-sale services need to shave traffic peaks and presents three practical, lossless techniques for doing so: using message queues to buffer bursts, adding a quiz step to delay and filter requests, and applying layered filtering across the CDN, front end, back end, and database to achieve smoother processing and lower resource waste.


On a flash-sale traffic-monitoring chart, request volume appears as a flat line that spikes sharply at the exact second the sale starts: because all requests are concentrated in time, they create an instantaneous peak that exhausts server resources.

However, the number of items that can actually be sold is fixed, so whether 100 or 10,000 users attempt to buy, the outcome is the same; higher concurrency merely generates more invalid requests.

From a business perspective, a flash sale wants many users to browse before the start, but the volume of actual order requests should stay modest. Rules can therefore be designed to delay concurrent requests and even filter out invalid ones.

Why Shave Peaks

Why shave peaks? What problems do spikes cause?

Server processing capacity is fixed; during a spike the server becomes overloaded and may fail to handle all requests, while during idle periods resources sit under-utilized. Provisioning capacity for the peak wastes resources, much like cities restricting traffic at rush hour.

Peak shaving makes server processing smoother and saves resource costs.

In the flash-sale scenario, peak shaving essentially means delaying user requests in order to reduce and filter out invalid ones, following the principle of "as few requests as possible".

Today I will introduce three lossless peak-shaving approaches: queuing, quiz, and layered filtering.

Queuing

The simplest way to shave peaks is to use a message queue to buffer instantaneous traffic, turning synchronous calls into asynchronous pushes. The queue acts like a reservoir, storing upstream bursts and releasing them downstream smoothly.

[Figure: diagram of the queuing solution, using a message queue to buffer bursts]

If the peak lasts long enough to exceed the queue's capacity (e.g., its storage limit), the queue itself becomes the bottleneck, like a reservoir that cannot contain a flood.

Other queuing‑style methods include:

1. Thread-pool locking and waiting.
2. In-memory FIFO/LIFO queues.
3. Serializing requests to files and replaying them (e.g., MySQL binlog sync).

All these approaches turn a single‑step operation into a two‑step one, inserting a buffering step.
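The two-step buffering idea can be sketched as follows. This is a minimal in-process illustration, assuming a `queue.Queue` stands in for a real message broker such as Kafka or RabbitMQ; the names `handle_order`, `producer`, and `consumer` are illustrative.

```python
import queue
import threading

# Bounded buffer: the "reservoir". When full, producers block,
# which applies backpressure instead of overwhelming the consumer.
BUFFER = queue.Queue(maxsize=1000)
processed = []

def handle_order(order_id):
    # Downstream work, done at the consumer's own steady pace.
    processed.append(order_id)

def producer(burst):
    # The burst arrives all at once; we enqueue it instead of
    # calling the downstream service synchronously.
    for order_id in burst:
        BUFFER.put(order_id)

def consumer(expected):
    # Drain the buffer smoothly, one request at a time.
    for _ in range(expected):
        order_id = BUFFER.get()
        handle_order(order_id)
        BUFFER.task_done()

burst = list(range(100))
t = threading.Thread(target=consumer, args=(len(burst),))
t.start()
producer(burst)
t.join()
```

The synchronous call becomes "enqueue, then process later": the burst is absorbed by the buffer, and the consumer drains it at its own rate.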

Quiz

Why add a quiz step?

Early flash sales were pure page refreshes and button clicks. Adding a quiz serves two purposes: it stops bots (flash-sale scripts) from cheating, and it deliberately delays requests, spreading the order-submission window from about 1 second to 2-10 seconds and thus flattening the instantaneous peak.

[Figure: quiz page]

The quiz design consists of three modules:

1. Question-bank generation – creates simple Q&A pairs that are hard for machines to solve.
2. Question-bank push – delivers unique questions to the detail and transaction systems before the sale starts.
3. Image generation – renders the question as an image with visual noise, pushes it to the CDN, and pre-warms it to avoid latency.

When a user submits an answer, the system compares it with the stored answer; if correct, the order flow continues, otherwise it fails.

Keys are built as follows (MD5-hashed):

- Question key: `userId + itemId + questionId + time + PK`
- Answer key: `userId + itemId + answer + PK`
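A minimal sketch of this key scheme, assuming `PK` is a server-side secret salt; the field values and the `PK` string are illustrative, not from the original system.

```python
import hashlib

PK = "server-secret"  # assumed server-side salt

def question_key(user_id, item_id, question_id, ts):
    # MD5 over the concatenated fields, per the scheme above.
    raw = f"{user_id}{item_id}{question_id}{ts}{PK}"
    return hashlib.md5(raw.encode()).hexdigest()

def answer_key(user_id, item_id, answer):
    raw = f"{user_id}{item_id}{answer}{PK}"
    return hashlib.md5(raw.encode()).hexdigest()
```

Because the salt never leaves the server, a script cannot precompute valid answer keys even if it knows the question.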

[Figure: quiz validation logic]

Besides answer verification, the system also checks user login status, cookie completeness, request frequency, etc., and may enforce a minimum answer‑submission time (e.g., >1 s) to further block automated scripts.
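These checks can be sketched as a single validation function. The check order, the `MIN_ANSWER_SECONDS` threshold, and all parameter names are assumptions for illustration.

```python
MIN_ANSWER_SECONDS = 1.0  # assumed threshold: humans need at least ~1 s

def validate_submission(logged_in, cookie_ok, issued_at, answered_at,
                        submitted_answer, stored_answer):
    # Cheap session checks first.
    if not (logged_in and cookie_ok):
        return False
    # A human cannot read and answer in under a second; scripts can.
    if answered_at - issued_at < MIN_ANSWER_SECONDS:
        return False
    # Finally, compare against the stored answer.
    return submitted_answer == stored_answer
```

Ordering the cheap checks first means most invalid requests are rejected before the answer comparison ever runs.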

Layered Filtering

Another approach is to filter requests at multiple layers, forming a funnel that discards invalid traffic early.

Layered filtering steps:

1. CDN and browser cache intercept most data reads.
2. Front-end system serves data from cache whenever possible, filtering out invalid reads.
3. Back-end system performs secondary validation and rate-limiting, further reducing traffic.
4. Database layer enforces strong consistency checks.

The core idea is to discard as many invalid requests as possible at each layer, letting only truly valid requests reach the final stage.
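The funnel can be sketched as a chain of filters, each layer dropping traffic before the next sees it. The layer rules here (static reads, a sold-out cache, a sale window) are illustrative simplifications of the four layers above.

```python
def layered_filter(requests, sale_open, stock_cache):
    # Layer 1: CDN/browser cache absorbs static reads entirely.
    reqs = [r for r in requests if r["type"] != "static"]
    # Layer 2: front end rejects items its cache already knows are sold out.
    reqs = [r for r in reqs if stock_cache.get(r["item"], 0) > 0]
    # Layer 3: back end drops requests outside the sale window
    # (stand-in for secondary validation and rate limiting).
    reqs = [r for r in reqs if sale_open(r["ts"])]
    # Layer 4: only the survivors reach the database's strong checks.
    return reqs
```

Each layer is cheap and approximate; only the small stream that survives all of them pays the cost of a strongly consistent database check.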

Basic principles of layered verification:

1. Cache dynamic reads on the web side.
2. Skip strong consistency checks for reads to avoid bottlenecks.
3. Time-slice writes to filter expired requests.
4. Apply rate-limit protection on writes.
5. Perform strong consistency checks on writes (e.g., inventory cannot go negative).

The goal is to reduce read‑side bottlenecks while ensuring write‑side correctness.
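The write-side invariant "inventory cannot go negative" can be sketched with a guarded decrement. This in-memory lock is a stand-in: a real system would use a conditional database update (e.g., `UPDATE ... SET stock = stock - 1 WHERE stock > 0`) or an atomic Redis operation.

```python
import threading

class Inventory:
    def __init__(self, stock):
        self.stock = stock
        self._lock = threading.Lock()

    def try_deduct(self):
        # Check and decrement atomically, so concurrent buyers
        # can never drive stock below zero.
        with self._lock:
            if self.stock <= 0:
                return False  # sold out: reject the write
            self.stock -= 1
            return True

# 100 concurrent purchase attempts against 10 units of stock.
inv = Inventory(10)
results = []

def attempt():
    results.append(inv.try_deduct())

threads = [threading.Thread(target=attempt) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many requests survive the upper layers, exactly `stock` of them succeed; the rest fail cleanly at the final check.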

Summary

In this article I introduced three peak‑shaving techniques for high‑traffic scenarios:

1. Queue buffering – controls request emission and is suitable for internal service-to-service calls.
2. Quiz – adds a human-solvable step to delay and filter requests, ideal for flash-sale or marketing events.
3. Layered filtering – progressively discards invalid traffic across the CDN, front-end, back-end, and database layers.

Queue buffering is the most generic solution, while quiz and layered filtering complement each other to further reduce invalid traffic and protect system resources.

Tags: backend, flash sale, queue, traffic shaping, quiz, layered filtering
Written by

Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.
