Designing Ultra‑Fast High‑Concurrency Systems: Principles and a 60k QPS Flash‑Sale Case Study

This article outlines two core principles for building high‑concurrency back‑end systems, doing less and doing it cleverly, then shows how they were applied in a real‑world flash‑sale (秒杀) scenario that handled 60,000 queries per second through careful feature selection, data reduction, caching strategies, queue control, and asynchronous processing.

ITFLY8 Architecture Home

Large‑scale web applications usually grow out of a small site or a single‑machine app. To support massive traffic, the front end and back end adopt techniques such as static‑asset compression, CDNs, SOA, caching, database indexing, and read/write splitting. This article mentions these only briefly and focuses instead on design principles for high‑concurrency services and a 60k QPS flash‑sale case.

High‑Concurrency System Design Principles

High‑concurrency interfaces share one key trait: speed. The faster each request is processed, the shorter users wait for a response and the more requests each server can handle, so "fast" is the primary metric for evaluating performance and capacity improvements.
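
As a back‑of‑the‑envelope illustration (the Little's law framing and the 200‑thread figure are assumptions added here, not numbers from the case study): if a server can hold a fixed number of requests in flight, its throughput ceiling is that number divided by the average latency, so cutting latency raises capacity proportionally.

```python
def max_throughput_qps(workers: int, avg_latency_ms: float) -> float:
    """Upper bound on requests/second when `workers` requests can be in flight
    at once and each takes `avg_latency_ms` (Little's law: L = lambda * W)."""
    if workers <= 0 or avg_latency_ms <= 0:
        raise ValueError("workers and latency must be positive")
    return workers * 1000.0 / avg_latency_ms


# Hypothetical server with 200 worker threads.
for latency_ms in (50, 20, 10):
    print(f"{latency_ms} ms -> {max_throughput_qps(200, latency_ms):,.0f} QPS")
# 50 ms -> 4,000 QPS
# 20 ms -> 10,000 QPS
# 10 ms -> 20,000 QPS
```

Halving latency doubles the ceiling without adding hardware, which is why speed doubles as a capacity metric.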

Two guiding principles help achieve speed:

Do less – limit functional scope and reduce the amount of data processed per request.

Do it cleverly – choose implementation methods, cache types, and cache‑access timing that fit the business characteristics.

Do Less

The fastest program does nothing. An interface that handles fewer functions and reads less data responds faster.

Selective Feature Scope

For high‑concurrency APIs, avoid features that depend on highly personalized, hard‑to‑cache data. For example, displaying each user's flash‑sale eligibility requires a database read per user, which leads to low cache hit rates and heavy database load.

One solution is to shift the data dimension: instead of caching per user ID, cache the list of successful user IDs keyed by the flash‑sale event ID. Every user of the same event then reads the same cache entry, which dramatically improves the cache hit rate.

Another approach is to postpone personalized data retrieval to later stages of the user flow, keeping early‑stage pages cache‑friendly and fast.

Minimize Information Volume

Business objects often have multiple dimensions (category, region, date). Presenting all dimensions at once inflates cache size, key count, network transfer, and serialization time. Estimating the object count per request helps gauge performance; designers should cap the number of objects returned and paginate or batch large result sets.
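
The capping rule can be sketched as a small pagination helper. `MAX_PAGE_SIZE = 50` is a hypothetical limit; the point is that the server, not the client, decides the upper bound on objects per response.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

MAX_PAGE_SIZE = 50  # hypothetical hard cap on objects returned per request


def get_page(items: Sequence[T], page: int, page_size: int) -> List[T]:
    """Return one page of results, never more than MAX_PAGE_SIZE objects."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    size = max(1, min(page_size, MAX_PAGE_SIZE))  # clamp the client-requested size
    start = (page - 1) * size
    return list(items[start:start + size])


print(len(get_page(range(1000), page=1, page_size=500)))  # 50
```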

Do It Cleverly

Choose Implementation Based on Business Traits

Assess real‑time requirements, consistency needs, data dimensions, and volume to decide between real‑time queries and offline pre‑computation with caching. For instance, computing the lowest price for a category offline and caching the result yields far better response times than scanning all products on each request.
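
A sketch of the offline‑precompute pattern for the lowest‑price example, assuming an in‑memory snapshot in place of a real cache; `Product`, `precompute_min_prices`, and `LowestPriceCache` are hypothetical names. The batch job does the full scan once, and each request is reduced to a single lookup.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Product:
    product_id: int
    category: str
    price_cents: int


def precompute_min_prices(products: Iterable[Product]) -> Dict[str, int]:
    """Offline job: a single pass over all products -> {category: lowest price}."""
    lowest: Dict[str, int] = {}
    for p in products:
        current = lowest.get(p.category)
        if current is None or p.price_cents < current:
            lowest[p.category] = p.price_cents
    return lowest


class LowestPriceCache:
    """Online side: requests only read the precomputed snapshot."""

    def __init__(self) -> None:
        self._snapshot: Dict[str, int] = {}

    def refresh(self, products: Iterable[Product]) -> None:
        # Build the new table first, then swap it in with one assignment so
        # readers never observe a half-built result.
        self._snapshot = precompute_min_prices(products)

    def lookup(self, category: str) -> Optional[int]:
        return self._snapshot.get(category)


cache = LowestPriceCache()
cache.refresh([Product(1, "phones", 49900), Product(2, "phones", 39900),
               Product(3, "tvs", 129900)])
print(cache.lookup("phones"), cache.lookup("books"))  # 39900 None
```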

Appropriate Cache Selection and Usage

Cache is essential for handling high concurrency. Choose the right cache type and access timing:

Varnish – reverse‑proxy for static pages.

Ehcache – in‑process memory cache for small, rarely changing data.

Memcached – distributed KV cache for larger objects.

Redis – in‑memory KV store with rich data structures, also useful for distributed locks and queues.
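
One common way to combine an in‑process cache with a distributed one is a read‑through chain: check local memory first, then the shared cache, and fall back to the database only on a double miss. The sketch below assumes a dict in place of a Memcached or Redis client, and `LocalTTLCache` is a simplified, single‑threaded stand‑in for an in‑process cache such as Ehcache, not its API.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class LocalTTLCache:
    """In-process cache with per-entry expiry (single-threaded sketch)."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]                 # expired: drop and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (time.monotonic() + self._ttl, value)


def tiered_get(key: str, local: LocalTTLCache, remote: Dict[str, Any],
               load_from_db: Callable[[str], Any]) -> Any:
    """Read-through lookup: in-process cache, then shared cache, then the database."""
    value = local.get(key)
    if value is not None:
        return value
    value = remote.get(key)                     # dict stands in for Memcached/Redis
    if value is None:
        value = load_from_db(key)
        remote[key] = value
    local.set(key, value)
    return value


db_calls = []


def load_from_db(key: str) -> str:
    db_calls.append(key)
    return f"value-of-{key}"                    # made-up value


local = LocalTTLCache(ttl_seconds=60)
remote: Dict[str, Any] = {}
first = tiered_get("event:7:start", local, remote, load_from_db)
second = tiered_get("event:7:start", local, remote, load_from_db)
print(first == second, len(db_calls))  # True 1
```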

Even with caching, misuse (oversized values, too many keys per request, frequent updates) can become a bottleneck.

Flash‑Sale (秒杀) Practice

Flash‑Sale Business Analysis

Typical flash‑sale characteristics:

Massive instantaneous traffic.

Many participants but few items.

Read‑heavy, write‑light.

High real‑time requirement for status transitions.

The flow consists of three phases: before the start (users load the page and query the activity status), during the event (a burst of purchase requests), and after the end (similar to the pre‑start phase).

Page‑Load Requests

Static content (activity info, images, rules) can be cached with Varnish using a hash of activity ID and city ID as the key, while dynamic status is refreshed via AJAX.
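
A sketch of the key design, assuming a dict in place of Varnish's object cache; in Varnish itself the key would be built in VCL's `vcl_hash` subroutine with `hash_data()`. The function names are hypothetical. Because user‑specific status is left out of the key, all visitors to the same activity in the same city share one cached page.

```python
import hashlib
from typing import Dict


def page_cache_key(activity_id: int, city_id: int) -> str:
    """Key for the static part of a flash-sale page: only activity and city vary it."""
    raw = f"activity={activity_id}&city={city_id}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


page_cache: Dict[str, str] = {}   # stands in for Varnish's object cache
renders = 0


def get_static_page(activity_id: int, city_id: int) -> str:
    global renders
    key = page_cache_key(activity_id, city_id)
    if key not in page_cache:
        renders += 1                  # backend render happens only on a miss
        page_cache[key] = f"<html>activity {activity_id}, city {city_id}</html>"
    return page_cache[key]


# 1,000 page views for the same activity in two cities -> only two renders.
for user in range(1000):
    get_static_page(activity_id=42, city_id=user % 2)
print(renders)  # 2
```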

Activity‑Status Queries

The activity status has three possible values (not started, available, sold out), determined by the start time and the remaining stock. The start time is relatively static, so it can be cached in Ehcache. Stock changes rapidly, so instead of the stock count, Memcached caches only a boolean availability flag (available or not); the flag is updated only once, when stock runs out, and the update uses CAS (check‑and‑set) to avoid race conditions.
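
The status logic and the one‑time CAS update can be sketched as follows. `CasStore` is a hypothetical in‑memory stand‑in that mimics Memcached's `gets`/`cas` token semantics; it is not a client library, and the `in_stock:<event_id>` key format is made up. Only the caller whose `cas` succeeds flips the flag, so concurrent requests that all see stock run out cannot overwrite each other.

```python
import threading
from enum import Enum
from typing import Any, Dict, Optional, Tuple


class Status(Enum):
    NOT_STARTED = "not started"
    AVAILABLE = "available"
    SOLD_OUT = "sold out"


class CasStore:
    """In-memory stand-in for Memcached's gets/cas: every write bumps a token,
    and cas() succeeds only if the caller's token is still current."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, Tuple[Any, int]] = {}    # key -> (value, cas token)

    def set(self, key: str, value: Any) -> None:
        with self._lock:
            _, token = self._data.get(key, (None, 0))
            self._data[key] = (value, token + 1)

    def gets(self, key: str) -> Optional[Tuple[Any, int]]:
        with self._lock:
            return self._data.get(key)

    def cas(self, key: str, value: Any, token: int) -> bool:
        with self._lock:
            current = self._data.get(key)
            if current is None or current[1] != token:
                return False                           # another writer got there first
            self._data[key] = (value, token + 1)
            return True


def flag_key(event_id: int) -> str:
    return f"in_stock:{event_id}"                      # hypothetical key format


def mark_sold_out(store: CasStore, event_id: int) -> bool:
    """Flip the flag to False; returns True only for the one caller that flips it.
    The flag is assumed to be initialized with set(..., True) at event setup."""
    entry = store.gets(flag_key(event_id))
    if entry is None or entry[0] is False:
        return False                                   # missing or already sold out
    return store.cas(flag_key(event_id), False, entry[1])


def current_status(now: float, start_time: float, store: CasStore, event_id: int) -> Status:
    if now < start_time:                               # start_time: from the local cache
        return Status.NOT_STARTED
    entry = store.gets(flag_key(event_id))
    if entry is not None and entry[0] is False:
        return Status.SOLD_OUT
    return Status.AVAILABLE


store = CasStore()
store.set(flag_key(7), True)                           # event setup: in stock
print(current_status(150, 100, store, 7))              # Status.AVAILABLE
print(mark_sold_out(store, 7), mark_sold_out(store, 7))  # True False
print(current_status(150, 100, store, 7))              # Status.SOLD_OUT
```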

Purchase Requests

Analysis

Purchase requests are writes and cannot be cached. Each request typically performs two database operations, decrementing the stock and creating an order, and under high load concurrent writes to the same stock record cause heavy database contention.

Because only a tiny fraction of participants can actually buy, it is essential to limit how many requests reach the stock‑decrement step. A queue‑length check (e.g., using Memcached counters or a Redis SET) can enforce this limit, rejecting excess requests before they touch the database.
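
A minimal sketch of the admission check, assuming a lock‑protected in‑process set as a stand‑in for a shared Redis set; `AdmissionGate` and the capacity of 100 are hypothetical. Repeat requests from an admitted user do not consume a second slot, and everyone beyond the limit is turned away before any database work. The Memcached‑counter variant would simply increment a counter and compare it with the limit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Set


class AdmissionGate:
    """Admits at most `capacity` distinct users to the stock-decrement step.

    The lock-protected set stands in for a shared Redis set (SADD + SCARD)."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._admitted: Set[int] = set()
        self._lock = threading.Lock()

    def try_enter(self, user_id: int) -> bool:
        with self._lock:
            if user_id in self._admitted:
                return True            # retry from an admitted user: no new slot used
            if len(self._admitted) >= self._capacity:
                return False           # queue full: reject immediately, DB never touched
            self._admitted.add(user_id)
            return True

    def admitted_count(self) -> int:
        with self._lock:
            return len(self._admitted)


gate = AdmissionGate(capacity=100)     # hypothetical limit
with ThreadPoolExecutor(max_workers=32) as pool:
    admitted = sum(pool.map(gate.try_enter, range(10_000)))
print(admitted, gate.admitted_count())  # 100 100
```

With Redis, the membership check and the add would need to run atomically (for example inside a Lua script) so that two servers cannot both take the last slot.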

Further Optimizations

1. Stock Sharding – split total stock into multiple Redis keys to reduce lock contention.

2. Asynchronous Processing – offload stock deduction and order creation to a message queue, notifying users of results via SMS or push.
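
Both optimizations can be sketched together in one process, under clearly simplified assumptions: each shard of `ShardedStock` stands in for a separate Redis key, `queue.Queue` stands in for a message queue, and the `notify` callback stands in for SMS or push delivery. All names are hypothetical. The web‑facing `handle_purchase` only enqueues and returns; the worker deducts stock and reports the result later.

```python
import queue
import random
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


class ShardedStock:
    """Total stock split across independently locked shards.

    Each shard stands in for a separate Redis key decremented atomically;
    spreading requests over shards means they rarely contend on one key."""

    def __init__(self, total: int, shards: int) -> None:
        base, extra = divmod(total, shards)
        self._counts: List[int] = [base + (1 if i < extra else 0) for i in range(shards)]
        self._locks = [threading.Lock() for _ in range(shards)]

    def try_decrement(self) -> bool:
        n = len(self._counts)
        start = random.randrange(n)          # a random starting shard spreads the load
        for offset in range(n):              # fall back to other shards when one is empty
            i = (start + offset) % n
            with self._locks[i]:
                if self._counts[i] > 0:
                    self._counts[i] -= 1
                    return True
        return False                         # every shard is empty: sold out

    def remaining(self) -> int:
        return sum(self._counts)             # unlocked snapshot; exact once workers stop


@dataclass(frozen=True)
class PurchaseRequest:
    user_id: int
    event_id: int


def order_worker(requests: queue.Queue, stock: ShardedStock,
                 notify: Callable[[int, str], None]) -> None:
    """Background consumer: deducts stock, (would) create the order, notifies the user."""
    while True:
        req: Optional[PurchaseRequest] = requests.get()
        try:
            if req is None:                  # sentinel: shut down
                return
            if stock.try_decrement():
                # In production the order row would be inserted here.
                notify(req.user_id, "success")
            else:
                notify(req.user_id, "sold out")
        finally:
            requests.task_done()


def handle_purchase(requests: queue.Queue, req: PurchaseRequest) -> str:
    requests.put(req)                        # enqueue and return at once
    return "queued"                          # the result arrives later (SMS/push)


notifications: List[tuple] = []
requests_q: queue.Queue = queue.Queue()
stock = ShardedStock(total=3, shards=2)
worker = threading.Thread(target=order_worker,
                          args=(requests_q, stock, lambda uid, msg: notifications.append((uid, msg))))
worker.start()
for uid in range(5):
    handle_purchase(requests_q, PurchaseRequest(user_id=uid, event_id=7))
requests_q.put(None)                         # stop the worker after the backlog
worker.join()
print(notifications)
# [(0, 'success'), (1, 'success'), (2, 'success'), (3, 'sold out'), (4, 'sold out')]
```

Picking a random starting shard spreads concurrent decrements across keys, while falling back to the remaining shards ensures a request is rejected only when every shard is empty.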

Flash‑Sale Summary

By routing static page loads through Varnish, caching the start time in Ehcache, caching availability in Memcached, and using a Redis queue to throttle purchase attempts, the system cut database load down to roughly the same order of magnitude as the number of items actually available for purchase.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by ITFLY8 Architecture Home

ITFLY8 Architecture Home focuses on sharing and discussing architecture knowledge, as well as project management and product design. Topics include large‑scale distributed website architecture (high performance, high availability, caching, message queues, and more), design patterns, architecture patterns, big data, project management (SCRUM, PMP, PRINCE2), and product design.
