Designing a High-Concurrency Flash Sale System: Architecture, Rate Limiting, Caching, and Monitoring
This article outlines the end‑to‑end design of a high‑availability flash‑sale system, covering traffic spikes, overload protection, inventory consistency, multi‑level caching, token‑bucket rate limiting, distributed queue processing, service monitoring, and stress‑testing strategies to ensure reliable million‑level transaction handling.
The author reflects on the challenges of building a "秒杀" (flash‑sale) system that must handle millions of requests, emphasizing the need to avoid over‑focus on a single project and to recall accumulated knowledge for designing robust high‑traffic services.
Background Analysis
Common characteristics of million‑level traffic include sudden traffic spikes, 404 or request failures at peak, inventory overselling, security threats such as DDoS or crawlers, and cost constraints on machine resources.
Business Flow
A complete transaction flow is illustrated (image omitted) and the system is divided into modules: User Service, Product Service, Order Service, Transaction Service, and Operations Backend.
System Architecture Design
The core design follows a "funnel" model that filters most traffic at the outer layers before reaching the core services.
Access Layer
Static assets (CSS/JS/Images) are served from CDN.
DNS resolves to LVS nodes, then a forward proxy, followed by an Nginx reverse‑proxy cluster.
Simple interactive data is cached locally in memory (e.g., Guava Cache).
Example cache implementation (pseudo-code):
// pseudo-code: local in-memory cache of static resource URLs
var picResource = map[string]string{"pn1": "https://xxxx"}
// pseudo-code: background refresher
go func() {
    for {
        // Refresh the local in-memory data at a random interval of 10-25 minutes.
        // Randomizing the interval avoids every instance refreshing at the same
        // fixed time, which would cause a sudden traffic spike on the backend
        // (the idea of staggering peaks).
        n := 600 + rand.Intn(900) // 10-25 minutes, in seconds
        <-time.After(time.Second * time.Duration(n))
        // update picResource from the backend
    }
}()
Rate limiting is implemented with a token-bucket approach. The author provides a simplified token-bucket pseudo-code:
// pseudo-code
type token struct {
    TokenNum *atomic.Int32 // tokens currently in the bucket
    MaxSize  *atomic.Int32 // bucket capacity
    Recode   *time.Time    // time of the last consumption
}

// ConsumeToken consumes up to num tokens and returns how many could actually be granted.
func (t *token) ConsumeToken(num int32) int32 {
    var realNum int32 // tokens actually granted
    // Use the time elapsed since the last consumption to work out how many
    // tokens have been replenished, capped at the bucket capacity.
    refilled := int32(time.Since(*t.Recode).Seconds()) * ratePerSecond // ratePerSecond: refill rate, defined elsewhere
    cur := t.TokenNum.Load() + refilled
    if cur > t.MaxSize.Load() {
        cur = t.MaxSize.Load()
    }
    if num > cur {
        // The request exceeds what the bucket holds: grant only what is available.
        realNum = cur
        cur = 0
    } else {
        // The bucket can satisfy the request: grant it and subtract those tokens.
        realNum = num
        cur -= num
    }
    t.TokenNum.Store(cur)
    *t.Recode = time.Now()
    return realNum
}
For distributed rate limiting, the token count can be stored in Redis, using the DECR command and time-difference calculations to replenish tokens.
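The Redis-backed scheme can be sketched as follows. Here an in-memory counter guarded by a mutex stands in for the Redis key so the example is self-contained; in a real deployment the decrement would be an atomic DECR, typically combined with the refill logic in a Lua script so that every service instance shares one bucket. The `distLimiter` type and its field names are illustrative, not from the article.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// distLimiter mimics a Redis-held token bucket: tokens is the shared
// counter (a Redis key in production), and Allow performs the
// replenish-then-DECR sequence described above.
type distLimiter struct {
	mu       sync.Mutex
	tokens   int64
	capacity int64
	rate     int64 // tokens replenished per second
	last     time.Time
}

func newDistLimiter(capacity, rate int64) *distLimiter {
	return &distLimiter{tokens: capacity, capacity: capacity, rate: rate, last: time.Now()}
}

// Allow refills the bucket from the elapsed time, then takes one token.
func (l *distLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	refill := int64(now.Sub(l.last).Seconds()) * l.rate
	if refill > 0 {
		l.tokens += refill
		if l.tokens > l.capacity {
			l.tokens = l.capacity
		}
		l.last = now
	}
	if l.tokens <= 0 {
		return false // bucket empty: reject the request
	}
	l.tokens-- // DECR in the real, Redis-backed implementation
	return true
}

func main() {
	l := newDistLimiter(3, 1)
	for i := 0; i < 5; i++ {
		fmt.Println(l.Allow())
	}
}
```

Within a single second the first three calls succeed and the rest are rejected until the refill catches up.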
Business Layer
User & Product Services
User info is cached locally (first‑level cache) for non‑personalized data.
Personalized data (visit history, purchase history) is cached in Redis for up to a month.
During flash‑sale, user updates are written asynchronously; stale cache entries are deleted to avoid cache‑stampede, using empty‑value caching with TTL.
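The empty-value caching idea can be sketched like this; the map-based cache, `getUser`, and the TTL values are hypothetical stand-ins for the real Redis calls, and the single-threaded map keeps the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errNotFound = errors.New("user not found")

// cacheEntry marks negative lookups with empty=true so a repeated miss
// is answered from cache instead of hammering the database.
type cacheEntry struct {
	val      string
	empty    bool // cached "does not exist" marker
	expireAt time.Time
}

var userCache = map[string]cacheEntry{}

// db is a stand-in for the backing store.
var db = map[string]string{"u1": "Alice"}

func getUser(id string) (string, error) {
	if e, ok := userCache[id]; ok && time.Now().Before(e.expireAt) {
		if e.empty {
			return "", errNotFound // miss served from cache
		}
		return e.val, nil
	}
	v, ok := db[id]
	if !ok {
		// Cache the empty result with a short TTL (30s here, illustrative)
		// so a burst of requests for a bogus key cannot stampede the DB.
		userCache[id] = cacheEntry{empty: true, expireAt: time.Now().Add(30 * time.Second)}
		return "", errNotFound
	}
	userCache[id] = cacheEntry{val: v, expireAt: time.Now().Add(10 * time.Minute)}
	return v, nil
}

func main() {
	fmt.Println(getUser("u1"))
	fmt.Println(getUser("nope")) // first miss hits the DB and caches the miss
	_, cached := userCache["nope"]
	fmt.Println("negative entry cached:", cached)
}
```

The short TTL on empty entries matters: it bounds how long a key that later becomes valid is answered with a stale "not found".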
Product listings rely on Elasticsearch; index updates are scheduled during low‑traffic periods.
Order, Payment & Data Access
Orders are written to a queue instead of calling the order service directly, decoupling and smoothing peak load.
Failed queue writes trigger exponential back‑off.
Order service consumes the queue, checks inventory (cached in Redis, DECR to decrement), and initiates payment; if payment fails or times out, inventory is restored.
Redis master–replica lag can serve stale inventory reads; this is mitigated by confirming that a write has propagated to the replica before treating it as complete (Redis's WAIT command, for example, blocks until a given number of replicas acknowledge the write).
Service Governance
Monitoring
Monitoring can be built with Prometheus and Grafana. Example installation commands are provided:
# Install Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.14.0/prometheus-2.14.0.linux-386.tar.gz
tar -xavf prometheus-2.14.0.linux-386.tar.gz
# Download and install Grafana
wget https://dl.grafana.com/oss/release/grafana-6.5.2-1.x86_64.rpm
sudo yum localinstall grafana-6.5.2-1.x86_64.rpm
# Start Grafana
systemctl daemon-reload
systemctl start grafana-server
systemctl status grafana-server
# Configuration file: /etc/sysconfig/grafana-server
GRAFANA_USER=grafana
GRAFANA_GROUP=grafana
GRAFANA_HOME=/usr/share/grafana
LOG_DIR=/var/log/grafana
DATA_DIR=/var/lib/grafana
MAX_OPEN_FILES=10000
CONF_DIR=/etc/grafana
CONF_FILE=/etc/grafana/grafana.ini
RESTART_ON_UPGRADE=true
PLUGINS_DIR=/var/lib/grafana/plugins
PROVISIONING_CFG_DIR=/etc/grafana/provisioning
PID_FILE_DIR=/var/run/grafana
// Creating a metric
// Prometheus time-series types:
// Counter: a cumulative count, e.g. request or error totals
// Gauge: an instantaneous value, e.g. memory or disk usage
// Histogram: a sampled distribution, commonly request duration
// Summary: a statistical summary (quantiles)

// Create a new CounterVec
rpcCounter = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "rpc_counter",
        Help: "RPC counts",
    },
    []string{"api"},
)
prometheus.MustRegister(rpcCounter)
// Add the given value to the counter
rpcCounter.WithLabelValues("api_bookcontent").Add(float64(rand.Int31n(50)))
rpcCounter.WithLabelValues("api_chapterlist").Add(float64(rand.Int31n(10)))
# prometheus.yml scrape configuration
- job_name: 'monitor_test'
  static_configs:
    - targets: ['localhost:xxx']
      labels:
        group: 'group_test'
# Restart Prometheus with the new configuration
./prometheus --config.file=prometheus.yml

Stress Testing
Peak traffic estimation: assume 50% of daily traffic concentrates in a 2-hour window and apply a 5× safety factor, giving QPS = (Total × 50%) / (60 × 60 × 2) × 5, where 60 × 60 × 2 is the window length in seconds. TPS for actual purchase actions is roughly 20% of QPS.
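As a worked example of the formula, assuming a hypothetical 10 million requests per day and reading the 2-hour window in seconds (the function name is illustrative):

```go
package main

import "fmt"

// peakQPS applies the estimate above: 50% of daily traffic in a
// 2-hour window, multiplied by a 5x safety factor.
func peakQPS(dailyTotal float64) float64 {
	concentrated := dailyTotal * 0.5      // 50% of traffic lands in the window
	windowSeconds := float64(60 * 60 * 2) // 2-hour window, in seconds
	return concentrated / windowSeconds * 5
}

func main() {
	qps := peakQPS(10_000_000)
	// TPS for purchases is taken as roughly 20% of QPS.
	fmt.Printf("peak QPS ~ %.0f, purchase TPS ~ %.0f\n", qps, qps*0.2)
}
```

So a service seeing 10 million daily requests would be provisioned for roughly 3,500 QPS at the gateway and about 700 purchase TPS downstream.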
The testing strategy uses single-link RPC tests to isolate bottlenecks, with tools such as ab (ApacheBench) and wrk. The author suggests first raising the thread count until QPS plateaus and response time starts to climb, then fixing the thread count and increasing connections to find the true QPS peak.
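The "stop when QPS plateaus" step can be expressed as a simple rule over the measured series; `findPlateau` and its threshold are illustrative, not part of any load-testing tool.

```go
package main

import "fmt"

// findPlateau returns the index of the last step that still improved
// throughput meaningfully: the step before the first one whose QPS gain
// over its predecessor falls below minGain.
func findPlateau(qpsBySteps []float64, minGain float64) int {
	for i := 1; i < len(qpsBySteps); i++ {
		if qpsBySteps[i]-qpsBySteps[i-1] < minGain {
			return i - 1 // previous step was the sweet spot
		}
	}
	return len(qpsBySteps) - 1 // still climbing: keep the last step
}

func main() {
	// hypothetical QPS measured at 1, 2, 4, 8, 16 load-generator threads
	qps := []float64{1000, 1900, 3500, 3600, 3550}
	fmt.Println("plateau at step", findPlateau(qps, 200))
}
```

Once the thread count is fixed at that step, the same rule can be reapplied while ramping connections to locate the connection-level peak.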
Conclusion
The design emphasizes intercepting as much traffic as possible at upstream layers, employing multi‑level caching, queue‑based load smoothing, asynchronous decoupling, and simple overload protection via rate limiting and circuit breaking.
Originally published on Juejin (稀土掘金), a tech community that helps developers grow.