Designing a High-Concurrency Flash Sale System: Architecture, Rate Limiting, Caching, and Monitoring
This article outlines the end‑to‑end design of a high‑availability flash‑sale system, covering traffic spikes, overload protection, inventory consistency, multi‑level caching, token‑bucket rate limiting, distributed queue processing, service monitoring, and stress‑testing strategies to ensure reliable million‑level transaction handling.
The author reflects on the challenges of building a "秒杀" (flash‑sale) system that must handle millions of requests, emphasizing the need to avoid over‑focus on a single project and to recall accumulated knowledge for designing robust high‑traffic services.
Background Analysis
Common characteristics of million‑level traffic include sudden traffic spikes, 404 or request failures at peak, inventory overselling, security threats such as DDoS or crawlers, and cost constraints on machine resources.
Business Flow
A complete transaction flow is illustrated (image omitted) and the system is divided into modules: User Service, Product Service, Order Service, Transaction Service, and Operations Backend.
System Architecture Design
The core design follows a "funnel" model that filters most traffic at the outer layers before reaching the core services.
Access Layer
Static assets (CSS/JS/Images) are served from CDN.
DNS resolves to LVS nodes, then a forward proxy, followed by an Nginx reverse‑proxy cluster.
Simple interactive data is cached locally in memory (e.g., Guava Cache).
Example cache implementation (pseudo-code):
// pseudo-code: local in-memory cache of static resource URLs
var picResource = map[string]string{"pn1": "https://xxxx"}
// pseudo-code: background refresher
go func() {
    for {
        // Refresh the local in-memory data at a random interval of 10-25 minutes.
        // Randomizing the interval avoids every instance refreshing at the same
        // fixed time, which would cause a sudden traffic spike on the backend
        // (the idea of staggering peaks).
        n := 600 + rand.Intn(900) // 10-25 minutes, in seconds
        <-time.After(time.Second * time.Duration(n))
        // update picResource from the backend
    }
}()
Rate limiting is implemented with a token-bucket approach. The author provides a simplified token-bucket pseudo-code:
// pseudo-code
type token struct {
    TokenNum *atomic.Int32 // tokens currently in the bucket
    MaxSize  *atomic.Int32 // bucket capacity
    Recode   *time.Time    // time of the last consumption
}

// ConsumeToken consumes up to num tokens and returns how many could actually be granted.
func (t *token) ConsumeToken(num int32) int32 {
    var realNum int32 // tokens actually granted
    // Use the time elapsed since the last consumption to work out how many
    // tokens have been replenished, capped at the bucket capacity.
    refilled := int32(time.Since(*t.Recode).Seconds()) * ratePerSecond // ratePerSecond: refill rate, defined elsewhere
    cur := t.TokenNum.Load() + refilled
    if cur > t.MaxSize.Load() {
        cur = t.MaxSize.Load()
    }
    if num > cur {
        // The request exceeds what the bucket holds: grant only what is available.
        realNum = cur
        cur = 0
    } else {
        // The bucket can satisfy the request: grant it and subtract those tokens.
        realNum = num
        cur -= num
    }
    t.TokenNum.Store(cur)
    *t.Recode = time.Now()
    return realNum
}
For distributed rate limiting, the token count can be stored in Redis, using the DECR command and time-difference calculations to replenish tokens.
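The Redis-backed scheme can be sketched as follows. Here an in-memory counter guarded by a mutex stands in for the Redis key so the example is self-contained; in a real deployment the decrement would be an atomic DECR, typically combined with the refill logic in a Lua script so that every service instance shares one bucket. The `distLimiter` type and its field names are illustrative, not from the article.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// distLimiter mimics a Redis-held token bucket: tokens is the shared
// counter (a Redis key in production), and Allow performs the
// replenish-then-DECR sequence described above.
type distLimiter struct {
	mu       sync.Mutex
	tokens   int64
	capacity int64
	rate     int64 // tokens replenished per second
	last     time.Time
}

func newDistLimiter(capacity, rate int64) *distLimiter {
	return &distLimiter{tokens: capacity, capacity: capacity, rate: rate, last: time.Now()}
}

// Allow refills the bucket from the elapsed time, then takes one token.
func (l *distLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	refill := int64(now.Sub(l.last).Seconds()) * l.rate
	if refill > 0 {
		l.tokens += refill
		if l.tokens > l.capacity {
			l.tokens = l.capacity
		}
		l.last = now
	}
	if l.tokens <= 0 {
		return false // bucket empty: reject the request
	}
	l.tokens-- // DECR in the real, Redis-backed implementation
	return true
}

func main() {
	l := newDistLimiter(3, 1)
	for i := 0; i < 5; i++ {
		fmt.Println(l.Allow())
	}
}
```

Within a single second the first three calls succeed and the rest are rejected until the refill catches up.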
Business Layer
User & Product Services
User info is cached locally (first‑level cache) for non‑personalized data.
Personalized data (visit history, purchase history) is cached in Redis for up to a month.
During flash‑sale, user updates are written asynchronously; stale cache entries are deleted to avoid cache‑stampede, using empty‑value caching with TTL.
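The empty-value caching idea can be sketched like this; the map-based cache, `getUser`, and the TTL values are hypothetical stand-ins for the real Redis calls, and the single-threaded map keeps the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errNotFound = errors.New("user not found")

// cacheEntry marks negative lookups with empty=true so a repeated miss
// is answered from cache instead of hammering the database.
type cacheEntry struct {
	val      string
	empty    bool // cached "does not exist" marker
	expireAt time.Time
}

var userCache = map[string]cacheEntry{}

// db is a stand-in for the backing store.
var db = map[string]string{"u1": "Alice"}

func getUser(id string) (string, error) {
	if e, ok := userCache[id]; ok && time.Now().Before(e.expireAt) {
		if e.empty {
			return "", errNotFound // miss served from cache
		}
		return e.val, nil
	}
	v, ok := db[id]
	if !ok {
		// Cache the empty result with a short TTL (30s here, illustrative)
		// so a burst of requests for a bogus key cannot stampede the DB.
		userCache[id] = cacheEntry{empty: true, expireAt: time.Now().Add(30 * time.Second)}
		return "", errNotFound
	}
	userCache[id] = cacheEntry{val: v, expireAt: time.Now().Add(10 * time.Minute)}
	return v, nil
}

func main() {
	fmt.Println(getUser("u1"))
	fmt.Println(getUser("nope")) // first miss hits the DB and caches the miss
	_, cached := userCache["nope"]
	fmt.Println("negative entry cached:", cached)
}
```

The short TTL on empty entries matters: it bounds how long a key that later becomes valid is answered with a stale "not found".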
Product listings rely on Elasticsearch; index updates are scheduled during low‑traffic periods.
Order, Payment & Data Access
Orders are written to a queue instead of calling the order service directly, decoupling and smoothing peak load.
Failed queue writes trigger exponential back‑off.
Order service consumes the queue, checks inventory (cached in Redis, DECR to decrement), and initiates payment; if payment fails or times out, inventory is restored.
Redis master–replica lag can serve stale inventory reads; this is mitigated by confirming that a write has propagated to the replica before treating it as complete (Redis's WAIT command, for example, blocks until a given number of replicas acknowledge the write).
Service Governance
Monitoring
Monitoring can be built with Prometheus and Grafana. Example installation commands are provided:
# Install Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.14.0/prometheus-2.14.0.linux-386.tar.gz
tar -xavf prometheus-2.14.0.linux-386.tar.gz
# Download and install Grafana
wget https://dl.grafana.com/oss/release/grafana-6.5.2-1.x86_64.rpm
sudo yum localinstall grafana-6.5.2-1.x86_64.rpm
# Start Grafana
systemctl daemon-reload
systemctl start grafana-server
systemctl status grafana-server
# Configuration file: /etc/sysconfig/grafana-server
GRAFANA_USER=grafana
GRAFANA_GROUP=grafana
GRAFANA_HOME=/usr/share/grafana
LOG_DIR=/var/log/grafana
DATA_DIR=/var/lib/grafana
MAX_OPEN_FILES=10000
CONF_DIR=/etc/grafana
CONF_FILE=/etc/grafana/grafana.ini
RESTART_ON_UPGRADE=true
PLUGINS_DIR=/var/lib/grafana/plugins
PROVISIONING_CFG_DIR=/etc/grafana/provisioning
PID_FILE_DIR=/var/run/grafana
// Creating a metric
// Prometheus time-series types:
// Counter: a cumulative count, e.g. request or error totals
// Gauge: an instantaneous value, e.g. memory or disk usage
// Histogram: a sampled distribution, commonly request duration
// Summary: a statistical summary (quantiles)

// Create a new CounterVec
rpcCounter = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "rpc_counter",
        Help: "RPC counts",
    },
    []string{"api"},
)
prometheus.MustRegister(rpcCounter)
// Add the given value to the counter
rpcCounter.WithLabelValues("api_bookcontent").Add(float64(rand.Int31n(50)))
rpcCounter.WithLabelValues("api_chapterlist").Add(float64(rand.Int31n(10)))
# prometheus.yml scrape configuration
- job_name: 'monitor_test'
  static_configs:
    - targets: ['localhost:xxx']
      labels:
        group: 'group_test'
# Restart Prometheus with the new configuration
./prometheus --config.file=prometheus.yml

Stress Testing
Peak traffic estimation: assume 50% of daily traffic concentrates in a 2-hour window and apply a 5× safety factor, giving QPS = (Total × 50%) / (60 × 60 × 2) × 5, where 60 × 60 × 2 is the window length in seconds. TPS for actual purchase actions is roughly 20% of QPS.
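As a worked example of the formula, assuming a hypothetical 10 million requests per day and reading the 2-hour window in seconds (the function name is illustrative):

```go
package main

import "fmt"

// peakQPS applies the estimate above: 50% of daily traffic in a
// 2-hour window, multiplied by a 5x safety factor.
func peakQPS(dailyTotal float64) float64 {
	concentrated := dailyTotal * 0.5      // 50% of traffic lands in the window
	windowSeconds := float64(60 * 60 * 2) // 2-hour window, in seconds
	return concentrated / windowSeconds * 5
}

func main() {
	qps := peakQPS(10_000_000)
	// TPS for purchases is taken as roughly 20% of QPS.
	fmt.Printf("peak QPS ~ %.0f, purchase TPS ~ %.0f\n", qps, qps*0.2)
}
```

So a service seeing 10 million daily requests would be provisioned for roughly 3,500 QPS at the gateway and about 700 purchase TPS downstream.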
The testing strategy uses single-link RPC tests to isolate bottlenecks, with tools such as ab (ApacheBench) and wrk. The author suggests first raising the thread count until QPS plateaus and response time starts to climb, then fixing the thread count and increasing connections to find the true QPS peak.
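The "stop when QPS plateaus" step can be expressed as a simple rule over the measured series; `findPlateau` and its threshold are illustrative, not part of any load-testing tool.

```go
package main

import "fmt"

// findPlateau returns the index of the last step that still improved
// throughput meaningfully: the step before the first one whose QPS gain
// over its predecessor falls below minGain.
func findPlateau(qpsBySteps []float64, minGain float64) int {
	for i := 1; i < len(qpsBySteps); i++ {
		if qpsBySteps[i]-qpsBySteps[i-1] < minGain {
			return i - 1 // previous step was the sweet spot
		}
	}
	return len(qpsBySteps) - 1 // still climbing: keep the last step
}

func main() {
	// hypothetical QPS measured at 1, 2, 4, 8, 16 load-generator threads
	qps := []float64{1000, 1900, 3500, 3600, 3550}
	fmt.Println("plateau at step", findPlateau(qps, 200))
}
```

Once the thread count is fixed at that step, the same rule can be reapplied while ramping connections to locate the connection-level peak.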
Conclusion
The design emphasizes intercepting as much traffic as possible at upstream layers, employing multi‑level caching, queue‑based load smoothing, asynchronous decoupling, and simple overload protection via rate limiting and circuit breaking.
Originally published on Juejin (稀土掘金), a tech community that helps developers grow.