Designing a Million‑QPS Multi‑Level Cache Architecture
This article outlines a multi‑level cache system for handling over a million QPS, detailing the architecture from client to database, key components like Caffeine and Redis Cluster, and providing concrete code examples for the read path (L1 → L2 → DB) and the cache‑aside write path.
In the era of internet and big data, a cache system capable of supporting hundred‑million‑scale data access is crucial for system performance. The goal is a combined cache hit rate above 99%, so that less than 1% of read traffic reaches the database.
Cache Architecture Overview
The architecture follows a "multi‑level interception + sharding service + eventual consistency" model, consisting of:
Client
Gateway layer (rate limiting)
Application layer
Local cache (e.g., Caffeine) as L1
Distributed cache (e.g., Redis Cluster) as L2
Database (MySQL)
Request Entry Layer
Load balancers (Nginx/LVS) distribute traffic and can perform simple rate limiting, gray releases, and routing. CDN handles static resources, images, and short videos at the edge.
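Gateway‑layer rate limiting is usually configured directly in Nginx (e.g., `limit_req`), but the underlying idea can be sketched in Java as a token bucket. The class and method names below are illustrative, not part of any gateway API.

```java
// Minimal token-bucket sketch: at most `capacity` tokens, refilled at
// `ratePerSec`. Illustrative only; real gateways use Nginx/LVS built-ins.
class TokenBucket {
    private final long capacity;
    private final double ratePerSec;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double ratePerSec) {
        this.capacity = capacity;
        this.ratePerSec = ratePerSec;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        // Refill proportionally to elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefillNanos) / 1e9 * ratePerSec);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A request that fails `tryAcquire()` would be rejected with HTTP 429 or queued, depending on gateway policy.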
Business Application Layer
Stateless services cache response results per API endpoint or business dimension (e.g., with a 1‑minute TTL).
Local in‑memory cache (Caffeine, Ehcache) serves as L1 for ultra‑hot data with sub‑millisecond latency.
Cache Layer
Distributed cache clusters (Redis, Memcached, Tair, custom caches) act as L2, sharing hot data across nodes.
They support multi‑AZ deployment, sharding, and read‑write separation; sharding on the client or server side distributes load across nodes.
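Client‑side sharding is often implemented with a consistent‑hash ring so that adding or removing a node only remaps a small fraction of keys. A dependency‑free sketch (node names and the virtual‑node count are illustrative choices):

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Consistent-hash ring with virtual nodes for client-side sharding.
// CRC32 keeps the sketch dependency-free; production clients often use
// murmur or ketama hashing.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRing(String[] nodes, int virtualNodes) {
        for (String node : nodes)
            for (int i = 0; i < virtualNodes; i++)
                ring.put(hash(node + "#" + i), node);   // spread each node around the ring
    }

    String nodeFor(String key) {
        // First virtual node clockwise from the key's hash; wrap to the start.
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

Virtual nodes smooth out the key distribution so no single physical node owns a disproportionate arc of the ring.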
Database and Protection Layer
The cache‑aside pattern, optionally combined with read‑write separation, ensures the database is hit only on a cache miss; writes go to the database first and then invalidate (or asynchronously refresh) cache entries to maintain consistency.
Read Flow (L1 → L2 → DB)
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

Cache<String, Object> localCache = Caffeine.newBuilder()
    .maximumSize(10_000_000)                  // ten million entries
    .expireAfterWrite(5, TimeUnit.MINUTES)    // short TTL for L1
    .build();

Object get(String key) {
    Object val = localCache.getIfPresent(key);       // L1: local cache
    if (val != null) return val;

    val = redis.get(key);                            // L2: distributed cache
    if (val != null) {
        localCache.put(key, val);                    // backfill L1
        return val;
    }

    val = db.query(key);                             // DB: last resort
    if (val != null) {
        redis.set(key, val, randomExpire(300, 600)); // random TTL avoids avalanche
        localCache.put(key, val);
    }
    return val;
}

// Random TTL in seconds, so keys written together do not all expire together.
int randomExpire(int minSec, int maxSec) {
    return ThreadLocalRandom.current().nextInt(minSec, maxSec + 1);
}

Write Flow
Writes go to the database first, then delete or update multi‑level caches (Cache‑Aside) to avoid inconsistency.
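A minimal sketch of that write path, with `ConcurrentHashMap` stand‑ins for MySQL, Redis, and the Caffeine L1 cache (in production these would be real JDBC/Redis/Caffeine clients, and the L1 invalidation would be broadcast to all application instances):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside write sketch. The three maps stand in for the database,
// the Redis L2 cache, and the local L1 cache.
class WritePath {
    final Map<String, Object> db = new ConcurrentHashMap<>();
    final Map<String, Object> redis = new ConcurrentHashMap<>();
    final Map<String, Object> localCache = new ConcurrentHashMap<>();

    void write(String key, Object value) {
        db.put(key, value);       // 1. write the database first
        redis.remove(key);        // 2. invalidate L2 so the next read reloads
        localCache.remove(key);   // 3. invalidate L1 (broadcast to peers in practice)
    }
}
```

Deleting rather than updating the cache on write sidesteps races where two concurrent writers leave the cache holding the older value.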
Eventual Consistency
Achieved through asynchronous synchronization, random TTLs, and compensation mechanisms.
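One common compensation mechanism is the "delayed double delete": delete the cache, write the database, then delete the cache again after a short delay to evict any stale value that a concurrent read back‑filled in between. The 500 ms‑class delay is a tuning choice, sketched here with in‑memory stand‑ins:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Delayed double-delete sketch. The maps stand in for MySQL and Redis;
// the delay should exceed the read path's DB-query + cache-set latency.
class DelayedDoubleDelete {
    final Map<String, Object> db = new ConcurrentHashMap<>();
    final Map<String, Object> cache = new ConcurrentHashMap<>();
    final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    void write(String key, Object value, long delayMs) {
        cache.remove(key);                            // first delete
        db.put(key, value);                           // database write
        scheduler.schedule(() -> cache.remove(key),   // second, delayed delete
                delayMs, TimeUnit.MILLISECONDS);
    }
}
```

If the second delete fails, a compensation job (e.g., replaying binlog events or a retry queue) eventually converges the cache with the database.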
Redis Cluster Deployment
Typical setup: at least 3 masters with 3 replicas, 16,384 hash slots with automatic sharding, AOF + RDB hybrid persistence, and an optional proxy/cache‑service layer for service‑oriented access (e.g., Weibo's core caches).
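Redis Cluster assigns every key to one of those 16,384 slots via CRC16(key) mod 16384, hashing only the `{hash tag}` portion when one is present so related keys can be co‑located. A self‑contained sketch of that slot function:

```java
import java.nio.charset.StandardCharsets;

// Redis Cluster slot computation: CRC16 (XMODEM variant) mod 16384.
// Keys sharing a non-empty {hash tag} map to the same slot, which is
// what makes multi-key operations on them possible.
class ClusterSlot {
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1)                    // non-empty tag: hash only the tag
                key = key.substring(open + 1, close);
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    static int crc16(byte[] data) {                  // CRC16-CCITT (XMODEM), poly 0x1021
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++)
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) & 0xFFFF
                                            : (crc << 1) & 0xFFFF;
        }
        return crc;
    }
}
```

With 3 masters, each owns roughly a third of the 16,384 slots, and resharding moves slots (not individual keys) between nodes.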
Mike Chen's Internet Architecture