Designing a Million‑QPS Multi‑Level Cache Architecture

This article outlines a multi‑level cache system for handling over a million QPS, detailing the architecture from client to database, key components like Caffeine and Redis Cluster, and providing concrete code examples for read‑through and write‑through flows.

Mike Chen's Internet Architecture

In the era of the internet and big data, a cache system capable of supporting hundred-million-scale data access is crucial to system performance. The goal is a 99.99% cache hit rate, so that well under 1% of traffic ever reaches the database.

Cache Architecture Overview

The architecture follows a "multi‑level interception + sharding service + eventual consistency" model, consisting of:

Client

Gateway layer (rate limiting)

Application layer

Local cache (e.g., Caffeine) as L1

Distributed cache (e.g., Redis Cluster) as L2

Database (MySQL)

Request Entry Layer

Load balancers (Nginx/LVS) distribute traffic and can perform simple rate limiting, gray releases, and routing. CDN handles static resources, images, and short videos at the edge.
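The gateway-layer rate limiting mentioned above is usually configured in Nginx (e.g., limit_req) rather than hand-written; the following stand-alone token bucket is only a sketch of the underlying mechanism, with hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Minimal token-bucket rate limiter: tokens refill at a fixed rate up to a
// burst capacity; a request passes only if it can take a token.
class TokenBucket {
    private final long capacity;        // maximum burst size
    private final double refillPerNano; // tokens added per nanosecond
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / TimeUnit.SECONDS.toNanos(1);
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // over the limit: reject or queue the request
    }
}
```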

Business Application Layer

Stateless services cache response results by interface or business dimension (e.g., cache for 1 minute).
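The interface-level response caching described above can be sketched with a plain in-memory map and a 1-minute TTL; the class and method names here are hypothetical, and a real service would use Caffeine's built-in expiry instead.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of per-interface response caching with a 1-minute TTL.
// Time is passed in explicitly so the expiry logic is easy to test.
class ResponseCache {
    private static final long TTL_MILLIS = 60_000; // cache each response for 1 minute

    private record Entry(Object value, long storedAt) {}

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();

    Object get(String key, long nowMillis) {
        Entry e = cache.get(key);
        if (e == null || nowMillis - e.storedAt() > TTL_MILLIS) {
            return null; // miss or expired
        }
        return e.value();
    }

    void put(String key, Object value, long nowMillis) {
        cache.put(key, new Entry(value, nowMillis));
    }
}
```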

Local in‑memory cache (Caffeine, Ehcache) serves as L1 for ultra‑hot data with sub‑millisecond latency.

Cache Layer

Distributed cache clusters (Redis, Memcached, Tair, custom caches) act as L2, sharing hot data across nodes.

The cache layer supports multi‑AZ deployment, sharding, and read‑write separation; sharding can be performed client‑side or server‑side to distribute load across nodes.

Database and Protection Layer

With the cache‑aside pattern (optionally combined with read‑write separation), the database is hit only on a cache miss; after a write, the caches are invalidated or updated asynchronously so the layers converge toward consistency.

Read Flow (L1 → L2 → DB)

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.TimeUnit;

Cache<String, Object> localCache = Caffeine.newBuilder()
    .maximumSize(10_000_000)               // cap at ten million entries
    .expireAfterWrite(5, TimeUnit.MINUTES) // short TTL keeps L1 fresh
    .build();

Object get(String key) {
    Object val = localCache.getIfPresent(key); // L1: in-process cache
    if (val != null) return val;
    val = redis.get(key);                      // L2: distributed cache
    if (val != null) {
        localCache.put(key, val);              // backfill L1
        return val;
    }
    val = db.query(key);                       // last resort: database
    if (val != null) {
        redis.set(key, val, randomExpire(300, 600)); // randomized TTL avoids avalanche
        localCache.put(key, val);
    }
    return val;
}

Write Flow

Writes go to the database first, then delete or update multi‑level caches (Cache‑Aside) to avoid inconsistency.
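The write path above can be sketched as follows. The db, redis, and localCache fields are in-memory stand-ins for illustration; in production they would be the actual MySQL client, Redis client, and Caffeine cache.

```java
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside write path: persist to the source of truth first, then
// invalidate both cache levels so the next read repopulates fresh data.
class WritePath {
    final ConcurrentHashMap<String, String> db = new ConcurrentHashMap<>();
    final ConcurrentHashMap<String, String> redis = new ConcurrentHashMap<>();
    final ConcurrentHashMap<String, String> localCache = new ConcurrentHashMap<>();

    void write(String key, String value) {
        db.put(key, value);     // 1. write the database first
        redis.remove(key);      // 2. delete from L2 (next read backfills it)
        localCache.remove(key); // 3. delete from L1
    }
}
```

Deleting rather than updating the cache avoids the race where two concurrent writes leave the cache holding the older value.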

Eventual Consistency

Achieved through asynchronous synchronization (e.g., binlog-driven invalidation), randomized TTLs so co-loaded keys do not expire simultaneously, and compensation mechanisms such as retrying failed cache deletes.
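The randomExpire(min, max) helper referenced in the read flow can be as simple as a uniform draw over the TTL range, so keys loaded in the same burst expire at scattered times instead of all at once:

```java
import java.util.concurrent.ThreadLocalRandom;

// Returns a TTL uniformly distributed in [minSeconds, maxSeconds],
// spreading out expirations to prevent a cache avalanche.
class Ttl {
    static int randomExpire(int minSeconds, int maxSeconds) {
        return ThreadLocalRandom.current().nextInt(minSeconds, maxSeconds + 1);
    }
}
```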

Redis Cluster Deployment

Typical setup: at least 3 masters with 3 replicas, 16384 hash slots with automatic sharding, AOF + RDB hybrid persistence, and an optional proxy or cache service layer for service‑oriented access (e.g., Weibo's core caching).

Tags: distributed systems, high concurrency, Caffeine, cache architecture, Redis Cluster