Cache Design Guidelines: Achieve Microsecond Queries and Survive Traffic Spikes

This article outlines practical cache design principles, covering suitable scenarios, health metrics, common pitfalls like avalanche, breakdown and penetration, and concrete implementation rules for both local (Caffeine) and Redis caches to ensure microsecond‑level response and stable high‑traffic performance.

In high‑concurrency systems, a cache acts as a performance accelerator, compressing database queries from milliseconds to microseconds and absorbing traffic spikes. Misused, however, it can cause cache avalanche, breakdown, penetration, or even out‑of‑memory (OOM) failures.

When to Use Cache

Static data: immutable dictionaries and configuration items – near‑100% hit rate; serve from a local + distributed two‑level cache (see the sketch after this list).

Quasi‑static data: rarely changed (e.g., organization hierarchy, daily‑updated role permissions); set a long TTL and actively refresh on change.

Intermediate‑state data: reusable computation results or temporary copies (e.g., intermediate report statistics, a local configuration replica); monitor TTL to avoid serving stale results.

Hot data: short‑term, high‑frequency access (e.g., flash‑sale product info); combine a seconds‑level‑TTL local cache with a Redis fallback.

Read‑heavy data: reads far outnumber writes (e.g., product detail pages, read:write ≈ 1000:1); avoid caching when writes are frequent.

Forbidden scenarios: high‑frequency updates (real‑time transaction amounts) and data requiring strong consistency (payment status) should go straight to the database.
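
To make the two‑level pattern mentioned under static data concrete, here is a minimal read‑path sketch in Java, assuming Caffeine for the local tier and the Jedis client for Redis; the class name, key handling, and loadFromDb loader are illustrative assumptions, not a prescribed API.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import redis.clients.jedis.Jedis;

import java.time.Duration;

public class TwoLevelCache {
    // L1: small, short-lived local cache absorbs the hottest reads
    private final Cache<String, String> local = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(5))
            .build();

    private final Jedis redis = new Jedis("localhost", 6379);

    public String get(String key) {
        // 1) local cache
        String value = local.getIfPresent(key);
        if (value != null) return value;

        // 2) Redis
        value = redis.get(key);
        if (value == null) {
            // 3) database (hypothetical loader, assumed non-null here)
            value = loadFromDb(key);
            redis.setex(key, 30 * 60, value); // 30 min TTL in Redis
        }
        local.put(key, value);
        return value;
    }

    private String loadFromDb(String key) { /* hypothetical DB lookup */ return "..."; }
}
```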

Cache Health Metrics

Hit rate: hitRate = hits / (hits + misses). Static data should approach 100%; hot data should stay ≥ 95% (see the monitoring sketch after this list).

Read/Write ratio: rwRatio = readRequests / writeRequests. Caching pays off when the ratio is ≥ 10:1.

Expiration time: set a TTL on most keys to prevent unbounded growth; refresh permanent data via scheduled jobs.

Space usage: estimate per‑key size and total memory; watch Redis collection objects (hash, list) and large values in the local cache.

Read/Write latency: cache operations should complete within a few milliseconds; > 10 ms suggests a big key blocking Redis, network delay, or lock contention.
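
The first two metrics are easy to compute in code. Below is a minimal monitoring sketch: Caffeine's recordStats()/CacheStats API is real, while the read/write counters and report format are illustrative assumptions.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.stats.CacheStats;

import java.util.concurrent.atomic.LongAdder;

public class CacheMetrics {
    // recordStats() makes Caffeine count hits and misses for us
    private final Cache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .recordStats()
            .build();

    private final LongAdder reads = new LongAdder();
    private final LongAdder writes = new LongAdder();

    public String read(String key) {
        reads.increment();
        return cache.getIfPresent(key); // counted as a hit or a miss
    }

    public void write(String key, String value) {
        writes.increment();
        cache.put(key, value);
    }

    public void report() {
        CacheStats stats = cache.stats();
        double hitRate = stats.hitRate(); // hits / (hits + misses)
        double rwRatio = writes.sum() == 0
                ? Double.POSITIVE_INFINITY
                : (double) reads.sum() / writes.sum();
        // Alert when hot-data hitRate < 0.95 or rwRatio < 10
        System.out.printf("hitRate=%.3f rwRatio=%.1f%n", hitRate, rwRatio);
    }
}
```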

Core Problems and Solutions

1. Cache Avalanche

Scenario: many keys expire simultaneously during a promotion, flooding the database.

Scatter expiration times with a random offset, e.g., expireTime = 30*60 + Random.nextInt(300) seconds.

Use a distributed lock (e.g., Redis SETNX) so that only one request rebuilds a missing key from the DB; both measures are sketched after this list.

Keep hot items from expiring at all; refresh them periodically instead.
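
A minimal Jedis‑based sketch of the first two measures (TTL jitter and a SETNX‑style rebuild lock); the lock key format, timeouts, and loadFromDb loader are assumptions.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

import java.util.concurrent.ThreadLocalRandom;

public class AvalancheGuard {
    private final Jedis redis = new Jedis("localhost", 6379);

    // Scatter TTLs: 30 min base plus up to 5 min of random jitter
    public void putWithJitter(String key, String value) {
        long ttl = 30 * 60 + ThreadLocalRandom.current().nextInt(300);
        redis.setex(key, ttl, value);
    }

    // On a miss, only the request holding the lock rebuilds the key
    public String getOrRebuild(String key) {
        for (;;) {
            String value = redis.get(key);
            if (value != null) return value;

            // SET lock NX EX 10 == SETNX with a 10 s safety timeout
            String ok = redis.set("lock:" + key, "1",
                    SetParams.setParams().nx().ex(10));
            if ("OK".equals(ok)) {
                try {
                    value = loadFromDb(key); // hypothetical loader
                    putWithJitter(key, value);
                    return value;
                } finally {
                    redis.del("lock:" + key);
                }
            }
            // Others back off briefly, then retry the cache
            try { Thread.sleep(50); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    private String loadFromDb(String key) { return "..."; }
}
```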

2. Cache Breakdown

Scenario: a hot key expires, causing a surge of DB queries.

Set hot items' expiration to land during off‑peak hours, or keep them permanent and update them actively.

Apply a mutex lock plus a short‑lived fallback copy (e.g., 10 s TTL) to absorb the burst, as sketched below.
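
One possible way to combine the mutex with the short‑lived fallback, sketched with Jedis; maintaining a "stale:" shadow key that outlives the main key by 10 s is an illustrative design choice, and loadFromDb is hypothetical.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class BreakdownGuard {
    private static final int TTL = 30 * 60;        // main key TTL, seconds
    private static final int FALLBACK_WINDOW = 10; // stale copy outlives it by 10 s

    private final Jedis redis = new Jedis("localhost", 6379);

    public String getHot(String key) {
        String value = redis.get(key);
        if (value != null) return value;

        // One request rebuilds; a 5 s lock timeout guards against crashes
        String ok = redis.set("lock:" + key, "1", SetParams.setParams().nx().ex(5));
        if ("OK".equals(ok)) {
            try {
                value = loadFromDb(key); // hypothetical loader
                redis.setex(key, TTL, value);
                // Fallback copy outlives the main key, absorbing the expiry burst
                redis.setex("stale:" + key, TTL + FALLBACK_WINDOW, value);
            } finally {
                redis.del("lock:" + key);
            }
            return value;
        }
        // Lock losers serve the short-lived stale copy instead of hitting the DB
        String stale = redis.get("stale:" + key);
        return stale != null ? stale : loadFromDb(key);
    }

    private String loadFromDb(String key) { return "..."; }
}
```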

3. Cache Penetration

Scenario: attackers request non‑existent IDs, bypassing cache and hitting DB.

Validate request parameters and apply IP‑level rate limiting.

Cache empty results with a short TTL (e.g., 5 min) to prevent repeated DB hits.

Deploy a Bloom filter pre‑loaded with existing keys to reject invalid IDs before they reach the database (both the empty‑result cache and the filter are sketched below).
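
A sketch combining the last two measures, assuming Guava's BloomFilter and Jedis; the "NULL" sentinel value, key layout, and loader are illustrative assumptions.

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import redis.clients.jedis.Jedis;

public class PenetrationGuard {
    private final Jedis redis = new Jedis("localhost", 6379);

    // Pre-loaded with all existing IDs; 1% false-positive rate
    private final BloomFilter<Long> existingIds =
            BloomFilter.create(Funnels.longFunnel(), 10_000_000, 0.01);

    public String getById(long id) {
        // 1) Bloom filter rejects IDs that definitely do not exist
        if (!existingIds.mightContain(id)) return null;

        String key = "order:detail:" + id;
        String value = redis.get(key);
        if ("NULL".equals(value)) return null;  // cached empty result
        if (value != null) return value;

        value = loadFromDb(id);                 // hypothetical loader
        if (value == null) {
            redis.setex(key, 5 * 60, "NULL");   // cache the miss for 5 min
            return null;
        }
        redis.setex(key, 30 * 60, value);
        return value;
    }

    private String loadFromDb(long id) { return null; }
}
```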

Implementation Guidelines

Local Cache (Caffeine)

Enforce a maximum element count; prohibit storing collection objects or large values.

Wrap all local caches in a unified component; prefer annotation‑based usage.

Suitable for small, infrequently updated data (e.g., dictionaries) or short‑lived hot data (e.g., 5 s TTL); a configuration sketch follows.
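
As one way to satisfy these rules, the sketch below wraps Caffeine in a Spring CacheManager so callers use annotations rather than raw cache instances; the Spring setup, cache names, and service class are assumptions, since the article does not mandate a framework.

```java
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;

@Configuration
@EnableCaching
public class LocalCacheConfig {
    // One shared manager enforces the size cap for every local cache
    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager manager = new CaffeineCacheManager("dictionary");
        manager.setCaffeine(Caffeine.newBuilder()
                .maximumSize(1_000)                        // hard element cap
                .expireAfterWrite(Duration.ofSeconds(5))); // short-lived entries
        return manager;
    }
}

class DictionaryService {
    // Annotation-based usage: entries flow through the shared component
    @Cacheable(cacheNames = "dictionary", key = "#code")
    public String lookup(String code) {
        return queryDb(code); // hypothetical DB query
    }

    private String queryDb(String code) { return "..."; }
}
```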

Redis Cache

Separate hot and cold data; never use Redis as a primary datastore.

Key naming convention: {system}:{module}:{identifier}; length ≤ 100 characters; manage keys via a constants file.

All keys (except explicitly permanent hot data) must have a TTL; keys without one require approval.

Big‑key handling: treat a String ≥ 100 KB, a List/ZSet/Hash with ≥ 5000 elements, or a Hash totalling ≥ 100 MB as a big key; split it into smaller keys and batch‑fetch with MGET (≤ 200 keys per batch).

Disable dangerous commands (KEYS, FLUSHALL, FLUSHDB); use SCAN for iteration, as sketched after this list.

Batch operations (e.g., MGET, HMSET) should carry ≤ 200 keys or ≤ 2000 KB of data per batch.

Control source‑to‑cache rebuild latency: for complex DB queries, extend the TTL to ≥ 30 min or rely on a two‑level cache.
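
Following the limits above, a Jedis sketch of SCAN‑based iteration and MGET batches capped at 200 keys; the match pattern, page size, and class name are illustrative.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

import java.util.ArrayList;
import java.util.List;

public class RedisScanAndBatch {
    private final Jedis redis = new Jedis("localhost", 6379);

    // Iterate keys with SCAN instead of the blocking KEYS command
    public List<String> keysByPattern(String pattern) {
        List<String> keys = new ArrayList<>();
        String cursor = ScanParams.SCAN_POINTER_START; // "0"
        ScanParams params = new ScanParams().match(pattern).count(500);
        do {
            ScanResult<String> page = redis.scan(cursor, params);
            keys.addAll(page.getResult());
            cursor = page.getCursor();
        } while (!"0".equals(cursor));
        return keys;
    }

    // Fetch values in MGET batches of at most 200 keys
    public List<String> batchGet(List<String> keys) {
        List<String> values = new ArrayList<>(keys.size());
        for (int i = 0; i < keys.size(); i += 200) {
            List<String> batch = keys.subList(i, Math.min(i + 200, keys.size()));
            values.addAll(redis.mget(batch.toArray(new String[0])));
        }
        return values;
    }
}
```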

Consistency Rules

On any DB write, unconditionally delete or update the corresponding cache entry.

A local‑cache TTL of 5–10 s bounds staleness and preserves eventual consistency even if Redis update messages are lost.

Hot keys: update the cache under a write lock; non‑hot keys: simply delete the cache entry (both paths are sketched below).

In master‑slave DB setups, a cache rebuild that follows a write must read from the master to avoid caching stale replica data.
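
A minimal sketch of both write paths under the rules above, using Jedis; key names, the lock timeout, and the updateDb call are assumptions, and lock‑contention retry is omitted for brevity.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class OrderCacheWriter {
    private final Jedis redis = new Jedis("localhost", 6379);

    // Non-hot key: write the DB, then simply delete the cache entry;
    // the next read repopulates it from the (master) database.
    public void updateOrder(long orderId, String newState) {
        updateDb(orderId, newState);            // hypothetical DB write
        redis.del("order:detail:" + orderId);
    }

    // Hot key: update the cache in place under a short write lock
    public void updateHotOrder(long orderId, String newState) {
        String lockKey = "lock:order:" + orderId;
        if ("OK".equals(redis.set(lockKey, "1", SetParams.setParams().nx().ex(5)))) {
            try {
                updateDb(orderId, newState);
                redis.setex("order:detail:" + orderId, 30 * 60, newState);
            } finally {
                redis.del(lockKey);
            }
        }
    }

    private void updateDb(long orderId, String newState) { /* ... */ }
}
```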

Core Design Logic

Scenario first: verify the data is read‑many, write‑few, hot, and tolerant of weak consistency before caching it.

Monitoring: continuously track hit rate, read/write ratio, and space usage.

Standards enforcement: limit local‑cache size, control Redis big keys and key naming, and enforce "every change must refresh the cache".

Layered defense: mitigate avalanche with scattered TTLs plus locks, breakdown with permanent hot data, and penetration with a Bloom filter.

Following these guidelines turns cache into a reliable accelerator rather than a hidden source of system instability.
