From 1 ns to 10 ms: Why Caching Exists and Why It Keeps You Up at Night
The article explains why caching is indispensable—from nanosecond‑level CPU caches to millisecond‑level disks—covers the classic pitfalls of penetration, breakdown and avalanche, and walks through consistency strategies such as Cache‑Aside, delayed double‑delete, and Canal‑based binlog syncing for high‑concurrency systems.
It is easy to write redisTemplate.opsForValue().set() and assume caching is just storing a key-value pair. Then a production alert fires: DB CPU is at 100 % because a flood of requests is bypassing the cache, and the cached data no longer matches the database.
What cache is and why it’s mandatory
Cache temporarily holds hot data in fast storage to avoid accessing slower layers, essentially trading space for time. The approximate latency figures popularized by Jeff Dean show the gap: L1 cache ~1 ns, L2 3-5 ns, L3 10-15 ns, DRAM 80-100 ns, SSD random read ~16 µs, HDD seek ~10 ms. That is about five orders of magnitude between DRAM and an HDD seek, and about seven between L1 and an HDD seek.
The speed advantage pays off because of the locality principle: temporal locality (recently accessed data is likely to be accessed again) and spatial locality (data near a recent access is also likely to be accessed).
Because of this gap, cache has moved from an optional optimization to an architectural necessity.
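As a quick check on the size of that gap, the ratios below use only the approximate latencies quoted above (taking DRAM as 100 ns):

```latex
\[
\frac{t_{\text{HDD seek}}}{t_{\text{L1}}} \approx \frac{10\,\text{ms}}{1\,\text{ns}}
  = \frac{10^{-2}\,\text{s}}{10^{-9}\,\text{s}} = 10^{7},
\qquad
\frac{t_{\text{HDD seek}}}{t_{\text{DRAM}}} \approx \frac{10^{-2}\,\text{s}}{10^{-7}\,\text{s}} = 10^{5},
\qquad
\frac{t_{\text{SSD read}}}{t_{\text{DRAM}}} \approx \frac{16\,\mu\text{s}}{100\,\text{ns}} = 160 .
\]
```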
Cache lives at every layer of the computer system
Hardware: the CPU's L1/L2/L3 caches are transparent to developers but essential.
Front-end: browsers use HTTP strong and negotiated caching to cut repeat requests.
Network: CDNs cache static assets at edge nodes.
Backend: Redis is the de facto distributed cache, achieving >100 k QPS thanks to pure in-memory operations and I/O multiplexing.
Local JVM: Caffeine and Guava provide microsecond-level latency but cannot be shared across instances.
In practice, local and distributed caches are combined into a multi-level hierarchy: local first, then Redis, then the database.
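The local-first, then Redis, then database lookup order can be sketched roughly as follows. This is a minimal illustration, not a production design: the class name `MultiLevelCache` is hypothetical, plain in-memory maps stand in for Caffeine and Redis, `dbLoader` stands in for a database query, and TTLs and eviction are left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Sketch of a local -> remote -> database read path.
 * The maps and dbLoader are stand-ins chosen for illustration only.
 */
class MultiLevelCache {
    private final Map<String, String> local = new ConcurrentHashMap<>(); // e.g. Caffeine in a real service
    private final Map<String, String> remote;                            // e.g. Redis in a real service
    private final Function<String, String> dbLoader;                     // returns null when the row is missing

    MultiLevelCache(Map<String, String> remote, Function<String, String> dbLoader) {
        this.remote = remote;
        this.dbLoader = dbLoader;
    }

    Optional<String> get(String key) {
        // Level 1: process-local cache, the cheapest lookup.
        String value = local.get(key);
        if (value != null) {
            return Optional.of(value);
        }
        // Level 2: shared cache; on a hit, promote the value to level 1.
        value = remote.get(key);
        if (value != null) {
            local.put(key, value);
            return Optional.of(value);
        }
        // Level 3: database; on a hit, populate both cache levels.
        value = dbLoader.apply(key);
        if (value != null) {
            remote.put(key, value);
            local.put(key, value);
        }
        return Optional.ofNullable(value);
    }
}
```

In this sketch a repeated read of the same key is served from the local map without touching the remote map or the loader. A real deployment would also need expiry on the local level, since each instance holds its own copy.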
Cache speeds up the system but also adds complexity
Introducing cache brings three classic problems:
Cache penetration – the requested key is absent in both cache and DB, causing every request to hit the DB. Typical fix: Bloom filter (millions of entries fit in a few MB with <1 % false‑positive rate) or caching null values.
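Cache breakdown – a hot key expires and many concurrent requests flood the DB. Typical fix: a mutex so that only one request rebuilds the entry (e.g., a Redis distributed lock taken with SET key value NX PX <ttl>, which sets the expiry atomically, unlike a bare SETNX), or logical expiration, where the key has no Redis TTL and a timestamp stored with the value tells readers when to trigger an asynchronous refresh.
Cache avalanche – many keys expire simultaneously or the cache service crashes, sending a massive load to the DB. Fixes: add random jitter to TTLs, use multi-level cache as a fallback, and apply circuit breaking and rate limiting at the entry point.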
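Three of the fixes listed above (caching null values, a per-key mutex, and TTL jitter) can be combined in one small in-process sketch. Everything here is an assumption for illustration: the class name `GuardedCache`, the in-memory map standing in for Redis, the `dbLoader` function, and the TTL parameters. A Bloom filter and a distributed lock are deliberately not shown.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

/** In-process sketch; a real system would apply the same ideas against Redis. */
class GuardedCache {
    private static final class Entry {
        final String value;          // null means "known to be absent in the DB"
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    // One lock object per key; never cleaned up in this sketch.
    private final ConcurrentHashMap<String, Object> keyLocks = new ConcurrentHashMap<>();
    private final Function<String, String> dbLoader; // returns null when the row is missing
    private final long baseTtlMillis;
    private final long jitterMillis;
    private final long nullTtlMillis;

    GuardedCache(Function<String, String> dbLoader, long baseTtlMillis,
                 long jitterMillis, long nullTtlMillis) {
        this.dbLoader = dbLoader;
        this.baseTtlMillis = baseTtlMillis;
        this.jitterMillis = jitterMillis;
        this.nullTtlMillis = nullTtlMillis;
    }

    Optional<String> get(String key) {
        Entry entry = freshEntry(key);
        if (entry == null) {
            // Breakdown guard: only one thread per key rebuilds the entry.
            Object lock = keyLocks.computeIfAbsent(key, k -> new Object());
            synchronized (lock) {
                entry = freshEntry(key); // re-check: another thread may have loaded it
                if (entry == null) {
                    String loaded = dbLoader.apply(key);
                    // Penetration guard: a missing row is cached too, with its own TTL.
                    long ttl = (loaded == null) ? nullTtlMillis : jitteredTtlMillis();
                    entry = new Entry(loaded, System.currentTimeMillis() + ttl);
                    entries.put(key, entry);
                }
            }
        }
        return Optional.ofNullable(entry.value);
    }

    /** Avalanche guard: base TTL plus a random offset in [0, jitterMillis]. */
    long jitteredTtlMillis() {
        return baseTtlMillis + ThreadLocalRandom.current().nextLong(jitterMillis + 1);
    }

    private Entry freshEntry(String key) {
        Entry entry = entries.get(key);
        return (entry != null && entry.expiresAtMillis > System.currentTimeMillis()) ? entry : null;
    }
}
```

With this shape, many concurrent reads of an expired hot key result in a single loader call, and repeated lookups of a non-existent key reach the loader only once per null TTL. The null TTL is usually kept short so that a row created later becomes visible quickly.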
Beyond these, cache introduces a consistency window between DB and cache.
Data consistency: from textbook answer to production‑grade solutions
The widely used Cache-Aside pattern works as follows:
Read: check cache → if hit, return; else read DB, write result to cache, return.
Write: update DB first, then delete the cache entry.
Deleting rather than updating the cache avoids stale data under concurrent writes: if two threads update the same row, the thread that wrote the DB first may be the one that writes the cache last, leaving the older value in the cache. Deleting forces the value to be reloaded lazily on the next read.
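The read and write paths of Cache-Aside are short enough to sketch directly. In this illustration the class name `CacheAsideStore` is hypothetical and two in-memory maps stand in for the database and for Redis; there is no TTL or error handling.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Minimal Cache-Aside sketch with in-memory stand-ins for DB and cache. */
class CacheAsideStore {
    final Map<String, String> db = new ConcurrentHashMap<>();    // stand-in for the database
    final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for Redis

    /** Read: check cache; on a miss, read the DB and write the result to the cache. */
    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        String fromDb = db.get(key);
        if (fromDb != null) {
            cache.put(key, fromDb);
        }
        return fromDb;
    }

    /** Write: update the DB first, then delete (not update) the cache entry. */
    void write(String key, String value) {
        db.put(key, value);
        cache.remove(key);
    }
}
```

After `write("k", "v2")` the cache holds nothing for `k`, so the next `read("k")` reloads `v2` from the DB and caches it.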
In high-concurrency scenarios, Cache-Aside still leaves a small inconsistency window. A common mitigation is delayed double delete: delete cache → update DB → wait (usually the DB read time plus a few hundred ms) → delete cache again. The second delete clears any stale value that a concurrent reader wrote back in the meantime. The wait time is hard to tune: too short and the second delete fires before the stale write-back, too long and the stale entry stays visible longer.
A more robust approach is to use Canal, Alibaba's open-source MySQL binlog subscriber. Canal listens to binlog changes and pushes them to a message queue, and consumers delete or update the cache. Business code only writes to the DB; cache sync is decoupled, and the inconsistency window is typically under 100 ms, depending on replication and queue lag.
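The delayed double delete sequence might look like the sketch below. The class name `DelayedDoubleDelete` and the in-memory maps are assumptions for illustration, and the delay is a constructor parameter precisely because the right value depends on the system; the standard `ScheduledExecutorService` is used for the wait so the writing thread is not blocked.

```java
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Sketch: delete cache -> update DB -> wait -> delete cache again. */
class DelayedDoubleDelete {
    private final Map<String, String> db;    // stand-in for the database
    private final Map<String, String> cache; // stand-in for Redis
    private final ScheduledExecutorService scheduler;
    private final long delayMillis;

    DelayedDoubleDelete(Map<String, String> db, Map<String, String> cache,
                        ScheduledExecutorService scheduler, long delayMillis) {
        this.db = db;
        this.cache = cache;
        this.scheduler = scheduler;
        this.delayMillis = delayMillis;
    }

    /** Returns a future that completes once the second delete has run. */
    ScheduledFuture<?> update(String key, String value) {
        cache.remove(key);   // first delete
        db.put(key, value);  // update the DB
        // Second delete, scheduled after the delay, removes any stale
        // value a concurrent reader may have written back in between.
        return scheduler.schedule(() -> { cache.remove(key); },
                delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

Scheduling the second delete instead of sleeping keeps the write latency unchanged, at the cost of losing the pending delete if the process dies during the delay.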
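On the consumer side, the binlog-driven approach reduces to "for each row change, delete the matching cache key". The sketch below shows only that shape and does not use the Canal client API: `RowChangeEvent`, `BinlogCacheInvalidator`, the `table:primaryKey` key scheme, and the in-memory queue standing in for the MQ are all hypothetical.

```java
import java.util.Map;
import java.util.Queue;

/** Hypothetical, simplified view of one changed row taken from the binlog. */
final class RowChangeEvent {
    final String table;
    final String primaryKey;
    RowChangeEvent(String table, String primaryKey) {
        this.table = table;
        this.primaryKey = primaryKey;
    }
}

/** Sketch of an MQ consumer that invalidates cache entries for changed rows. */
class BinlogCacheInvalidator {
    private final Map<String, String> cache; // stand-in for Redis

    BinlogCacheInvalidator(Map<String, String> cache) {
        this.cache = cache;
    }

    /** Assumed key scheme: "<table>:<primaryKey>". */
    static String cacheKey(String table, String primaryKey) {
        return table + ":" + primaryKey;
    }

    /** Deleting is idempotent, so duplicate or redelivered events are harmless. */
    void onEvent(RowChangeEvent event) {
        cache.remove(cacheKey(event.table, event.primaryKey));
    }

    /** Handles every event currently in the queue and returns how many were processed. */
    int drain(Queue<RowChangeEvent> queue) {
        int handled = 0;
        RowChangeEvent event;
        while ((event = queue.poll()) != null) {
            onEvent(event);
            handled++;
        }
        return handled;
    }
}
```

Because the handler only deletes, it tolerates at-least-once delivery from the queue without extra bookkeeping.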
Choosing a consistency strategy
Cache Aside – eventual consistency, best for read‑heavy workloads, low risk of data loss.
Read/Write Through – stronger consistency, because the cache layer itself reads from and writes to the DB synchronously; suited to balanced read/write loads, at higher architectural complexity.
Write Behind – eventual consistency, write‑heavy scenarios, risk of data loss if the cache crashes before flushing.
The choice is a trade‑off among consistency, performance, and system complexity.
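The data-loss risk attached to Write Behind is easiest to see in code. The sketch below is a single-threaded illustration with a hypothetical class name `WriteBehindCache` and an in-memory map standing in for the database; a Write Through variant would instead call `db.put` inside `put` before returning.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Single-threaded Write Behind sketch: writes hit the cache now, the DB later. */
class WriteBehindCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> dirty = new LinkedHashMap<>(); // writes not yet in the DB
    private final Map<String, String> db;                            // stand-in for the database

    WriteBehindCache(Map<String, String> db) {
        this.db = db;
    }

    /** Fast path: only memory is touched; the DB write is deferred. */
    void put(String key, String value) {
        cache.put(key, value);
        dirty.put(key, value);
    }

    String get(String key) {
        String cached = cache.get(key);
        return (cached != null) ? cached : db.get(key);
    }

    /** Writes all pending entries to the DB and returns how many were flushed. */
    int flush() {
        int flushed = dirty.size();
        db.putAll(dirty);
        dirty.clear();
        return flushed;
    }

    /** Entries that would be lost if the process crashed right now. */
    int pendingWrites() {
        return dirty.size();
    }
}
```

Between `put` and `flush` the new value exists only in memory, which is exactly the window in which a cache crash loses data.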
Consistency challenges under high concurrency
Three typical dilemmas:
Dilemma 1: concurrent writes complete out of order. Example: Thread A updates DB, Thread B updates DB, Thread B deletes cache, Thread A deletes cache. Deletes are idempotent, so the order of the two deletes does not matter, but if a delete fails, the cache may retain stale data.
Dilemma 2: the “delete-then-update” pattern. Thread A deletes cache, Thread B misses the cache, reads the old value from the DB and writes it back to cache, then Thread A updates DB. The cache now stays stale until the TTL expires or the key is deleted again.
Dilemma 3: a failed cache deletion leads to long-lived inconsistency; subsequent reads keep hitting the stale entry until it expires.
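Dilemma 2 can be replayed deterministically by running the two threads' steps in the problematic order on a single thread. The class name `DeleteThenUpdateRace`, the key `price`, and the values are made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Replays the "delete-then-update" interleaving step by step. */
class DeleteThenUpdateRace {
    /** Returns {value in DB, value in cache} after the interleaving. */
    static String[] replay() {
        Map<String, String> db = new HashMap<>();
        Map<String, String> cache = new HashMap<>();
        db.put("price", "100");
        cache.put("price", "100");

        cache.remove("price");                // Thread A: delete cache
        String seenByB = cache.get("price");  // Thread B: cache miss
        if (seenByB == null) {
            seenByB = db.get("price");        // Thread B: reads the old value from the DB
            cache.put("price", seenByB);      // Thread B: writes the old value back
        }
        db.put("price", "120");               // Thread A: update DB

        return new String[] { db.get("price"), cache.get("price") };
    }
}
```

`replay()` returns `{"120", "100"}`: the DB holds the new value while the cache still serves the old one, with no further delete scheduled to correct it.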
Production environments typically deploy three defensive layers:
Retry mechanism: on cache-delete failure, enqueue a retry task in a message queue; simple, but it adds an MQ dependency and leaves a residual window until the retry succeeds.
Distributed lock serialization: lock the key (e.g., a Redis lock taken with SET NX and an expiry) so only one thread updates at a time. This reduces write throughput due to lock overhead, but offers the strongest consistency; read-write locks can mitigate contention when reads dominate.
Canal + MQ eventual consistency: Canal watches the binlog and pushes delete events via MQ; even if a delete fails, MQ redelivery retries it until it succeeds, so the cache converges, usually within 100 ms.
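The retry layer can be sketched as follows. The names are hypothetical: `RetryingCacheDeleter` wraps a `tryDelete` function that reports whether the cache delete succeeded, and an in-memory queue stands in for the message queue. Backoff, attempt limits, and dead-letter handling are omitted.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Predicate;

/** Sketch: failed cache deletes are queued and retried later. */
class RetryingCacheDeleter {
    private final Predicate<String> tryDelete;                   // true if the delete succeeded
    private final Queue<String> retryQueue = new ArrayDeque<>(); // stand-in for an MQ topic

    RetryingCacheDeleter(Predicate<String> tryDelete) {
        this.tryDelete = tryDelete;
    }

    /** Called after the DB update; a failed delete is queued instead of being dropped. */
    void delete(String key) {
        if (!tryDelete.test(key)) {
            retryQueue.add(key);
        }
    }

    /** Retries each queued key once; returns how many keys are still pending. */
    int retryPending() {
        int batch = retryQueue.size();
        for (int i = 0; i < batch; i++) {
            String key = retryQueue.poll();
            if (!tryDelete.test(key)) {
                retryQueue.add(key); // still failing, keep it for the next round
            }
        }
        return retryQueue.size();
    }

    int pending() {
        return retryQueue.size();
    }
}
```

Until `retryPending()` succeeds for a key, readers still see the stale entry, which is the residual window the retry approach accepts.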
There is no perfect solution for high‑concurrency consistency; the right choice depends on business requirements. For strict consistency (e.g., financial accounts), use distributed‑lock serialization. For tolerable short‑lived inconsistency (e.g., product inventory), Canal + MQ offers the best cost‑performance. For low write volume, Cache‑Aside with retries suffices.
In summary, caching bridges a speed gap of five to seven orders of magnitude, relies on locality, and forces engineers to balance speed, consistency, and complexity. Ensure your system protects against penetration, breakdown, and avalanche, picks a consistency pattern matching your read/write ratio, and replaces “delete-then-update” with the safer “update-then-delete” approach.
ZhiKe AI
