Mastering Server‑Side Caching: From Local to Distributed Multilevel Strategies
This article explains why caching is essential for reducing CPU and I/O pressure, outlines key cache attributes such as throughput and hit rate, compares popular local cache libraries, describes distributed cache options, and details the design, consistency, monitoring, and hot‑key handling of a transparent multilevel cache architecture.
1. Cache Positioning
Before introducing caching, we must confirm whether a system truly needs it. The two main reasons for adding a cache are to alleviate CPU pressure by storing method results, pre‑computing data, and reusing common data, and to alleviate I/O pressure by replacing slow storage accesses with fast memory accesses and by shifting reads from single‑point components (e.g., databases) to distributed cache middleware.
2. Cache Attributes
Cache is a typical space‑for‑time trade‑off technique. When designing or selecting a cache, we usually consider four dimensions:
Throughput : measured in operations per second (ops/s), reflecting the efficiency of concurrent reads and writes.
Hit Rate : the ratio of successful cache reads to total requests, indicating the value of the cache.
Extended Functions : features such as maximum capacity, expiration time, eviction events, hit‑rate statistics, etc.
Distributed Support : distinction between in‑process (local) caches and distributed caches; the former is fast but not shared, the latter is shared across nodes.
3. Local Cache
We briefly introduce several mainstream local caches—HashMap, Guava, Ehcache, and Caffeine—against the above attributes.
Throughput : A plain HashMap has the highest throughput because it performs no concurrency control, but it is not safe to use from multiple threads. ConcurrentHashMap adds concurrency control (segment locks in older JDKs, CAS plus per‑bin locking since Java 8) to guarantee correctness. Guava Cache handles reads and writes synchronously under segmented locks, while Caffeine records operations in buffers and replays them asynchronously in batches, reducing lock contention. The following chart shows Caffeine’s benchmarked throughput.
Hit Rate : To maximize the value of limited memory, caches need eviction strategies. Common policies include:
FIFO : evicts the earliest inserted entries.
LRU : evicts the least recently used entries; works well for short‑term hot data, but can mistakenly evict items that are frequently used overall yet happen to have gone idle for a short period.
LFU : evicts the least frequently used entries; solves the LRU idle‑period problem but requires per‑entry counters and cannot reflect temporal changes in hotness.
Two LFU variants have emerged:
TinyLFU : improves LFU by using a sketch to approximate frequencies, reducing counter update overhead.
W‑TinyLFU : combines LRU and LFU. New entries first go into a small LRU “Window Cache”; if they pass TinyLFU’s frequency filter they are promoted to the main cache, which is internally segmented and evicted in LRU order. The diagram below illustrates W‑TinyLFU.
Extended Functions : beyond basic read/write, caches may provide capacity limits, expiration policies, eviction events, and hit‑rate statistics.
Distributed Support : Caffeine is purely in‑process, while Ehcache can operate in both local and distributed modes (the latter via RMI or JGroups broadcasting).
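To make these extended functions concrete, the following is a minimal sketch of a Caffeine configuration that sets a capacity limit, a write expiration, an eviction listener, and hit‑rate statistics; the size and TTL values are arbitrary illustration choices.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalCause;

import java.time.Duration;

public class LocalCacheExample {
    public static void main(String[] args) {
        Cache<String, String> cache = Caffeine.newBuilder()
                .maximumSize(10_000)                         // capacity limit, enforced by W-TinyLFU eviction
                .expireAfterWrite(Duration.ofMinutes(10))    // expiration time
                .removalListener((String key, String value, RemovalCause cause) ->
                        System.out.printf("evicted %s (%s)%n", key, cause))  // eviction event
                .recordStats()                               // enable hit-rate statistics
                .build();

        cache.put("sku:1001", "cached-value");
        cache.getIfPresent("sku:1001");   // hit
        cache.getIfPresent("sku:9999");   // miss

        System.out.println(cache.stats().hitRate());  // cumulative hit rate since start-up
    }
}
```

Note that stats() reports cumulative figures since start‑up, which is exactly the limitation the monitoring section below works around with sliding windows.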
4. Distributed Cache
In microservice environments, solutions such as Ehcache and Infinispan have evolved to support both distributed and embedded deployments. Redis has become the de facto choice for distributed caching; detailed exploration is omitted here.
5. Multilevel Cache
Local and distributed caches complement each other rather than compete. A transparent multilevel cache (TMC) combines them, typically using an in‑process cache as level 1, a distributed cache as level 2, and the database as level 3. The following diagram shows a typical multilevel structure. Helios implements such a design; its architecture is illustrated below.
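As a rough illustration of the read path through such a hierarchy, here is a minimal sketch assuming Caffeine as level 1 and Redis (accessed through Jedis) as level 2; loadFromDatabase is a placeholder for the real DAO call, and none of this is Helios's actual code.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import redis.clients.jedis.Jedis;

import java.time.Duration;

public class MultilevelCacheReader {
    private final Cache<String, String> l1 = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofSeconds(30))   // short TTL bounds how stale level 1 can get
            .build();
    private final Jedis l2 = new Jedis("localhost", 6379);

    public String get(String key) {
        // Level 1: in-process cache
        String value = l1.getIfPresent(key);
        if (value != null) {
            return value;
        }
        // Level 2: distributed cache
        value = l2.get(key);
        if (value == null) {
            // Level 3: the database; loadFromDatabase stands in for the real query
            value = loadFromDatabase(key);
            l2.setex(key, 300, value);   // back-fill level 2 with its own TTL
        }
        l1.put(key, value);              // back-fill level 1
        return value;
    }

    private String loadFromDatabase(String key) {
        return "value-of-" + key;        // placeholder
    }
}
```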
5.1 Cache Consistency
Cache replicas inevitably raise consistency issues. For local‑process vs. distributed caches, a common practice is to broadcast invalidation or refresh notifications (e.g., Redis PUB/SUB, MQ, or ZooKeeper/Etcd) when data changes, causing each node’s level‑1 cache to expire or refresh. Between distributed cache and the database, the main challenge is maintaining consistency under concurrent reads/writes. The widely used Cache‑Aside pattern works as follows:
Miss : Application reads from cache; on miss, it fetches from the database and populates the cache.
Hit : Application reads from cache and returns the value.
Update : Application writes to the database first, then invalidates the cache.
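The Miss/Hit/Update flow above can be sketched as follows; the ConcurrentHashMap stands in for the database and Jedis for the distributed cache, so this is an illustration of the pattern rather than production code.

```java
import redis.clients.jedis.Jedis;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheAsideExample {
    private final Jedis cache = new Jedis("localhost", 6379);
    private final Map<String, String> database = new ConcurrentHashMap<>();  // stand-in for the real DB

    // Read path: try the cache; on a miss, load from the database and populate the cache.
    public String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            value = database.get(key);
            if (value != null) {
                cache.setex(key, 300, value);
            }
        }
        return value;
    }

    // Update path: write the database first, then invalidate the cache entry.
    public void update(String key, String value) {
        database.put(key, value);
        cache.del(key);
    }
}
```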
Helios adapts this pattern: after a successful DB write, it actively refreshes the distributed cache and triggers a local‑cache refresh via MQ. The design favors availability (AP) over strict consistency, while the refresh‑on‑write approach keeps the window for dirty reads small.
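One concrete way to realize the broadcast notification mentioned above is Redis PUB/SUB; the sketch below illustrates that option with an invented channel name, whereas Helios itself pushes the local‑cache refresh through MQ.

```java
import com.github.benmanes.caffeine.cache.Cache;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class InvalidationBroadcaster {

    // Publisher side: called after the database write and distributed-cache refresh succeed.
    public static void publishInvalidation(Jedis jedis, String key) {
        jedis.publish("cache-invalidate", key);
    }

    // Subscriber side: each instance runs this on a background thread (subscribe blocks)
    // and evicts the key from its level-1 cache when notified.
    public static void listen(Jedis jedis, Cache<String, String> localCache) {
        jedis.subscribe(new JedisPubSub() {
            @Override
            public void onMessage(String channel, String message) {
                localCache.invalidate(message);
            }
        }, "cache-invalidate");
    }
}
```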
5.2 Cache Monitoring
In‑process caches like Ehcache and Caffeine provide cumulative metrics from start‑up, which cannot reflect time‑varying behavior or identify hot keys. Helios uses a sliding‑window approach with Disruptor‑based asynchronous event consumption to emit per‑second statistics, and employs Sketch to filter low‑frequency accesses, reducing monitoring overhead.
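The sliding‑window idea can be illustrated with per‑second buckets as below; this sketch only shows the windowing, not Helios's Disruptor‑based asynchronous pipeline or its Sketch‑based filtering.

```java
public class SlidingWindowCounter {
    private final int windowSeconds;
    private final long[] counts;    // events counted in each one-second bucket
    private final long[] seconds;   // which absolute second each bucket currently represents

    public SlidingWindowCounter(int windowSeconds) {
        this.windowSeconds = windowSeconds;
        this.counts = new long[windowSeconds];
        this.seconds = new long[windowSeconds];
    }

    // Record one cache event (e.g., a hit or a miss) in the bucket for the current second.
    public synchronized void record() {
        long now = System.currentTimeMillis() / 1000;
        int idx = (int) (now % windowSeconds);
        if (seconds[idx] != now) {   // bucket still holds an old second: reuse it
            seconds[idx] = now;
            counts[idx] = 0;
        }
        counts[idx]++;
    }

    // Total events observed over the last windowSeconds seconds.
    public synchronized long totalInWindow() {
        long now = System.currentTimeMillis() / 1000;
        long sum = 0;
        for (int i = 0; i < windowSeconds; i++) {
            if (now - seconds[i] < windowSeconds) {   // ignore buckets that have gone stale
                sum += counts[i];
            }
        }
        return sum;
    }
}
```

Per‑second hit and miss counters of this kind can then be emitted to a metrics backend, giving the time‑varying view that cumulative counters cannot.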
5.3 Cache Hotspots
Hotspot detection is a primary driver for multilevel caching. Local hot data can be pre‑warmed in memory, but sudden spikes require automatic detection. Caffeine’s W‑TinyLFU handles bursty accesses, yet its decay cycle may not match business requirements. Industry solutions (e.g., Youzan’s TMC, JD’s HotKey) aggregate per‑instance statistics, compute hotness in a central node using sliding windows, and broadcast hot‑key lists via etcd. Helios currently focuses on local hotspot detection: miss‑driven statistics use Sketch to limit memory, a threshold module compares current and previous periods, and non‑hot keys are evicted during cycle rotation, keeping memory usage low while providing actionable hot‑key data for tuning.
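The “Sketch” referred to above is typically a count‑min‑style frequency sketch; the version below is a minimal illustration, and the depth, width, and hashing are assumptions rather than Helios's actual parameters.

```java
public class CountMinSketch {
    private static final int DEPTH = 4;             // number of hash rows
    private static final int WIDTH = 1 << 12;       // counters per row (4096)
    private final int[][] table = new int[DEPTH][WIDTH];
    private final int[] seeds = {17, 31, 61, 127};  // one mixing seed per row

    private int index(String key, int row) {
        int h = key.hashCode() * seeds[row];
        h ^= (h >>> 16);                            // spread the high bits into the low bits
        return (h & Integer.MAX_VALUE) % WIDTH;
    }

    // Record one access to the key in every row.
    public void increment(String key) {
        for (int row = 0; row < DEPTH; row++) {
            table[row][index(key, row)]++;
        }
    }

    // Approximate access count: taking the minimum across rows limits over-counting
    // caused by hash collisions, while memory stays fixed regardless of key cardinality.
    public int estimate(String key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < DEPTH; row++) {
            min = Math.min(min, table[row][index(key, row)]);
        }
        return min;
    }
}
```

A threshold module can then compare estimate(key) between the current and the previous period and promote keys that cross the configured threshold, which is the kind of hot‑key decision described above.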
6. Summary and Outlook
Local in‑process caches and distributed caches serve different positioning and usage scenarios, leading to distinct technology selections and evolution paths. Transparent multilevel caching aims to combine their strengths while reducing integration complexity, yet differences in command protocols and consistency challenges remain. Future work will continue to improve usability, consistency guarantees, and cache governance.
Yanxuan Tech Team