
Master Multi-Level Caching: Strategies, Load Balancing, and Fast Recovery

This article explores multi‑level caching architectures, detailing how Nginx, local and distributed caches, and Tomcat interact, and offers practical solutions for expiration strategies, dimensional caching, load‑balancing algorithms, hot‑data handling, atomic updates, and rapid recovery from cache failures.


Caching is a classic yet powerful performance-optimization technique, often discussed in interviews that cover algorithms, hot data, atomicity, and crash recovery. This article focuses on server-side caching, examining architectural and hit-rate improvements without touching CDNs or data-structure optimizations.

1. Introduction to Multi-Level Caching

Multi-level caching stores data at different layers of the system to improve access efficiency. The overall architecture is illustrated below:

The workflow proceeds as follows:

1. Requests first enter the ingress Nginx, which balances load across application Nginx instances using round-robin or consistent hashing.

2. Application Nginx checks its local cache (Lua Shared Dict, Nginx Proxy Cache, or a local Redis). A hit returns data immediately, reducing backend pressure.

3. If the local cache misses, the system queries a distributed cache such as Redis (often with master-slave replication). A hit is returned and written back to the local cache.

4. If the distributed cache also misses, the request is forwarded to a Tomcat cluster, again using round-robin or consistent hashing for load distribution.

5. Within Tomcat, a local heap cache is consulted first; a hit also updates the main Redis cluster.

6. Optionally, if the preceding lookups miss, a read from the primary Redis cluster can be attempted, which avoids overloading the backend when a replica is problematic.

7. When all caches miss, the data is fetched from the database or related services.

8. The response is asynchronously written back to the primary Redis cluster, where concurrent writes may cause conflicts; these are addressed in later sections.
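The lookup flow above can be sketched in a few lines of Python. This is a minimal illustration, not a real deployment: plain dicts stand in for the Nginx shared dict, the Redis cluster, and the Tomcat heap cache, and the class and method names are assumptions.

```python
# Sketch of the multi-level lookup flow. Plain dicts stand in for the
# Nginx local cache, the distributed Redis cache, and the Tomcat heap
# cache; a real system would use the actual stores and handle TTLs.

class MultiLevelCache:
    def __init__(self, load_from_db):
        self.nginx_local = {}   # application Nginx local cache
        self.distributed = {}   # shared Redis cluster
        self.heap = {}          # Tomcat in-process heap cache
        self.load_from_db = load_from_db

    def get(self, key):
        # 1. Application Nginx local cache
        if key in self.nginx_local:
            return self.nginx_local[key]
        # 2. Distributed cache; write back to the local cache on a hit
        if key in self.distributed:
            value = self.distributed[key]
            self.nginx_local[key] = value
            return value
        # 3. Tomcat heap cache; a hit also refreshes the distributed cache
        if key in self.heap:
            value = self.heap[key]
            self.distributed[key] = value
            return value
        # 4. All caches missed: go to the database, then fill every layer
        value = self.load_from_db(key)
        self.heap[key] = value
        self.distributed[key] = value
        self.nginx_local[key] = value
        return value
```

After the first miss populates every layer, repeated reads of the same key never reach the database again, which is the point of the architecture.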

Thus the system comprises three cache layers: application Nginx local cache, distributed cache, and Tomcat heap cache, each solving specific problems such as hot‑data handling, reducing origin traffic, and mitigating cache‑failure impact.

2. How to Cache Data

2.1 Expiration vs. Non‑Expiration

Two main approaches exist: non‑expiring caches and caches with TTL. Choosing the right mode depends on business needs and data volume.

Non‑expiring cache flow:

Issues include DB-cache inconsistency when a transaction fails, dirty data from concurrent writes, and the performance cost of synchronous writes. Solutions involve periodic full synchronization or a messaging mechanism:

Write cache updates as messages.

Cache systems subscribe to messages and update accordingly.

Carry only IDs in messages and reload from the DB, or include timestamps/MD5 digests so consumers can detect stale or corrupted updates.

Persist update logs locally or use binlog (e.g., Canal) for replay.

Non‑expiring caches suit low‑volume, high‑frequency data such as users, categories, products, prices, and orders, employing LRU eviction when full.
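The LRU eviction mentioned above can be sketched with an `OrderedDict`. This is a minimal illustration of the "non-expiring cache plus LRU" idea, not a production cache (no thread safety, no persistence); the class name and capacity are assumptions.

```python
from collections import OrderedDict

# Minimal LRU-evicting "non-expiring" cache for small, frequently read
# datasets (users, categories, prices). Entries never expire by time;
# the least recently used entry is evicted when the cache is full.

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key, default=None):
        if key not in self.items:
            return default
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:  # evict the LRU entry
            self.items.popitem(last=False)
```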

2.2 Dimensional and Incremental Caching

For e‑commerce items with multiple attributes (basic info, images, status, specs, description), updating all parts is costly. Dimensionalizing data and applying incremental updates reduces bandwidth and request load, especially for frequent status changes like product availability.
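One way to realize dimensional caching is to give each attribute group its own cache key, so a frequent status change rewrites one small entry instead of the whole product blob. The key layout (`item:{id}:{dimension}`) and the dict-backed store below are assumptions for illustration.

```python
# Dimensional caching sketch: each attribute group of a product gets its
# own cache key, so an incremental update touches only one dimension.

cache = {}  # stand-in for Redis or a local cache

def cache_product(product_id, info, images, status, specs):
    cache[f"item:{product_id}:info"] = info
    cache[f"item:{product_id}:images"] = images
    cache[f"item:{product_id}:status"] = status
    cache[f"item:{product_id}:specs"] = specs

def update_status(product_id, status):
    # Incremental update: rewrite only the frequently changing dimension
    cache[f"item:{product_id}:status"] = status

def get_product(product_id):
    # Reassemble the full view from the cached dimensions
    return {dim: cache[f"item:{product_id}:{dim}"]
            for dim in ("info", "images", "status", "specs")}
```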

3. Distributed Cache and Application Load Balancing

3.1 Distributed Cache

Typically implemented via sharding, using modulo or consistent hashing. For non‑expiring caches, modulo sharding works; for tolerable data loss, consistent hashing limits impact of node failures. Clients can handle sharding or use middleware like Twemproxy; Redis clusters are a common choice.
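A consistent-hash ring with virtual nodes is one way to implement the sharding described above. The sketch below uses MD5 and 100 replicas per node; both are illustrative choices, and real clients (or middleware like Twemproxy) handle this internally.

```python
import bisect
import hashlib

# Minimal consistent-hash ring with virtual nodes. Each physical node is
# placed on the ring many times so load spreads evenly; a key maps to
# the first virtual node clockwise from its hash.

class HashRing:
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []          # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, ""))
        if idx == len(self.ring):
            idx = 0             # wrap around the ring
        return self.ring[idx][1]
```

Removing a node only remaps the keys that lived on it; every other key keeps its assignment, which is why consistent hashing limits the impact of node failures.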

3.2 Application Load Balancing

Two main algorithms:

Round-robin: evenly distributes requests, improving overall load balance but reducing cache hit rate as the number of servers grows.

Consistent hashing: routes identical requests to the same node, preserving hit rate but risking load imbalance on hot keys.

Dynamic selection based on traffic:

Use consistent hashing under low load.

Switch to round‑robin for hotspot bursts.

Push hot data to the ingress Nginx for direct response.
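The traffic-based switch above can be sketched as follows. The one-second counting window, the threshold, and the CRC32 key hash are all assumptions; a real implementation would live in the Nginx layer (e.g. Lua) with a proper sliding window.

```python
import itertools
import time
import zlib

# Adaptive balancer sketch: consistent hashing while request volume is
# low (protects the cache hit rate), round-robin once a per-second
# threshold is crossed (spreads a hotspot across all servers).

class AdaptiveBalancer:
    def __init__(self, servers, hot_threshold=1000):
        self.servers = servers
        self.hot_threshold = hot_threshold
        self.rr = itertools.cycle(servers)
        self.window_start = time.monotonic()
        self.window_count = 0

    def _requests_this_second(self):
        now = time.monotonic()
        if now - self.window_start >= 1.0:   # roll the 1-second window
            self.window_start, self.window_count = now, 0
        self.window_count += 1
        return self.window_count

    def pick(self, key):
        if self._requests_this_second() > self.hot_threshold:
            return next(self.rr)             # hotspot: spread the load
        # normal load: hash the key so it keeps hitting the same node
        return self.servers[zlib.crc32(key.encode()) % len(self.servers)]
```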

4. Hot Data and Cache Updates

Hot data can overload servers. Solutions include:

4.1 Full Cache on a Single Machine + Master‑Slave

All caches reside on the local machine; updates propagate to the primary Redis cluster and replicate to slaves. Updates can be lazy or message‑driven.

4.2 Distributed Cache + Application‑Local Hot Cache

Application Nginx first checks its local cache; on miss it queries Redis and then the origin, finally caching the result locally. Load‑balancing switches from consistent hashing to round‑robin when request volume exceeds a threshold, and pre‑pushes known flash‑sale data.

A real‑time hot‑spot detection system can further improve responsiveness:

Ingress Nginx forwards requests to application Nginx.

Application Nginx reads local cache, then distributed cache or origin.

Requests are reported to a hot‑spot detection system via UDP, Kafka, or Flume.

When thresholds are met, hot data is pushed to the local cache.

Cache consistency is handled by subscribing to change messages or, when unavailable, using reasonable TTLs. For flash‑sale scenarios, activity‑start messages pre‑populate caches and downgrade load‑balancing to round‑robin.
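The hot-spot detection loop above reduces to counting reported keys and flagging those that cross a threshold. The sketch below elides the transport (UDP, Kafka, or Flume) and uses an assumed threshold and a plain counter window; the push hook is a placeholder for notifying the application Nginx layer.

```python
from collections import Counter

# Hot-spot detector sketch: application nodes report each requested key;
# when a key's count in the current window crosses a threshold, it is
# flagged and pushed down into the local (Nginx) cache.

class HotspotDetector:
    def __init__(self, threshold=100):
        self.threshold = threshold
        self.counts = Counter()
        self.hot = set()

    def report(self, key):
        self.counts[key] += 1
        if self.counts[key] >= self.threshold and key not in self.hot:
            self.hot.add(key)
            self.push_to_local_cache(key)

    def push_to_local_cache(self, key):
        pass  # hook: publish the key so edge nodes pre-load it

    def reset_window(self):
        self.counts.clear()  # start a new counting window
```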

5. Cache Update and Atomicity

Concurrent updates can cause dirty data. Mitigation strategies include:

Timestamp or version checks; Redis’s single‑threaded nature enables atomic updates.

Using Canal to subscribe to DB binlog.

Routing updates through ordered queues for single‑threaded processing.

Distributed locks before modifying cache.
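The timestamp/version check in the first point can be sketched as a compare-and-set: a write is applied only if it carries a newer version than what is cached, so a stale concurrent writer cannot overwrite fresher data. On Redis this check would run atomically inside a Lua script (or with WATCH/MULTI); here a plain dict stands in for the store.

```python
# Version-checked cache update: reject any write whose version is not
# newer than the cached one. This is what makes late, out-of-order
# writes from concurrent updaters harmless.

cache = {}  # key -> (version, value); stand-in for Redis

def update_if_newer(key, version, value):
    current = cache.get(key)
    if current is not None and current[0] >= version:
        return False            # stale write: drop it
    cache[key] = (version, value)
    return True
```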

6. Cache Crashes and Fast Recovery

6.1 Modulo Sharding

If a node fails, many keys become unreachable, causing a sudden surge to the backend. Master‑slave redundancy mitigates this, but adding nodes causes massive cache misses; a new cluster is typically built and traffic migrated gradually.

6.2 Consistent Hashing

Node failure only affects a portion of the hash ring, limiting cache miss spikes.
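A small experiment makes the contrast concrete: with modulo sharding, shrinking the cluster from four nodes to three remaps roughly three quarters of all keys (everything whose hash gives different remainders mod 4 and mod 3), whereas consistent hashing would remap only the failed node's share, about 1/N. The key set and hash function below are illustrative.

```python
import hashlib

# Experiment: how many keys move when a node is dropped under modulo
# sharding? The large remapped fraction is what turns one node failure
# into a sudden surge on the backend.

def modulo_node(key, n):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"key-{i}" for i in range(10_000)]
before = [modulo_node(k, 4) for k in keys]
after = [modulo_node(k, 3) for k in keys]   # one node removed
moved = sum(b != a for b, a in zip(before, after))
print(f"{moved / len(keys):.0%} of keys remapped")   # ~75%
```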

6.3 Rapid Recovery Strategies

Maintain master‑slave redundancy; replace failed nodes promptly.

If availability drops, degrade a subset of users and gradually restore service while background workers pre‑warm caches.

When the entire cache cluster is lost without backup, rebuild caches gradually, using degradation to keep part of the traffic functional and employing workers for pre‑warming.

Tags: distributed systems, load balancing, Redis, caching, cache consistency, Nginx, multi-level cache
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. We focus on operations transformation and aim to accompany you throughout your operations career.
