Design Considerations for Master/Slave Distributed Cache with Proxy and CAS

This article analyzes the master/slave architecture for distributed caching: why two clusters, CAS, and a proxy are used, what consistency and availability problems arise, and how well the candidate mitigations for a cache failure hold up.

Architect

Background

In distributed caching, a master/slave design is often adopted to improve availability and performance.

Why use two clusters (master/slave) for cache?

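Keeping two replicas means that when one cluster fails, the other can still serve hits, so fewer requests fall through to the database. Under high load, traffic can also be balanced across both clusters. The result is better reliability and better throughput.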

Why adopt a master/slave structure?

Designating one cluster as the master ensures a single source of truth, preventing inconsistencies when multiple clients modify the same data concurrently.

Why use CAS (Compare‑and‑Swap)?

CAS protects against concurrent updates that could otherwise overwrite each other; in memcached it is called Check And Set, allowing a write only if the value has not changed since it was read.
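The check-and-set behavior described above can be sketched with a toy in-memory cache. This is a minimal illustration, not memcached itself: the CasCache class and its token counter are hypothetical, and only the gets/cas naming is borrowed from memcached.

```python
import itertools


class CasCache:
    """Toy in-memory cache with memcached-style gets/cas semantics (hypothetical)."""

    def __init__(self):
        self._data = {}                      # key -> (value, cas_token)
        self._tokens = itertools.count(1)    # monotonically increasing tokens

    def set(self, key, value):
        """Unconditional write; always assigns a fresh token."""
        self._data[key] = (value, next(self._tokens))

    def gets(self, key):
        """Return (value, cas_token), or (None, None) if the key is absent."""
        return self._data.get(key, (None, None))

    def cas(self, key, value, token):
        """Write only if the stored token still matches the one the caller read."""
        current = self._data.get(key)
        if current is None or current[1] != token:
            return False                     # the entry changed since it was read
        self._data[key] = (value, next(self._tokens))
        return True


cache = CasCache()
cache.set("stock", 10)

# Two clients read the same version of the entry.
value_a, token_a = cache.gets("stock")
value_b, token_b = cache.gets("stock")

first = cache.cas("stock", value_a - 1, token_a)    # accepted
second = cache.cas("stock", value_b - 1, token_b)   # rejected: token is stale
```

The second client has to re-read the entry and retry, so its decrement is applied to the current value instead of silently overwriting the first client's write.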

How CAS differs in the master/slave scenario

After a successful CAS on the master, the slave is updated directly without performing a CAS on the slave; consistency is maintained by trusting the master as the authoritative source.
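A minimal sketch of this write path follows. The Store class and the write_through function are hypothetical stand-ins for the two clusters and the proxy logic; the version counter plays the role of the CAS token.

```python
class Store:
    """Toy versioned key-value store standing in for one cache cluster (hypothetical)."""

    def __init__(self):
        self.data = {}                       # key -> (value, version)

    def gets(self, key):
        """Return (value, version); an absent key reads as (None, 0)."""
        return self.data.get(key, (None, 0))

    def set(self, key, value):
        """Unconditional write that bumps the version."""
        _, version = self.gets(key)
        self.data[key] = (value, version + 1)

    def cas(self, key, value, version):
        """Write only if the version is unchanged since the caller read it."""
        if self.gets(key)[1] != version:
            return False
        self.data[key] = (value, version + 1)
        return True


def write_through(master, slave, key, new_value, version):
    """CAS on the master only; on success, overwrite the slave unconditionally."""
    if not master.cas(key, new_value, version):
        return False                         # lost the race; re-read and retry
    slave.set(key, new_value)                # plain set: the master already arbitrated
    return True


master, slave = Store(), Store()
master.set("user:1", "v1")
slave.set("user:1", "v1")

_, version = master.gets("user:1")
ok = write_through(master, slave, "user:1", "v2", version)
stale = write_through(master, slave, "user:1", "v3", version)   # reuses the old version
```

Note that the slave write depends entirely on the master CAS returning success. If the master cannot answer at all, the slave is never touched.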

Why use a proxy?

A proxy improves availability, simplifies scaling (adding or removing cache servers without restarting clients), and boosts hit rates by handling routing logic centrally.

Problem case

In a recent incident the master cluster failed. Every CAS against the master failed as a result, so the follow-up write to the slave never happened. The slave remained reachable but was no longer updated, and the data became inconsistent.

Why not automatically switch to the slave?

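Automatic role switching can cause data chaos: there are multiple callers, and all of them need a coordinated configuration that says which cluster is currently the master. If they do not switch together, CAS no longer has a single authoritative cluster to run against. Manual switching is imperfect, but it is acceptable when the master is down for a prolonged period.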

Proposed mitigation solutions

1. Let the proxy delete the slave data when master CAS fails.
2. Let the client set the slave data with a short expiration (e.g., 5 minutes) when master CAS fails.
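The two options can be compared with a small simulation. Everything here is a hypothetical sketch: the Slave class, the FakeClock, the MasterDown exception, and the write function are illustrative names. It also assumes that "master CAS fails" means the master is unreachable, and it folds the proxy-side delete (option 1) and the client-side set (option 2) into one function for brevity.

```python
class MasterDown(Exception):
    """Raised by the toy master when it is unreachable (hypothetical)."""


class FakeClock:
    """Manually advanced clock so the example needs no real sleeping."""

    def __init__(self):
        self.now = 0.0


class Slave:
    """Toy slave cluster with optional per-key expiry (hypothetical)."""

    def __init__(self, clock):
        self.clock = clock
        self.data = {}                       # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self.clock.now + ttl
        self.data[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self.data.get(key, (None, None))
        if expires_at is not None and self.clock.now >= expires_at:
            self.data.pop(key, None)         # expired: behave like a miss
            return None
        return value

    def delete(self, key):
        self.data.pop(key, None)


SHORT_TTL = 300                              # seconds; the 5-minute example from the text


def failing_master_cas(key, value, token):
    """Stand-in for a CAS call against a master that is down."""
    raise MasterDown(key)


def delete_from_slave(slave, key, value):
    """Option 1: drop the slave copy."""
    slave.delete(key)


def set_slave_short_ttl(slave, key, value):
    """Option 2: write the slave copy with a short expiration."""
    slave.set(key, value, ttl=SHORT_TTL)


def write(master_cas, slave, key, value, token, on_master_down):
    """CAS on the master; apply the chosen fallback if the master is unreachable."""
    try:
        ok = master_cas(key, value, token)
    except MasterDown:
        on_master_down(slave, key, value)
        return False
    if ok:
        slave.set(key, value)                # normal path: slave follows the master
    return ok


clock = FakeClock()
slave = Slave(clock)

slave.set("k", "old")
write(failing_master_cas, slave, "k", "new", 1, delete_from_slave)
after_option_1 = slave.get("k")              # miss: next read goes to the database

slave.set("k", "old")
write(failing_master_cas, slave, "k", "new", 1, set_slave_short_ttl)
after_option_2 = slave.get("k")              # served from the slave for now
clock.now += SHORT_TTL
after_expiry = slave.get("k")                # miss once the short TTL has elapsed
```

In both cases the stale "old" value is gone from the slave. The difference is when the read falls through to the database: immediately with option 1, after the expiration with option 2.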

Drawbacks of the solutions

Solution 1 lowers the hit rate, because entries are invalidated as soon as a master CAS fails, and it hides delete logic inside the proxy. Solution 2 relieves pressure only partially: once the short expiration passes, the load is forwarded to the database anyway.

