From Single Node to Distributed Cluster: Mastering Redis Evolution
This article walks through Redis's journey from a simple single‑instance cache to a robust, highly available, and horizontally scalable distributed system, covering persistence mechanisms, master‑slave replication, Sentinel automatic failover, and sharding clusters for real‑world high‑traffic applications.
Introduction: Why Do We Need to Tinker with Redis?
In modern application development, Redis is prized for its lightning‑fast speed and rich data structures, serving as a cache, message queue, and lightweight database. Initially introduced to solve database performance bottlenecks, a single‑instance Redis quickly reveals weaknesses under traffic spikes: downtime, data loss, and write bottlenecks.
Stage 1: Solo Hero – The Glory and Fragility of Stand‑alone Redis
At the start, the architecture is simple: application servers, a relational database (e.g., MySQL), and a single Redis instance. Hot data is cached in memory, making reads extremely fast and improving user experience.
However, two fatal drawbacks become evident:
Data volatility : Redis keeps its dataset in memory, so a server crash or process exit wipes out everything unless persistence is enabled.
Service unavailability : A single point of failure means that if Redis goes down, cache requests fall back to the database, potentially causing a cascade of overload.
Stage 2: Data Safe‑Box – RDB and AOF Persistence
To prevent data loss, Redis offers two core persistence mechanisms.
1. RDB (Redis Database) Snapshot
What it is : At configured intervals, Redis creates a binary, highly compressed snapshot of the in‑memory dataset and writes it to disk.
Advantages : Small file size and fast recovery, ideal for backup and disaster recovery.
Disadvantages : Not real‑time; data changes between snapshots can be lost if a crash occurs.
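Snapshot behavior is driven by save points in redis.conf. A minimal sketch (the intervals and thresholds below are illustrative defaults, not tuning advice):

```conf
# Take a snapshot if at least 1 key changed within 900 s,
# 10 keys within 300 s, or 10000 keys within 60 s
save 900 1
save 300 10
save 60 10000

# Where the snapshot file is written
dbfilename dump.rdb
dir /var/lib/redis
```

A snapshot can also be forced manually with the BGSAVE command, which forks a child process so clients are not blocked while the file is written.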
2. AOF (Append‑Only File) Log
What it is : Every write command is appended to a log file; on restart, Redis replays the AOF to rebuild the dataset.
Advantages : Higher data integrity; with the appendfsync setting, persistence can be performed every second or even after each command, minimizing loss.
Disadvantages : Larger file size and slower recovery compared to RDB.
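AOF is enabled and tuned through a handful of redis.conf directives; a minimal sketch:

```conf
# Enable the append-only file
appendonly yes
appendfilename "appendonly.aof"

# fsync policy: always | everysec | no
# "everysec" loses at most about one second of writes on a crash
appendfsync everysec
```

With appendfsync always, every write is flushed to disk before the command returns, trading throughput for maximal durability.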
Optimization Combo: AOF Rewrite and Hybrid Persistence
AOF files grow continuously. The “AOF Rewrite” creates a compact AOF containing only the minimal commands needed for the current dataset, without stopping service.
Since Redis 4.0, “Hybrid Persistence” writes an RDB‑style snapshot at the beginning of the AOF file and then appends incremental commands, combining fast RDB recovery with AOF’s data safety.
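Both optimizations are controlled from redis.conf; the values below are illustrative:

```conf
# Trigger an automatic rewrite when the AOF has grown 100%
# since the last rewrite, but only once it exceeds 64 MB
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# Hybrid persistence (Redis >= 4.0): the rewritten AOF starts
# with an RDB-format preamble, then appends incremental commands
aof-use-rdb-preamble yes
```

A rewrite can also be triggered on demand with the BGREWRITEAOF command.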
Stage 3: High‑Availability Foundation – Master‑Slave Replication
Persistence solves data backup but not service interruption. Master‑Slave replication introduces a primary node (Master) and one or more secondary nodes (Slaves).
What it is : The Master handles all write requests and replicates changes to its Slaves. By default this replication is asynchronous, so a small window of recent writes can be lost if the Master fails before they propagate.
How it works : Slaves typically serve read requests, achieving read‑write separation and reducing load on the Master.
Benefits :
High availability – if the Master fails, a Slave can be promoted to become the new Master.
Read scalability – adding Slaves increases read throughput, scaling roughly linearly for read‑heavy workloads.
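Wiring a replica to its master takes two lines of configuration on the replica side (the address below is illustrative):

```conf
# Follow the master at 10.0.0.1:6379
# ("slaveof" in Redis < 5, renamed "replicaof" since Redis 5)
replicaof 10.0.0.1 6379

# Reject writes on this node so it only serves reads
replica-read-only yes
```

The same effect can be achieved at runtime with the REPLICAOF command, which is how Sentinel reconfigures nodes during failover.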
Stage 4: Auto‑Pilot – Sentinel Mode
While Master‑Slave replication provides high availability, manual intervention is required for failover. Sentinel automates this process.
What it is : A separate cluster of processes that monitors the health of Master‑Slave setups and performs automatic failover when needed.
How it works :
Each Sentinel periodically sends PING commands to all nodes to check their status.
If a Master does not respond within the configured timeout, that Sentinel marks it as subjectively down (SDOWN).
When a quorum of Sentinels agrees the Master is objectively down (ODOWN), they elect a leader (via a Raft‑like election) to coordinate the failover.
The leader promotes the best‑qualified Slave to Master, updates other Slaves, and notifies clients.
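The process above maps directly onto a sentinel.conf file; a minimal sketch (master name, address, and timeouts are illustrative):

```conf
# Monitor a master named "mymaster"; at least 2 Sentinels must
# agree it is down before failover starts (the quorum)
sentinel monitor mymaster 10.0.0.1 6379 2

# Mark the master subjectively down after 30 s without a reply
sentinel down-after-milliseconds mymaster 30000

# Abort a failover attempt that takes longer than 3 minutes
sentinel failover-timeout mymaster 180000
```

Clients connect to the Sentinels rather than to a fixed master address, asking them for the current master of "mymaster" and re-querying after a failover.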
Stage 5: Horizontal Scaling – Sharding Cluster
Master‑Slave and Sentinel solve high availability, but all writes still hit a single Master, eventually hitting CPU and memory limits. Sharding distributes data across multiple Masters.
What it is : The dataset is split into many parts, each stored on a different Master node, each possibly with its own Slaves.
Core principle : Redis Cluster divides the key space into 16,384 hash slots. A key’s slot is computed as CRC16(key) % 16384, and the node that owns that slot stores the key. (If the key contains a {…} hash tag, only the tagged part is hashed, which lets related keys land in the same slot.)
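The slot calculation is simple enough to reproduce in a few lines. This sketch implements the CRC16 variant (XMODEM, polynomial 0x1021) that Redis Cluster uses, ignoring the hash‑tag rule for brevity; the key name is just an example:

```python
def crc16(data: bytes) -> int:
    """CRC16/XMODEM (polynomial 0x1021, initial value 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            # Shift left; XOR in the polynomial when the top bit falls off
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 hash slots."""
    return crc16(key.encode()) % 16384

print(key_slot("user:1000"))  # a slot number in [0, 16383]
```

Running CLUSTER KEYSLOT against a real cluster should agree with this function for keys without hash tags.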
Two main sharding approaches exist:
Server‑side sharding (Redis Cluster) : Nodes exchange state via a gossip protocol. If a client contacts the wrong node, the node returns a MOVED redirect, and smart clients (e.g., JedisCluster) handle it automatically.
Proxy‑based sharding : A proxy layer (e.g., Twemproxy, Codis) sits between the application and Redis nodes, handling routing and shard management, presenting a single logical Redis instance to the client.
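For the server‑side approach, each node opts into cluster mode via redis.conf; a minimal sketch with illustrative values:

```conf
# Minimal per-node settings to join a Redis Cluster
cluster-enabled yes
cluster-config-file nodes-6379.conf   # maintained by Redis itself, not edited by hand
cluster-node-timeout 15000            # ms of silence before a node is considered failing
```

Once the nodes are running, they are joined and slots are assigned with the redis-cli --cluster tooling.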
Conclusion: The Evolution Path Is Endless
Reviewing Redis’s architectural evolution shows a continuous cycle of problem discovery and solution: starting from single‑node performance limits, adding persistence for data safety, introducing master‑slave replication for read/write separation and high availability, deploying Sentinel for automated failover, and finally adopting sharding clusters to break write bottlenecks and achieve true horizontal scalability.
This journey not only reflects Redis’s own growth but also offers valuable lessons for designing any distributed system that must balance performance, availability, and scalability.