Analysis of Redis Master‑Slave Replication and Cluster Working Principles
This article explains how Redis master‑slave data synchronization works, covering both full and partial resynchronization. It then details the internal workings of Redis clustering: how nodes join, how hash slots are assigned, and how failed nodes are detected and replaced automatically, with practical insights for building reliable Redis deployments.
Preface
The author apologizes for losing the original draft due to a cache clean‑up and reminds readers to back up their work. The previous article covered cache eviction, RDB and AOF files; this one focuses on data replication and cluster operation principles.
Table of Contents
Analysis of master‑slave data synchronization principles.
Redis cluster working principle analysis.
Cluster slot assignment mechanism.
Cluster service auto‑detection and failover recovery operations.
Master‑Slave Data Synchronization Principle Analysis
Redis supports two synchronization modes: full resynchronization and partial resynchronization.
Full Resynchronization Process
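When a slave starts with no data, it is pointed at the master with SLAVEOF master_ip master_port (REPLICAOF since Redis 5.0) and requests a complete copy of the master's data. The steps are:
The slave sends a PSYNC request to the master. Because it has no replication ID or offset yet, it sends PSYNC ? -1.
The master replies with FULLRESYNC, forks a child process to produce an RDB snapshot, and sends the snapshot to the slave.
While the snapshot is being produced and transferred, the master keeps executing write commands, buffers them, and forwards them to the slave after the RDB data.
The slave discards its old data, loads the received RDB file, and then replays the buffered write commands. Once both are finished, the full resynchronization is complete.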
The full resynchronization essentially backs up the entire dataset from the master to the slave.
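The flow above can be sketched in Python as a toy model. It assumes that an in-memory dict stands in for the dataset and that a deep copy stands in for the RDB snapshot. The Master and Slave classes and their methods are hypothetical, not Redis APIs. The sketch shows why writes made after the snapshot must be buffered and replayed once the snapshot has been loaded.

```python
import copy


class Master:
    """Toy master (hypothetical): a dict dataset plus a buffer for writes made during a full sync."""

    def __init__(self):
        self.data = {}
        self.sync_buffer = None  # None means no full sync is in progress

    def set(self, key, value):
        self.data[key] = value
        if self.sync_buffer is not None:
            # The snapshot is already taken, so this write must be forwarded separately.
            self.sync_buffer.append(("SET", key, value))

    def start_full_sync(self):
        """Stand-in for producing an RDB file: a point-in-time copy of the dataset."""
        self.sync_buffer = []
        return copy.deepcopy(self.data)

    def finish_full_sync(self):
        """Hand over the writes made after the snapshot and stop buffering."""
        pending, self.sync_buffer = self.sync_buffer, None
        return pending


class Slave:
    """Toy slave (hypothetical) that starts with stale data."""

    def __init__(self):
        self.data = {"stale": "old"}

    def load_snapshot(self, snapshot):
        # Old data is discarded before the snapshot is loaded.
        self.data = copy.deepcopy(snapshot)

    def replay(self, commands):
        for op, key, value in commands:
            if op == "SET":
                self.data[key] = value


master = Master()
master.set("a", 1)
snapshot = master.start_full_sync()       # master produces the snapshot
master.set("b", 2)                        # a write arrives while the sync is running
slave = Slave()
slave.load_snapshot(snapshot)             # slave loads the snapshot...
slave.replay(master.finish_full_sync())   # ...then replays the buffered writes
print(slave.data == master.data)          # True
```

Without the replay step the slave would be missing key b, which is exactly the gap the write buffer closes.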
Partial Resynchronization
If a slave loses its connection to the master after copying part of the data, it can resume synchronization using an offset mechanism. The master keeps a replication backlog, a circular buffer (default 1 MiB, configurable via repl-backlog-size) that holds the most recent write commands, and both master and slave track their replication offsets. Upon reconnection, the slave sends PSYNC with the master's replication ID and the next offset it needs. If the ID matches and that offset is still in the backlog, the master replies with CONTINUE and sends only the commands from that point on. Otherwise, it falls back to a full resynchronization. Its advantages are:
Reduces the amount of data synchronized compared to full sync.
Allows faster achievement of data consistency.
Saves bandwidth and avoids the CPU cost of generating a new RDB file.
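The backlog and the PSYNC decision can be modeled in a few lines of Python. This is a toy sketch with two simplifying assumptions: offsets count commands rather than bytes (Redis counts bytes), and a fixed command capacity stands in for repl-backlog-size. The Backlog class and its psync method are hypothetical illustrations, not Redis internals.

```python
from collections import deque


class Backlog:
    """Toy replication backlog (hypothetical): the last `capacity` commands, keyed by offset."""

    def __init__(self, replid, capacity):
        self.replid = replid          # replication ID of the current master
        self.capacity = capacity      # stands in for repl-backlog-size (commands, not bytes)
        self.entries = deque()        # (offset, command) pairs, oldest first
        self.master_offset = 0

    def append(self, command):
        self.master_offset += 1
        self.entries.append((self.master_offset, command))
        if len(self.entries) > self.capacity:
            self.entries.popleft()    # the oldest command falls out of the backlog

    def psync(self, replid, slave_offset):
        """Decide between partial and full resync for a slave that has applied up to slave_offset."""
        if replid != self.replid:
            return ("FULLRESYNC", None)  # the slave followed a different replication history
        oldest = self.entries[0][0] if self.entries else self.master_offset + 1
        if slave_offset + 1 < oldest or slave_offset > self.master_offset:
            return ("FULLRESYNC", None)  # the missing range is no longer in the backlog
        missing = [cmd for off, cmd in self.entries if off > slave_offset]
        return ("CONTINUE", missing)


backlog = Backlog("replid-1", capacity=3)
for i in range(5):
    backlog.append(f"SET k{i} {i}")
# The backlog now holds offsets 3, 4 and 5.
print(backlog.psync("replid-1", 3))  # ('CONTINUE', ['SET k3 3', 'SET k4 4'])
print(backlog.psync("replid-1", 1))  # ('FULLRESYNC', None)
```

The second call shows the trade-off behind repl-backlog-size: a slave that falls further behind than the backlog covers must pay for a full resynchronization.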
Redis Cluster Working Principle Analysis
In production, a Redis cluster consists of multiple nodes. Nodes are linked together by executing the CLUSTER MEET target_ip target_port command. For example, with three nodes:
Initially, each node forms a cluster that contains only itself.
CLUSTER MEET node1_ip node1_port is run on node 2, which then joins node 1's cluster.
The same command is run on node 3. Node 1 then passes node 3's address to node 2 through gossip, so node 2 and node 3 perform the same handshake, resulting in a fully connected three‑node cluster.
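The join sequence can be modeled in Python as follows. This is a simplified sketch: Node, cluster_meet, and gossip_until_stable are hypothetical helpers, and the gossip step assumes each node shares its full view of the cluster, whereas real Redis gossip messages carry information about only a few nodes at a time.

```python
class Node:
    """Toy cluster node (hypothetical) that only tracks which nodes it knows."""

    def __init__(self, name):
        self.name = name
        self.known = {name}  # every node starts as a one-node cluster


def cluster_meet(a, b):
    """Model of CLUSTER MEET run on node a for node b: after the handshake both know each other."""
    a.known.add(b.name)
    b.known.add(a.name)


def gossip_until_stable(nodes):
    """Simplified gossip: each node tells its peers about every node it knows."""
    by_name = {n.name: n for n in nodes}
    changed = True
    while changed:
        changed = False
        for node in nodes:
            for peer_name in sorted(node.known):
                peer = by_name[peer_name]
                for new_name in sorted(node.known - peer.known):
                    # The peer learns of a new node and handshakes with it.
                    cluster_meet(peer, by_name[new_name])
                    changed = True


n1, n2, n3 = Node("node1"), Node("node2"), Node("node3")
cluster_meet(n2, n1)   # CLUSTER MEET node1_ip node1_port, run on node 2
cluster_meet(n3, n1)   # CLUSTER MEET node1_ip node1_port, run on node 3
print("node3" in n2.known)  # False: node 2 has not heard of node 3 yet
gossip_until_stable([n1, n2, n3])
print(all(n.known == {"node1", "node2", "node3"} for n in (n1, n2, n3)))  # True
```

Only two CLUSTER MEET commands are issued; the third link (node 2 to node 3) comes from gossip, which is why an operator does not need to introduce every pair of nodes by hand.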
Connection establishment between two nodes (e.g., node 1 and node 2) proceeds as follows:
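Node 1 creates a node structure (clusterNode in the Redis source) holding node 2's information.
Node 1 sends a MEET message to node 2 over the cluster bus.
Node 2 creates its own node structure for node 1.
Node 2 replies with a PONG message.
Node 1 receives the PONG, which confirms that node 2 accepted the MEET, and sends a PING. When node 2 receives the PING, it knows node 1 received its PONG, and the handshake is complete.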
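The handshake can be traced with a small event-driven sketch in Python. The HandshakeNode class, its state names, and the in-memory queue standing in for the cluster bus are all hypothetical; real nodes exchange binary messages over TCP.

```python
from collections import deque


class HandshakeNode:
    """Toy node (hypothetical) that tracks handshake state per peer."""

    def __init__(self, name):
        self.name = name
        self.peers = {}  # peer name -> "handshake" or "connected"

    def meet(self, other, bus):
        self.peers[other.name] = "handshake"       # record the target before contacting it
        bus.append((self, other, "MEET"))

    def receive(self, sender, msg, bus):
        if msg == "MEET":
            self.peers[sender.name] = "handshake"  # record the node that met us
            bus.append((self, sender, "PONG"))
        elif msg == "PONG" and self.peers.get(sender.name) == "handshake":
            self.peers[sender.name] = "connected"  # our MEET was accepted
            bus.append((self, sender, "PING"))
        elif msg == "PING" and self.peers.get(sender.name) == "handshake":
            self.peers[sender.name] = "connected"  # the PING confirms our PONG arrived


def deliver_all(bus):
    """Deliver queued messages in order and return a readable trace."""
    trace = []
    while bus:
        sender, receiver, msg = bus.popleft()
        trace.append(f"{sender.name} -> {receiver.name}: {msg}")
        receiver.receive(sender, msg, bus)
    return trace


bus = deque()
node1, node2 = HandshakeNode("node1"), HandshakeNode("node2")
node1.meet(node2, bus)
for line in deliver_all(bus):
    print(line)
# node1 -> node2: MEET
# node2 -> node1: PONG
# node1 -> node2: PING
```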
Redis Cluster Slot Assignment Process
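When a write command is sent to the cluster, the slot mechanism determines which master node should execute it. The cluster has 16,384 hash slots, and each master owns a subset. For example, node 1 owns slots 0‑4999, node 2 owns 5000‑9999, and node 3 owns 10000‑16383. For SET key1 value1, the slot is CRC16(key1) & 16383 = 9189 (equivalent to CRC16(key1) mod 16384), which falls into node 2's range, so node 2 executes it. If the command reaches a node that does not own the slot, that node replies with a MOVED redirection naming the master that does. If the owning master has failed, its slots are served again only after one of its slaves is promoted; until then, by default (cluster-require-full-coverage yes), the cluster stops accepting commands.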
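The slot calculation can be reproduced in Python. Redis Cluster uses the CRC16 XMODEM variant (polynomial 0x1021, initial value 0). The cluster specification also defines hash tags: if a key contains {...} with a non-empty section inside, only that section is hashed, so related keys land in the same slot. SLOT_RANGES is the example assignment from the text, and crc16_xmodem, key_slot, and owner are illustrative helper names.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 slots, honoring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:               # non-empty tag: hash only the part inside {...}
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) & 16383      # same as % 16384


# Example slot ownership from the text: (first slot, last slot, owner).
SLOT_RANGES = [(0, 4999, "node1"), (5000, 9999, "node2"), (10000, 16383, "node3")]


def owner(key: str) -> str:
    slot = key_slot(key)
    for first, last, node in SLOT_RANGES:
        if first <= slot <= last:
            return node
    raise ValueError(f"slot {slot} is not assigned")


print(hex(crc16_xmodem(b"123456789")))  # 0x31c3, the check value given in the Redis Cluster spec
print(key_slot("key1"))                  # 9189
print(owner("key1"))                     # node2
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # True
```

Because 16,384 is a power of two, masking with & 16383 and taking the remainder mod 16384 give the same result; the mask is simply cheaper.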
Cluster Service Auto‑Detection & Failover Recovery
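Each node periodically sends PING heartbeat messages to the other nodes. If a node does not reply within cluster-node-timeout, the sender marks it as possibly failing (PFAIL, i.e., suspect). When a majority of masters report the node as suspect, it is marked as failed (FAIL) and the decision is broadcast to the whole cluster. If the failed node is a master, its slaves then start an election, and a slave is promoted to master once it wins votes from a majority of the masters.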
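The suspect-then-confirm logic can be sketched in Python with three masters named node1 to node3. The sketch assumes each live master records the time of the last PONG from every peer. NODE_TIMEOUT stands in for cluster-node-timeout (15,000 ms by default). FailureDetector and mark_fail are hypothetical names, and the sketch ignores details such as the expiry of old failure reports.

```python
NODE_TIMEOUT = 15.0  # seconds; stands in for cluster-node-timeout (15000 ms by default)


class FailureDetector:
    """One master's view of its peers (hypothetical, simplified)."""

    def __init__(self, name, peers, now):
        self.name = name
        self.last_pong = {p: now for p in peers}   # time of the last PONG from each peer
        self.flags = {p: "ok" for p in peers}

    def on_pong(self, peer, now):
        self.last_pong[peer] = now
        if self.flags[peer] == "pfail":
            self.flags[peer] = "ok"  # a reply clears a mere suspicion

    def check(self, now):
        for peer, last in self.last_pong.items():
            if self.flags[peer] == "ok" and now - last > NODE_TIMEOUT:
                self.flags[peer] = "pfail"  # suspected, not yet confirmed


def mark_fail(target, detectors, total_masters):
    """Promote PFAIL to FAIL once a majority of all masters suspect the target."""
    reports = sum(1 for d in detectors if d.flags.get(target) in ("pfail", "fail"))
    if reports > total_masters // 2:
        for d in detectors:
            d.flags[target] = "fail"  # stands in for the FAIL broadcast
        return True
    return False


d1 = FailureDetector("node1", ["node2", "node3"], now=0)
d2 = FailureDetector("node2", ["node1", "node3"], now=0)
d1.on_pong("node2", now=20)   # node 1 and node 2 keep answering each other
d2.on_pong("node1", now=20)
d1.check(now=20)              # node 3 has been silent for 20 s > 15 s
print(mark_fail("node3", [d1], total_masters=3))       # False: 1 of 3 is not a majority
d2.check(now=20)
print(mark_fail("node3", [d1, d2], total_masters=3))   # True: 2 of 3 masters agree
print(d1.flags["node3"], d2.flags["node3"])            # fail fail
```

Requiring a majority of all masters, rather than a single observer, keeps one node with a bad network link from taking a healthy master out of service.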
Example: node 1 pings nodes 2 and 3; node 2 replies but node 3 does not, so node 1 marks node 3 as suspect. Node 2 also pings node 1 (which replies) and node 3 (which does not). Since two of the three masters, a majority, now suspect node 3, node 2 marks it as failed and broadcasts the event. Node 3's slaves (nodes 5 and 6) stop replicating from it; node 5 is promoted to master, and node 6 begins replicating from node 5.
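The promotion step of this example can be sketched in Python under simplified assumptions. In Redis, a replica with a larger replication offset waits a shorter delay before requesting votes, so the most up-to-date replica usually wins. The sketch models that by choosing the replica with the largest offset and requiring votes from a majority of all masters. The Replica class, the failover function, and the offsets used below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    master: Optional[str]   # None once the replica has been promoted
    repl_offset: int        # how much of the master's stream it has applied


def failover(failed_master, replicas, live_masters, total_masters):
    """Promote one replica of failed_master; return its name, or None without a majority."""
    candidates = [r for r in replicas if r.master == failed_master]
    if not candidates:
        return None
    # The replica with the largest offset gets the shortest election delay, so it asks first.
    winner = max(candidates, key=lambda r: r.repl_offset)
    # Simplification: every reachable master votes for the first candidate that asks.
    votes = len(live_masters)
    if votes <= total_masters // 2:
        return None            # no majority of masters, so no promotion
    winner.master = None
    for r in candidates:
        if r is not winner:
            r.master = winner.name  # remaining replicas follow the new master
    return winner.name


node5 = Replica("node5", master="node3", repl_offset=1200)
node6 = Replica("node6", master="node3", repl_offset=1100)
print(failover("node3", [node5, node6], live_masters=["node1", "node2"], total_masters=3))  # node5
print(node6.master)  # node5
```

The majority requirement mirrors failure detection: if too few masters are reachable, no replica can be promoted, which prevents two sides of a network partition from each electing their own master.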
Conclusion
This article first detailed the implementation of Redis master‑slave replication, then explained the composition and operation of Redis clusters, and finally described the mechanisms for automatic node detection and failover recovery.
The next article will cover cluster deployment in depth; stay tuned for more practical guidance.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java