How Redis Cluster Guarantees Write Safety: Architecture, Failover, and Resharding Explained
This article explains Redis Cluster’s architecture, node and slot management, failover and resharding mechanisms, and how they together affect write safety, detailing commands, metadata storage, conflict resolution, and client redirection strategies in distributed environments.
Interface and Architecture
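Redis Cluster keeps the key‑value interface compatible with standalone Redis, with one restriction: multi‑key commands work only when all their keys hash to the same slot. A cluster consists of multiple server nodes, and a client can connect to any of them.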
Nodes
Each Redis process is a node with a unique node‑id. Nodes store their id, IP and port.
Node Table
The node table records the cluster membership (id, ip, port) and is replicated to all nodes via the gossip protocol.
Hash Slot Map
Data is sharded into 16384 hash slots. The slot for a key is computed by CRC16(key) mod 16384. Each slot is assigned to a node; the mapping is stored in the hash‑slot map, also replicated by gossip.
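The slot computation can be sketched in a few lines of Python. This is a minimal illustration, not the server's implementation: it assumes the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0) and it also applies the hash-tag rule from the cluster specification, which this article does not otherwise cover. The function names `crc16_xmodem` and `key_slot` are chosen for this example.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 (XMODEM variant): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Map a key to one of the 16384 hash slots.

    Hash-tag rule (assumed from the cluster spec): if the key contains
    "{...}" with at least one character between the first "{" and the
    first "}" after it, only that substring is hashed.
    """
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


if __name__ == "__main__":
    print(key_slot(b"foo"))
    # Keys sharing a hash tag land in the same slot.
    print(key_slot(b"{user1000}.following") == key_slot(b"user1000"))
```

Keys that share a hash tag map to the same slot, which is what makes multi-key operations on them possible in a cluster.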
Cluster Operations
Adding/Removing Nodes
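CLUSTER MEET adds a node; CLUSTER FORGET removes one. CLUSTER MEET can be sent to any node, which adds the new member to its local node table and lets gossip propagate it. CLUSTER FORGET removes the node only from the table of the node that receives the command (and temporarily bans it from being re‑added through gossip), so it is typically sent to every remaining node.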
Slot Management
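The commands CLUSTER ADDSLOTS, CLUSTER DELSLOTS, and CLUSTER SETSLOT modify the hash‑slot map on the node that receives them. The commands themselves are not forwarded; the resulting slot ownership reaches the rest of the cluster through gossip.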
Master‑Slave Replication
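Each slot has a master and, usually, one or more slaves. Writes go to the master and are asynchronously replicated to its slaves, so a write acknowledged by the master may not yet exist on any slave. In cluster mode the replication relationship is changed with CLUSTER REPLICATE; the standalone SLAVEOF command is rejected.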
Configuration Storage
Cluster metadata (node table, hash‑slot map, configEpoch) is kept in memory, in the source‑level variables myself and cluster, and is persisted to nodes.conf on each node.
Failover and Epochs
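When a master fails, its slaves run a Raft‑like election using currentEpoch and lastVoteEpoch. A slave increments currentEpoch and asks the masters for votes; each master votes at most once per epoch, which it tracks in lastVoteEpoch. The slave that collects votes from a majority of masters becomes the new master, takes a new configEpoch, and claims the failed master's slots in the hash‑slot map.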
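The voting rule can be sketched as follows. This is a simplified model under stated assumptions, not the code in cluster.c: `MasterState`, `handle_auth_request`, and `wins_election` are hypothetical names, and real masters apply further checks that are left out here.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Per-master election state (field names mirror the article's terms)."""
    current_epoch: int
    last_vote_epoch: int = 0


def handle_auth_request(master: MasterState, request_epoch: int,
                        candidate_master_failed: bool) -> bool:
    """Return True if this master grants its vote to the requesting slave."""
    if request_epoch < master.current_epoch:
        return False  # stale request from an older epoch
    if request_epoch <= master.last_vote_epoch:
        return False  # already voted in this epoch (or a newer one)
    if not candidate_master_failed:
        return False  # the slave's master is not considered failed
    master.current_epoch = max(master.current_epoch, request_epoch)
    master.last_vote_epoch = request_epoch
    return True


def wins_election(votes: int, total_masters: int) -> bool:
    """A slave needs votes from a strict majority of the masters."""
    return votes > total_masters // 2


if __name__ == "__main__":
    m = MasterState(current_epoch=5)
    print(handle_auth_request(m, 6, True))  # first request in epoch 6
    print(handle_auth_request(m, 6, True))  # second request, same epoch
```

Because each master records the epoch of its last vote, two slaves competing in the same epoch cannot both collect a majority.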
Resharding
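Resharding moves slots between nodes. The slot is marked with CLUSTER SETSLOT … IMPORTING on the destination node and CLUSTER SETSLOT … MIGRATING on the source node; keys are then listed with CLUSTER GETKEYSINSLOT and moved with MIGRATE. After all keys are moved, the slot mapping is updated with CLUSTER SETSLOT … NODE.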
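The command sequence for moving one slot can be laid out as data. The sketch below only builds the list of commands and does not talk to a server; `reshard_plan` is a hypothetical helper, the `keys` argument stands in for whatever CLUSTER GETKEYSINSLOT would return, and the ordering (destination first for IMPORTING and for the final NODE assignment) is an assumption based on the usual procedure. A real tool repeats the GETKEYSINSLOT and MIGRATE steps until the slot is empty.

```python
from typing import List, Sequence, Tuple

Command = List[str]


def reshard_plan(slot: int, source_id: str, dest_id: str,
                 dest_host: str, dest_port: int,
                 keys: Sequence[str],
                 timeout_ms: int = 5000,
                 batch: int = 100) -> List[Tuple[str, Command]]:
    """Return (target node, command) pairs for migrating a single slot."""
    s = str(slot)
    plan: List[Tuple[str, Command]] = [
        # 1. Mark the slot on both ends of the migration.
        ("destination", ["CLUSTER", "SETSLOT", s, "IMPORTING", source_id]),
        ("source", ["CLUSTER", "SETSLOT", s, "MIGRATING", dest_id]),
        # 2. Ask the source for a batch of keys in the slot.
        ("source", ["CLUSTER", "GETKEYSINSLOT", s, str(batch)]),
    ]
    if keys:
        # 3. Move that batch; "" plus KEYS selects the multi-key form.
        plan.append(("source", ["MIGRATE", dest_host, str(dest_port), "", "0",
                                str(timeout_ms), "KEYS", *keys]))
    # 4. Finalize ownership of the slot.
    plan.append(("destination", ["CLUSTER", "SETSLOT", s, "NODE", dest_id]))
    plan.append(("source", ["CLUSTER", "SETSLOT", s, "NODE", dest_id]))
    return plan


if __name__ == "__main__":
    for target, cmd in reshard_plan(42, "src-id", "dst-id",
                                    "10.0.0.2", 7001, ["k1", "k2"]):
        print(target, " ".join(repr(a) if a == "" else a for a in cmd))
```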
Conflict Resolution
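If concurrent resharding and failover produce conflicting slot mappings with the same configEpoch, the tie is broken by node id: the master with the lexicographically smaller id increments currentEpoch and adopts it as its new configEpoch, while the other master keeps its value. The two masters then advertise different epochs, and the rest of the cluster accepts the mapping with the higher one.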
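The tie-break rule is small enough to state as a pure function. This is a sketch of the rule as described above, with a hypothetical name and signature; it takes a master's own id and epochs plus what it learned from another master, and returns its epochs afterwards.

```python
from typing import Tuple


def resolve_epoch_collision(my_id: str, my_config_epoch: int,
                            my_current_epoch: int,
                            other_id: str,
                            other_config_epoch: int) -> Tuple[int, int]:
    """Return (config_epoch, current_epoch) for this master after the check."""
    if my_config_epoch != other_config_epoch:
        return my_config_epoch, my_current_epoch  # no collision
    if my_id >= other_id:
        # The other master has the smaller id, so it is the one that moves.
        return my_config_epoch, my_current_epoch
    new_epoch = my_current_epoch + 1
    return new_epoch, new_epoch


if __name__ == "__main__":
    # Both masters claim configEpoch 5; only "aaa" (smaller id) bumps.
    print(resolve_epoch_collision("aaa", 5, 7, "bbb", 5))
    print(resolve_epoch_collision("bbb", 5, 7, "aaa", 5))
```

Only one side ever acts on a given collision, so the two masters cannot bump to the same value in lockstep.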
Write Safety
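Conflicts can cause divergent hash‑slot maps, and the losing side's writes are discarded when the maps converge. Because replication is asynchronous, writes acknowledged by a failed master but not yet replicated are also lost on failover. Redis mitigates this by delaying slave election (a fixed 500 ms, plus a random delay of up to 500 ms, plus a rank‑based delay of 1000 ms per rank, where the most up‑to‑date slave has rank 0) and by ensuring that at most one master is elected per epoch.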
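The election delay can be written out directly. The sketch below assumes the constants from the cluster specification (500 ms fixed, 0 to 500 ms random, 1000 ms per rank); `election_delay_ms` is a name chosen for this example.

```python
import random
from typing import Optional


def election_delay_ms(replica_rank: int,
                      rng: Optional[random.Random] = None) -> int:
    """Delay before a slave starts its election, in milliseconds.

    replica_rank is 0 for the slave with the most up-to-date replication
    offset, 1 for the next one, and so on.
    """
    rng = rng or random.Random()
    fixed = 500                    # lets the failure report propagate
    jitter = rng.randint(0, 500)   # desynchronizes competing slaves
    return fixed + jitter + replica_rank * 1000


if __name__ == "__main__":
    for rank in range(3):
        print(rank, election_delay_ms(rank))
```

The rank term gives the slave with the most replicated data a head start, which tends to reduce the number of writes lost in a failover.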
Client Interaction
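Clients compute the hash slot for a key, locate the responsible node via a cached copy of the slot map, and issue read/write commands. If a node returns a MOVED error, the slot has permanently changed owner: the client updates its slot map and retries at the node named in the error. If it returns ASK, the slot is being migrated: the client sends ASKING to the named node and retries only that command there, without updating its slot map.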
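A client's redirection handling might look like the sketch below. It assumes the error text has the form "MOVED <slot> <host>:<port>" or "ASK <slot> <host>:<port>" with any leading "-" already stripped by the protocol layer; `Redirect`, `parse_redirect`, and `apply_redirect` are hypothetical names, not part of any client library.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Redirect(NamedTuple):
    kind: str   # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse a redirection error; return None for any other error."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


def apply_redirect(slot_map: Dict[int, Tuple[str, int]],
                   redirect: Redirect,
                   command: List[str]) -> List[List[str]]:
    """Update the cached slot map if needed and return the commands to
    send to (redirect.host, redirect.port)."""
    if redirect.kind == "MOVED":
        # Permanent change of owner: remember it for later requests.
        slot_map[redirect.slot] = (redirect.host, redirect.port)
        return [command]
    # ASK: one-off redirection during migration; the map stays as it is.
    return [["ASKING"], command]


if __name__ == "__main__":
    slots: Dict[int, Tuple[str, int]] = {3999: ("127.0.0.1", 6380)}
    r = parse_redirect("ASK 3999 127.0.0.1:6381")
    if r is not None:
        print(apply_redirect(slots, r, ["GET", "x"]), slots[3999])
```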
References
https://github.com/redis/redis/blob/unstable/src/cluster.c
https://github.com/redis/redis/blob/unstable/src/cluster.h
https://github.com/redis/redis/blob/unstable/src/server.c
https://github.com/redis/redis/blob/unstable/src/server.h
https://redis.io/topics/cluster-spec
dbaplus Community
