How to Build and Understand a Redis Cluster: Setup, Mechanics, and Failover
This guide walks through installing a Redis cluster with three masters and three slaves using local ports, explains slot allocation, key hashing, gossip communication, failover, node addition, resharding, and best practices for high availability, while providing practical commands and configuration examples.
Cluster Environment Setup
Redis Cluster requires at least three master nodes. In this example we create three masters and three slaves using local ports (7000‑7005). This method is for experimentation only and should not be used in production.
Define the ports for the nodes (7000-7005) and copy redis.conf to a separate file for each port.
Configuration files:
IP: 127.0.0.1 Port: 7000‑7005 Config: 7000/redis-7000.conf, 7001/redis-7001.conf, …, 7005/redis-7005.conf
Edit each redis.conf to enable clustering and set the required options (e.g., requirepass and masterauth if a password is needed).
<code>daemonize yes
# port must match the configuration above
port 7000
cluster-enabled yes
cluster-config-file nodes-7000.conf
cluster-node-timeout 5000
appendonly yes
</code>Start all nodes:
<code># start all six nodes (7000-7005), each from its own directory
for port in 7000 7001 7002 7003 7004 7005; do
  (cd "$port" && redis-server "./redis-$port.conf")
done
</code>Initialize the cluster:
<code>redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 \
127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
--cluster-replicas 1
</code>Query cluster status:
<code>redis-cli -c -h 127.0.0.1 -p 7000
cluster info
</code>Other creation methods are documented in the Redis manual (for example, the utils/create-cluster script).
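The per-port configuration files described in this section can also be generated with a short script instead of being copied by hand. The following is a minimal sketch in Python: the template repeats the settings from the sample configuration, while the function name `write_node_configs` and the directory layout argument are assumptions made for illustration.

```python
from pathlib import Path

# Settings taken from the sample redis.conf in this section;
# {port} is filled in for each node.
TEMPLATE = """\
daemonize yes
port {port}
cluster-enabled yes
cluster-config-file nodes-{port}.conf
cluster-node-timeout 5000
appendonly yes
"""


def write_node_configs(base_dir, ports):
    """Create <base_dir>/<port>/redis-<port>.conf for every port (hypothetical helper)."""
    paths = []
    for port in ports:
        node_dir = Path(base_dir) / str(port)
        node_dir.mkdir(parents=True, exist_ok=True)
        conf = node_dir / f"redis-{port}.conf"
        conf.write_text(TEMPLATE.format(port=port))
        paths.append(conf)
    return paths
```

Calling `write_node_configs(".", range(7000, 7006))` would produce 7000/redis-7000.conf through 7005/redis-7005.conf in the current directory.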
Cluster Principles
Slot Assignment Mechanism
Redis Cluster divides the key space into 16,384 slots. Each node is responsible for a subset of slots. Clients receive the slot map from the cluster and cache it locally, allowing direct routing of commands to the correct node.
Slot Location Algorithm
The key is hashed with CRC16, and the result is masked with 0x3FFF (equivalent to taking it modulo 16384) to obtain the slot number. The implementation resides in src/cluster.c (function keyHashSlot).
<code>crc16(key,keylen) & 0x3FFF
</code>To find the slot of a key:
<code># query the slot of a key
127.0.0.1:7000> cluster keyslot mykey
(integer) 12318
# list all slot ranges
127.0.0.1:7000> cluster slots
…
</code>When a key is accessed on the wrong node, Redis replies with a MOVED or ASK redirection, and a cluster-aware client (for example, redis-cli started with -c) follows it to the correct node.
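The slot computation can be reproduced outside Redis, which is handy for checking where a key will land. The sketch below is not the Redis source; it assumes that Python's `binascii.crc_hqx` with an initial value of 0 matches the CRC16 variant Redis uses, and it applies the hash tag rule (only the text inside the first non-empty `{...}` is hashed). The names `hash_tag` and `key_slot` are chosen for illustration.

```python
from binascii import crc_hqx

SLOT_MASK = 0x3FFF  # 16384 slots, numbered 0..16383


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that is hashed (the hash tag, if there is one)."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # An empty tag "{}" does not count; the whole key is hashed instead.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Slot number for a key: CRC16 of the hashed part, masked to 14 bits."""
    return crc_hqx(hash_tag(key.encode()), 0) & SLOT_MASK
```

For example, `key_slot("foo")` returns 12182, and `key_slot("{user1}:1:name")` equals `key_slot("user1")` because only the tag is hashed.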
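Redirection (MOVED and ASK)
If a node receives a command for a key whose slot it does not own, it replies with a MOVED error containing the slot number and the address of the node that owns it. The client re-sends the command to that node and updates its slot cache. While a slot is being migrated, the source node may instead reply with ASK: the client sends ASKING followed by the command to the target node for that one request and leaves its slot cache unchanged.
In plain terms: the node does not forward the request itself; it tells the client where to go, and the client retries there.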
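How a client might react to a redirection can be sketched in a few lines. The example below only parses the error text and decides what to do next; it assumes the reply has the form `MOVED <slot> <host>:<port>` or `ASK <slot> <host>:<port>`, and the function name `apply_redirect` and the shape of the slot cache are hypothetical.

```python
def apply_redirect(error: str, slot_cache: dict) -> tuple:
    """Parse a MOVED/ASK error and return (host, port, needs_asking).

    slot_cache maps slot number -> (host, port); only MOVED updates it.
    """
    kind, slot, address = error.split()
    host, _, port = address.rpartition(":")
    node = (host, int(port))
    if kind == "MOVED":
        # Permanent change of ownership: remember the new owner.
        slot_cache[int(slot)] = node
        return host, int(port), False
    if kind == "ASK":
        # One-off redirect during migration: send ASKING first, keep the cache.
        return host, int(port), True
    raise ValueError(f"not a redirection: {error!r}")
```

A real client library would then open (or reuse) a connection to the returned address and retry the command there.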
<code>set abc sdl
set sbc sdl
</code>Cluster Communication Mechanism
Nodes communicate via a gossip protocol, exchanging messages such as PING, PONG, MEET, and FAIL. In general, cluster metadata can be maintained in two ways: centralized (e.g., stored in ZooKeeper) or fully distributed via gossip. Redis Cluster uses gossip.
Centralized
Metadata updates are immediate but can become a bottleneck.
Gossip
Nodes periodically send PING messages containing their state and metadata. MEET adds a new node to the cluster. FAIL notifies others that a node is down.
The gossip approach distributes load but introduces a small delay in metadata propagation.
Gossip Port
By default, each node uses port + 10000 for gossip communication (e.g., node 7001 uses 17001), so that port must also be reachable between nodes.
Cluster Election Principle
When a master fails, its slaves attempt a failover. The process involves broadcasting FAILOVER_AUTH_REQUEST, collecting acknowledgments from a majority of masters, and promoting a slave to master.
Slave detects master FAIL.
Slave increments its currentEpoch and broadcasts FAILOVER_AUTH_REQUEST.
Masters that have not voted yet respond with FAILOVER_AUTH_ACK.
Slave collects ACKs; if it receives a majority, it becomes the new master.
New master broadcasts a PONG to inform the cluster.
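The election requires at least three masters. A slave needs votes from more than half of all masters, including the failed one; with only two masters, the single surviving master is exactly half, so a majority cannot be reached.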
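The majority rule and the one-vote-per-epoch behaviour can be illustrated with a small simulation. This is a simplified sketch, not the Redis implementation: the `Master` class, `votes_needed`, and `try_failover` are hypothetical names, and the model ignores timing, replica ranking, and network delays.

```python
class Master:
    """A master that grants at most one vote per epoch."""

    def __init__(self):
        self.last_vote_epoch = 0

    def request_vote(self, epoch: int) -> bool:
        # Corresponds to answering FAILOVER_AUTH_REQUEST with FAILOVER_AUTH_ACK.
        if epoch > self.last_vote_epoch:
            self.last_vote_epoch = epoch
            return True
        return False


def votes_needed(total_masters: int) -> int:
    """Smallest number of votes that is a strict majority of all masters."""
    return total_masters // 2 + 1


def try_failover(reachable_masters, total_masters: int, epoch: int) -> bool:
    """True if the requesting slave collects a majority of ACKs in this epoch."""
    acks = sum(m.request_vote(epoch) for m in reachable_masters)
    return acks >= votes_needed(total_masters)
```

With three masters and one of them down, the two survivors can grant two of the three possible votes, which is enough. With two masters and one down, the single survivor provides one vote out of the two required, so the failover never succeeds.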
Split‑Brain and Data Loss
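If a network partition causes multiple masters to accept writes for the same slots, data loss can occur when the partition heals, because the writes accepted by the demoted master are discarded. Setting min-replicas-to-write 1 mitigates the risk: a master that cannot reach any replica stops accepting writes. The cost is reduced availability whenever replicas are down or lagging.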
<code># refuse writes unless at least this many replicas are connected and not lagging
min-replicas-to-write 1
</code>Full Coverage
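When cluster-require-full-coverage is set to no, the cluster keeps serving the slots that are still covered even if a master responsible for other slots goes down without a replica. With the default value of yes, the whole cluster stops accepting commands as soon as any slot is uncovered.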
Batch Operations
Commands like MSET and MGET only work if all keys map to the same slot. Prefix keys with a hash tag (e.g., {user1}) to force them into the same slot.
Example: mset {user1}:1:name zhangsan {user1}:1:age 18
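When keys cannot all share one hash tag, a client can split a batch into one group per slot and issue a separate MGET or MSET for each group. The sketch below shows only the grouping step; it assumes `binascii.crc_hqx` with an initial value of 0 matches the CRC16 used by Redis, and `key_slot` and `group_by_slot` are illustrative names rather than part of any client library.

```python
from binascii import crc_hqx
from collections import defaultdict


def key_slot(key: str) -> int:
    """Slot of a key, hashing only a non-empty {tag} when one is present."""
    raw = key.encode()
    start = raw.find(b"{")
    end = raw.find(b"}", start + 1) if start != -1 else -1
    if end > start + 1:  # non-empty {tag}: hash only the tag
        raw = raw[start + 1:end]
    return crc_hqx(raw, 0) & 0x3FFF


def group_by_slot(keys):
    """Map slot number -> list of keys, so each group can go into one multi-key command."""
    groups = defaultdict(list)
    for key in keys:
        groups[key_slot(key)].append(key)
    return dict(groups)
```

The two `{user1}` keys from the example above end up in the same group, so a single MSET covers them.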
Sentinel vs. Cluster Leader Election
Sentinel uses a similar majority-vote mechanism based on Raft-style epochs: when a master is marked down, the Sentinels elect a leader among themselves, and that leader carries out the failover.
Cluster Fault Tolerance
Failure Detection
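Nodes periodically send PING messages. If a node does not reply within the timeout (cluster-node-timeout), the sender marks it PFAIL (possible failure). When a majority of masters report the same node as PFAIL, it is promoted to FAIL, and the node is considered down by the whole cluster.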
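The promotion from PFAIL to FAIL can be modelled as counting recent reports from distinct masters. The class below is a simplified sketch under stated assumptions: reports are taken to expire after twice the node timeout, and `FailureDetector` with its methods is a hypothetical name, not an API of Redis.

```python
class FailureDetector:
    """Tracks PFAIL reports from masters about one suspected node."""

    def __init__(self, total_masters: int, node_timeout_ms: int = 5000):
        self.total_masters = total_masters
        # Assumption: a report stays valid for twice the node timeout.
        self.validity_ms = 2 * node_timeout_ms
        self.reports = {}  # reporter id -> time of its latest report (ms)

    def report_pfail(self, reporter: str, now_ms: int) -> None:
        self.reports[reporter] = now_ms

    def state(self, now_ms: int) -> str:
        fresh = [t for t in self.reports.values() if now_ms - t <= self.validity_ms]
        if len(fresh) >= self.total_masters // 2 + 1:
            return "FAIL"  # a majority of masters agree
        return "PFAIL" if fresh else "OK"
```

Expiring old reports means that a node is only declared FAIL when enough masters have seen it unreachable at roughly the same time.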
Failover Process
A slave of the failed master is selected.
The selected slave stops replicating (the equivalent of SLAVEOF NO ONE) and becomes a master.
The new master takes over the slots of the failed node.
The new master broadcasts a PONG to inform the cluster.
Clients start sending commands to the new master.
Adding Nodes and Resharding
To expand the cluster, start new nodes and add them with redis-cli --cluster add-node. Then use redis-cli --cluster reshard to move slots to the new master.
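The reshard command asks how many slots to move and from which nodes, so it can help to work out the numbers beforehand. The following sketch only does that arithmetic for an even distribution; it does not talk to Redis, and `reshard_plan` and the node labels are hypothetical.

```python
TOTAL_SLOTS = 16384


def reshard_plan(current: dict, new_masters: int = 1) -> dict:
    """Return how many slots each existing master should give to one new master.

    current maps a node label to the number of slots it owns today.
    """
    target = TOTAL_SLOTS // (len(current) + new_masters)
    remaining = target  # slots the new master still needs
    plan = {}
    for node, owned in sorted(current.items()):
        give = min(max(owned - target, 0), remaining)
        if give:
            plan[node] = give
        remaining -= give
    return plan
```

For three masters holding 5461, 5462, and 5461 slots, the plan moves 1365, 1366, and 1365 slots, which leaves all four masters with 4096 slots each.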
<code># start new nodes
redis-server redis-7006.conf
redis-server redis-7007.conf
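# add node 7006 to the cluster (new nodes join as empty masters)
redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7001
# add node 7007 as well; it is turned into a slave afterwards
redis-cli --cluster add-node 127.0.0.1:7007 127.0.0.1:7001
# reshard slots to the new master
redis-cli --cluster reshard 127.0.0.1:7001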
</code>After adding the future slave to the cluster, assign it to a master with CLUSTER REPLICATE, passing the master's node ID (shown by cluster nodes):
<code># on the slave (7007)
cluster replicate 2109c2832177e8514174c6ef8fefd681076e28df
</code>Removing Nodes
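Before removing a master, migrate all of its slots to other masters with redis-cli --cluster reshard, then remove the now empty node with redis-cli --cluster del-node. A slave holds no slots and can be removed directly.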
<code># delete node 7007 (example)
redis-cli --cluster del-node 127.0.0.1:7007 8d935918d877a63283e1f3a1b220cdc8cb73c414
</code>