Mastering Redis Cluster: Deep Dive into Sharding, Failover, and Scaling
This article provides a comprehensive guide to Redis Cluster, covering its sharding mechanism, hash slot mapping, replication and automatic failover, client data location, slot reassignment, MOVED/ASK redirection, communication overhead, and practical tuning tips for large‑scale deployments.
Why Use Redis Cluster
When a single Redis instance holds a large dataset (e.g., 8 million keys occupying 20 GB), RDB persistence has to fork a child process. The fork call itself runs on the main thread and takes longer as the memory footprint grows, so requests stall until it completes and latency spikes. Redis Cluster shards the keyspace across multiple nodes, so each instance holds less data and forks faster, and the deployment can scale horizontally.
Cluster Basics
Redis Cluster partitions the keyspace into 16 384 hash slots. Each node owns a subset of slots, and the cluster maintains a decentralized mapping of slots to nodes.
Key‑to‑Slot Mapping
The slot for a key is the CRC16 checksum of the key modulo 16 384; clients compute it locally and nodes compute it the same way. If a key contains a hash tag (e.g., {user123}), only the text between the braces is hashed, so all keys that share a tag land in the same slot.
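The mapping can be sketched in a few lines of Python. This is a minimal illustration, not the server's code: it assumes the CRC16 variant named in the Redis Cluster specification (CCITT/XMODEM, polynomial 0x1021) and the specification's hash-tag rule, under which only the text between the first { and the next } is hashed when that text is non-empty. The function names are hypothetical.

```python
SLOT_COUNT = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hashed_part(key: bytes) -> bytes:
    """Return the hash tag content if the key has a non-empty {tag}, else the whole key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to one of the 16 384 hash slots."""
    return crc16_xmodem(hashed_part(key.encode("utf-8"))) % SLOT_COUNT


# Keys that share a tag resolve to the same slot.
print(key_slot("{user123}:profile") == key_slot("{user123}:orders"))
```

On a running cluster, the CLUSTER KEYSLOT command reports the slot the server itself computes, which is a convenient cross-check for a client-side implementation.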
Slot Distribution
When the cluster is created (e.g., with redis-cli --cluster create), the tool distributes the 16 384 slots evenly among the master nodes. Administrators can instead assign slots manually with CLUSTER ADDSLOTS, for example to give more slots to more powerful machines when the hardware is heterogeneous.
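An even split is simple arithmetic. The sketch below divides the slot range into contiguous blocks whose sizes differ by at most one; the helper name is hypothetical, and the exact boundaries chosen by the real tooling may differ by a slot depending on how it rounds.

```python
def split_slots(node_count: int, total_slots: int = 16384) -> list[range]:
    """Split [0, total_slots) into node_count contiguous, near-equal ranges."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    base, extra = divmod(total_slots, node_count)
    ranges = []
    start = 0
    for i in range(node_count):
        # The first `extra` nodes take one additional slot each.
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


for node_index, slots in enumerate(split_slots(3)):
    print(node_index, slots.start, slots.stop - 1, len(slots))
```

For heterogeneous hardware, the same idea applies with per-node weights instead of equal shares.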
Replication and Automatic Failover
Each master can have one or more replicas (slaves) that continuously synchronize via the standard Redis replication protocol. If a master fails, the cluster promotes one of its replicas to master. By default (cluster-require-full-coverage yes) the cluster stops accepting commands when any hash slot is left without a serving node; setting the option to no lets the cluster keep serving the slots that are still covered.
Gossip Protocol and Node Communication
Nodes exchange state via a Gossip protocol. Once per second a node samples five random peers and pings the one from which it has gone longest without receiving a Pong. In addition, it pings any peer whose last Pong is older than half of cluster-node-timeout. Ping/Pong messages carry the sender's status, gossip about a few other nodes, and a 16 384‑bit bitmap (2 KB) indicating the slots the sender owns. In large clusters (e.g., 1 000 nodes) these heartbeats can consume a significant share of network bandwidth.
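Two details of the heartbeat are easy to check in code: the size of the slot bitmap and the choice of ping target. The sketch below is illustrative only; the function names, the dictionary of last-Pong timestamps, and the bit layout within each byte are assumptions made for the example, not the server's data structures.

```python
import random

SLOT_COUNT = 16384
BITMAP_BYTES = SLOT_COUNT // 8  # one bit per slot: 2048 bytes, i.e. 2 KB


def slots_to_bitmap(slots) -> bytearray:
    """Build an ownership bitmap with one bit set per owned slot."""
    bitmap = bytearray(BITMAP_BYTES)
    for slot in slots:
        bitmap[slot // 8] |= 1 << (slot % 8)
    return bitmap


def pick_ping_target(last_pong: dict[str, float], sample_size: int = 5, rng=random) -> str:
    """Sample a few peers at random and return the one heard from least recently."""
    peers = sorted(last_pong)
    candidates = rng.sample(peers, min(sample_size, len(peers)))
    return min(candidates, key=lambda peer: last_pong[peer])


bitmap = slots_to_bitmap(range(0, 5461))
print(len(bitmap))  # 2048

# Hypothetical peers with the time (in seconds) their last Pong arrived.
print(pick_ping_target({"node-a": 12.0, "node-b": 3.5, "node-c": 9.1}))  # node-b
```

Because the bitmap alone is 2 KB, every Ping and Pong has at least that size regardless of how many slots the sender owns.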
Client Slot Caching
When a client connects, it asks any node for the slot‑to‑node mapping (for example with CLUSTER SLOTS, or CLUSTER SHARDS on Redis 7.0 and later) and caches the reply locally. For each command the client computes the key's slot, looks up the responsible node in the cache, and sends the request directly to that node.
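The lookup step can be sketched as a small range table. This is a simplified model, assuming the mapping has already been fetched and reduced to (start slot, end slot, address) triples; the class name and the node addresses are hypothetical.

```python
from bisect import bisect_right


class SlotCache:
    """Client-side cache of slot ranges, each owned by one node address."""

    def __init__(self, slot_ranges):
        # slot_ranges: iterable of (start_slot, end_slot, "host:port"), ends inclusive.
        self._ranges = sorted(slot_ranges)
        self._starts = [start for start, _, _ in self._ranges]

    def node_for_slot(self, slot: int) -> str:
        """Return the cached owner of a slot, or raise KeyError if it is not covered."""
        index = bisect_right(self._starts, slot) - 1
        if index >= 0:
            start, end, node = self._ranges[index]
            if start <= slot <= end:
                return node
        raise KeyError(f"slot {slot} is not covered by the cached mapping")


# Hypothetical three-master layout.
cache = SlotCache([
    (0, 5460, "10.0.0.1:6379"),
    (5461, 10922, "10.0.0.2:6379"),
    (10923, 16383, "10.0.0.3:6379"),
])
print(cache.node_for_slot(5461))  # 10.0.0.2:6379
```

A flat 16 384-entry list indexed by slot would work just as well; the range table simply mirrors the shape in which nodes report their slots.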
Redirection Errors
If a client contacts a node that does not own the target slot, the node replies with a redirection error:
MOVED: The slot has been permanently moved; the client updates its cache and retries the command on the new node.
GET mykey
(error) MOVED 16330 172.17.18.2:6379
ASK: The slot is in the middle of migration; the client sends an ASKING command to the target node before retrying, without updating its cache.
GET mykey
(error) ASK 16330 172.17.18.2:6379
Cluster Size Limits
Redis Cluster is designed to scale to roughly 1 000 nodes. The practical limit comes from the bandwidth consumed by Gossip heartbeats, each of which carries the slot bitmap. Raising cluster-node-timeout (default 15 s) reduces how often nodes must ping one another, but a larger timeout also delays failure detection.
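A back-of-the-envelope model shows why the timeout matters. It assumes, as the Redis Cluster specification describes, that a node pings every peer it has not heard from within half of the node timeout, so each node sends about N - 1 pings per half timeout. The function name is hypothetical, and the model ignores Pong replies and the once-per-second random ping.

```python
def estimate_pings_per_second(node_count: int, node_timeout_s: float) -> tuple[float, float]:
    """Return (pings per second sent by one node, pings per second cluster-wide)."""
    half_timeout = node_timeout_s / 2
    per_node = (node_count - 1) / half_timeout
    return per_node, per_node * node_count


# 100 nodes with a 60 s timeout: about 3.3 pings/s per node, about 330 cluster-wide.
print(estimate_pings_per_second(100, 60))

# Same cluster size with the 15 s default and with a 30 s timeout.
print(estimate_pings_per_second(100, 15))
print(estimate_pings_per_second(100, 30))
```

Doubling the timeout halves the ping rate in this model, and since every ping carries at least the 2 KB slot bitmap, the bandwidth saving scales the same way.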
Practical Tuning Tips
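The once‑per‑second random ping (one Ping per node per second) is built into the server and is not configurable; when network bandwidth is constrained, the traffic to target is the extra pings sent to peers that have been silent for more than half of cluster-node-timeout.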
Increase cluster-node-timeout to 20–30 s to lower heartbeat traffic, accepting slower failure detection.
Prefer the fixed 16 384‑slot mapping over a global key‑to‑node table; the slot table is far more memory‑efficient.
Summary
Redis Cluster solves large‑scale storage and latency problems by sharding keys across multiple nodes.
Each key maps to one of 16 384 slots via CRC16; slots are evenly distributed among nodes.
Replication provides high availability; slaves are promoted automatically on master failure.
Gossip disseminates node status and slot mappings, but its heartbeat traffic limits cluster size.
Clients cache slot mappings and handle MOVED/ASK redirections; operators can tune cluster-node-timeout to balance failure‑detection speed against bandwidth usage.
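As a closing illustration of the client-side redirection handling summarized above, the sketch below parses a MOVED or ASK error and decides what the client should do next. The names Redirect, parse_redirect, and apply_redirect are hypothetical, and the slot cache is modeled as a plain dictionary from slot number to address.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Redirect:
    kind: str  # "MOVED" or "ASK"
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse 'MOVED <slot> <host>:<port>' or 'ASK <slot> <host>:<port>'; None otherwise."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("MOVED", "ASK"):
        return None
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


def apply_redirect(slot_cache: dict[int, str], redirect: Redirect) -> list[str]:
    """Update the cache if needed and return commands to send before the retry."""
    target = f"{redirect.host}:{redirect.port}"
    if redirect.kind == "MOVED":
        # Permanent move: remember the new owner, then retry there directly.
        slot_cache[redirect.slot] = target
        return []
    # ASK: one-off detour during migration; the cache stays untouched.
    return ["ASKING"]


cache: dict[int, str] = {}
moved = parse_redirect("MOVED 16330 172.17.18.2:6379")
print(apply_redirect(cache, moved), cache)  # [] {16330: '172.17.18.2:6379'}
```

Many clients respond to MOVED by refreshing the whole slot map rather than a single entry, since one moved slot often means others have moved too.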
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
