Why Redis Cluster’s CLUSTER SLOTS Command Slows Down Large Deployments and How to Fix It
This article analyzes the latency spikes observed in large‑scale Redis clusters after node expansion, identifies the CLUSTER SLOTS command and the slot‑migration handling in hiredis‑vip and Jedis as the root cause, proposes a slot‑traversal optimization, and reports a benchmark in which CLUSTER SLOTS latency on a 100‑master cluster dropped from 2061 µs to 168 µs, roughly a twelve‑fold improvement.
Background
Vivo’s internal Redis deployment has grown rapidly: individual clusters have gone from fewer than 50 nodes to more than 100, and the number of clusters has increased from tens to thousands. After expansion operations, services that use the hiredis‑vip client for MGET observed noticeable latency increases.
Problem Description
During a post‑expansion incident, CPU usage spiked dramatically while bandwidth remained normal. Monitoring identified the function clusterReplyMultiBulkSlots consuming more than 50 % of CPU time, indicating a performance bottleneck for the CLUSTER SLOTS command.
Root Cause Analysis
Slot migration triggers frequent CLUSTER SLOTS calls.
Both hiredis‑vip and Jedis refresh their slot topology cache by executing CLUSTER SLOTS when they receive a MOVED error.
The original implementation of clusterReplyMultiBulkSlots iterates over every master node and every slot, giving a time complexity of number_of_masters × total_slots (total_slots = 16384).
When the number of master nodes grows (e.g., 100 masters), the CPU cost of CLUSTER SLOTS grows linearly, causing high latency.
Optimization Proposal
Replace the double loop with a single pass over server.cluster->slots, grouping consecutive slots that belong to the same master. This reduces the algorithmic complexity to total_slots only.
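```c
void clusterReplyMultiBulkSlots(client *c) {
    int num_masters = 0, start = -1;   /* num_masters counts emitted ranges */
    void *slot_replylen = addReplyDeferredLen(c);
    clusterNode *n = NULL;             /* owner of the range being built */

    /* Walk the slot table once. The extra step i == CLUSTER_SLOTS acts as
     * a sentinel that flushes the last open range. */
    for (int i = 0; i <= CLUSTER_SLOTS; i++) {
        /* No open range: start one at slot i, or stop at the sentinel. */
        if (n == NULL) {
            if (i == CLUSTER_SLOTS) break;
            n = server.cluster->slots[i];
            start = i;
            continue;
        }
        /* Owner changed or end reached: emit [start, i-1] for node n. */
        if (i == CLUSTER_SLOTS || n != server.cluster->slots[i]) {
            addNodeReplyForClusterSlot(c, n, start, i-1);
            num_masters++;
            if (i == CLUSTER_SLOTS) break;
            n = server.cluster->slots[i];
            start = i;
        }
    }
    setDeferredArrayLen(c, slot_replylen, num_masters);
}
```
Implementation Details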
The new logic emits one reply entry per contiguous block of slots owned by the same master, and it does so without rescanning the slot table for every master. The reply carries the same slot ranges as before; what shrinks is the work needed to generate it. The change was merged into Redis 6.2.2 (pull request #8541, https://github.com/redis/redis/pull/8541).
Benchmark Results
Test environment : Manjaro 20.2, AMD Ryzen 7 4800H, 16 GB RAM, 100 master nodes (all primary). The benchmark repeatedly issued CLUSTER SLOTS on a single node.
CPU usage : After optimization the CPU share of CLUSTER SLOTS dropped from >50 % to a negligible fraction.
Latency : Original version – cluster slots cost time:2061 µs; Optimized version – cluster slots cost time:168 µs. Latency was reduced to ~8.2 % of the original on a 100‑master cluster.
Conclusion
The investigation showed that CLUSTER SLOTS is a performance hotspot in large Redis clusters, especially during slot migration. By simplifying the slot‑enumeration algorithm, both CPU consumption and latency were significantly reduced, confirming the effectiveness of the optimization.
References
Redis source: https://github.com/redis/redis
Jedis client: https://github.com/redis/jedis
hiredis‑vip client: https://github.com/vipshop/hiredis-vip
perf tool: https://perf.wiki.kernel.org/index.php/Main_Page
