Performance Analysis and Optimization of Redis Cluster CLUSTER SLOTS Command
This article investigates the high CPU usage and latency observed after expanding a large Redis cluster, analyzes the root cause in the CLUSTER SLOTS implementation and client MOVED handling, proposes an optimized slot‑traversal algorithm, and demonstrates significant performance improvements through benchmarking and profiling.
The article begins by describing a scenario where a Redis cluster with over 100 nodes experiences increased latency after a scaling operation, prompting an investigation into the underlying cause.
Initial monitoring shows normal bandwidth but abnormal CPU usage, leading to a focus on the clusterReplyMultiBulkSlots function, which consumes up to 51.84% of CPU during CLUSTER SLOTS execution.
Detailed code analysis reveals that the function iterates over every master node and, for each one, over every slot (16384 in total), so the time complexity is proportional to (number of master nodes) × (total slots). The bitmap‑based slot lookup (clusterNodeGetSlotBit and bitmapTestBit) is also explained.
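The per-node scan described above can be modeled in a few lines. This is a simplified sketch, not the Redis source: demo_node, demo_bitmap_test_bit, demo_bitmap_set_bit and demo_count_ranges_per_node are hypothetical names, and the node is reduced to nothing but its slot bitmap. The probe counter is there only to make the masters × slots cost visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical stand-in for a cluster node: only the slot bitmap is kept. */
typedef struct {
    uint8_t slots[CLUSTER_SLOTS / 8]; /* one bit per slot */
} demo_node;

/* Test bit `pos` in a byte-array bitmap (the idea behind bitmapTestBit). */
static int demo_bitmap_test_bit(const uint8_t *bitmap, int pos) {
    return (bitmap[pos >> 3] & (1 << (pos & 7))) != 0;
}

/* Set bit `pos`; used here only to build sample data. */
static void demo_bitmap_set_bit(uint8_t *bitmap, int pos) {
    bitmap[pos >> 3] |= (uint8_t)(1 << (pos & 7));
}

/* Original-style traversal: for every master, scan all slots and count
 * contiguous ranges. `probes` receives the number of bit tests performed. */
static int demo_count_ranges_per_node(const demo_node *nodes, size_t n,
                                      long *probes) {
    int ranges = 0;
    assert(nodes != NULL && probes != NULL);
    *probes = 0;
    for (size_t i = 0; i < n; i++) {
        int start = -1;
        for (int j = 0; j < CLUSTER_SLOTS; j++) {
            int bit = demo_bitmap_test_bit(nodes[i].slots, j);
            (*probes)++;
            if (bit && start == -1) start = j;
            /* A range closes on the first unowned slot or at the last slot. */
            if (start != -1 && (!bit || j == CLUSTER_SLOTS - 1)) {
                ranges++; /* a reply entry for this range would be emitted here */
                start = -1;
            }
        }
    }
    return ranges;
}
```

With three masters the sketch performs 3 × 16384 bit tests to find three ranges. In this model a cluster of 100 masters would perform over 1.6 million tests per call.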
Client‑side behavior is examined: both hiredis‑vip (C) and Jedis (Java) cache the slot topology. When a MOVED error occurs, they issue a CLUSTER SLOTS command to refresh the cache, which amplifies the load on large clusters, because many clients can hit redirections at the same time during a scaling operation.
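To make the trigger concrete, here is a minimal sketch of recognizing a redirection of the form "MOVED <slot> <host>:<port>". It is not taken from hiredis‑vip or Jedis: demo_moved and demo_parse_moved are hypothetical names, and the parser assumes a host without colons (an IPv4 address or hostname).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for a parsed redirection. */
typedef struct {
    int slot;
    char host[64];
    int port;
} demo_moved;

/* Parse "MOVED <slot> <host>:<port>". Returns 1 on success, 0 otherwise.
 * Assumption: the host contains no ':' characters. */
static int demo_parse_moved(const char *err, demo_moved *out) {
    if (err == NULL || out == NULL) return 0;
    if (strncmp(err, "MOVED ", 6) != 0) return 0;
    /* %63[^:] reads the host up to the colon, bounded by the buffer size. */
    if (sscanf(err + 6, "%d %63[^:]:%d",
               &out->slot, out->host, &out->port) != 3) {
        return 0;
    }
    return out->slot >= 0 && out->slot < 16384;
}
```

A client that refreshes its whole slot cache on every such error sends one CLUSTER SLOTS call per redirection, which is how a slot migration turns into a burst of expensive commands.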
To reduce the overhead, the author proposes a new algorithm that traverses the server.cluster->slots array directly and groups consecutive slots that belong to the same master node. The work then becomes proportional to the total number of slots (16384) only, regardless of how many masters the cluster has.
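The single-pass grouping can be sketched as follows. This is an illustration under stated assumptions, not the merged patch: the owners array of integer ids stands in for server.cluster->slots (which holds node pointers in Redis), -1 marks an unassigned slot, and demo_range and demo_group_slots are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical output record: one contiguous slot range and its owner. */
typedef struct {
    int start;
    int end;
    int owner;
} demo_range;

/* owners[j] is the id of the master serving slot j, or -1 if unassigned.
 * Writes up to `cap` ranges into `out` and returns how many were written. */
static size_t demo_group_slots(const int *owners, demo_range *out, size_t cap) {
    size_t count = 0;
    int start = 0;
    assert(owners != NULL && out != NULL);
    for (int j = 1; j <= CLUSTER_SLOTS; j++) {
        /* A run ends after the last slot or where the owner changes. */
        if (j == CLUSTER_SLOTS || owners[j] != owners[start]) {
            if (owners[start] != -1) { /* skip runs of unassigned slots */
                assert(count < cap);
                out[count].start = start;
                out[count].end = j - 1;
                out[count].owner = owners[start];
                count++;
            }
            start = j;
        }
    }
    return count;
}
```

Each slot is visited once, so the loop runs 16384 iterations whether the cluster has 3 masters or 300.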
The optimized implementation is provided and its impact is measured: CPU usage drops dramatically, and the CLUSTER SLOTS execution time falls from ~2000 µs to ~168 µs, which the article reports as 8.2% of the original cost.
Finally, the article concludes that the original CLUSTER SLOTS command exhibits a performance defect in large Redis clusters, and the submitted optimization has been merged into Redis 6.2.2, mitigating the issue.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
