
Performance Analysis and Optimization of Redis Cluster CLUSTER SLOTS Command

This article investigates the high CPU usage and latency observed after expanding a large Redis cluster, analyzes the root cause in the CLUSTER SLOTS implementation and client MOVED handling, proposes an optimized slot‑traversal algorithm, and demonstrates significant performance improvements through benchmarking and profiling.

Architect

The article begins by describing a scenario where a Redis cluster with over 100 nodes experiences increased latency after a scaling operation, prompting an investigation into the underlying cause.

Initial monitoring shows normal bandwidth but abnormal CPU usage, leading to a focus on the clusterReplyMultiBulkSlots function, which consumes up to 51.84% of CPU during CLUSTER SLOTS execution.

Detailed code analysis reveals that the function walks the cluster's nodes and, for each master, tests all 16384 slots, so its time complexity is proportional to (number of master nodes) × (total slots). The bitmap‑based slot lookup (clusterNodeGetSlotBit and bitmapTestBit) is also explained.
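The per-node scan can be modeled roughly as follows. This is a Python sketch of the logic, not the Redis C source. It assumes one bit per slot, packed eight to a byte; `make_node` and `slots_reply_per_node` are hypothetical names introduced only for illustration.

```python
CLUSTER_SLOTS = 16384  # total hash slots in a Redis cluster


def bitmap_test_bit(bitmap: bytearray, pos: int) -> bool:
    """Return True if bit `pos` is set (byte index pos >> 3, bit index pos & 7)."""
    return (bitmap[pos >> 3] & (1 << (pos & 7))) != 0


def bitmap_set_bit(bitmap: bytearray, pos: int) -> None:
    """Set bit `pos` in the bitmap."""
    bitmap[pos >> 3] |= 1 << (pos & 7)


def make_node(name: str, first: int, last: int) -> dict:
    """Hypothetical master node owning the slot range [first, last]."""
    bitmap = bytearray(CLUSTER_SLOTS // 8)
    for slot in range(first, last + 1):
        bitmap_set_bit(bitmap, slot)
    return {"name": name, "slots": bitmap}


def slots_reply_per_node(nodes):
    """For every node, test all 16384 slots and collect (start, end, name) ranges.

    Also returns the number of bit tests performed, which grows with
    len(nodes) * CLUSTER_SLOTS.
    """
    ranges, tests = [], 0
    for node in nodes:
        start = -1
        for slot in range(CLUSTER_SLOTS):
            tests += 1
            owned = bitmap_test_bit(node["slots"], slot)
            if owned and start == -1:
                start = slot  # a new range begins here
            if start != -1 and (not owned or slot == CLUSTER_SLOTS - 1):
                end = slot if owned else slot - 1
                ranges.append((start, end, node["name"]))
                start = -1
    return ranges, tests
```

With three nodes this model performs 3 × 16384 = 49152 bit tests. With 100 masters the same loop would perform 1,638,400 tests for every CLUSTER SLOTS call.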

Client‑side behavior is examined: both hiredis‑vip (C/C++) and Jedis (Java) cache the slot topology. When a MOVED error occurs, they issue a CLUSTER SLOTS command to refresh the cache, so a topology change can make many clients send CLUSTER SLOTS at once, which amplifies the load on large clusters.
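The amplification effect can be illustrated with a small simulation. This sketch does not use hiredis‑vip or Jedis; `FakeCluster` and `CachingClient` are hypothetical classes, and the addresses are made up. The only detail taken from Redis itself is the shape of the redirection error, `MOVED <slot> <host:port>`.

```python
class FakeCluster:
    """Hypothetical stand-in for a cluster that counts CLUSTER SLOTS calls."""

    def __init__(self, topology):
        self.topology = topology  # list of (start, end, "host:port")
        self.cluster_slots_calls = 0

    def cluster_slots(self):
        self.cluster_slots_calls += 1
        return list(self.topology)

    def owner(self, slot):
        for start, end, addr in self.topology:
            if start <= slot <= end:
                return addr
        raise KeyError(slot)

    def execute(self, addr, slot):
        """Answer OK from the owning node, MOVED from any other node."""
        real = self.owner(slot)
        return "OK" if addr == real else f"MOVED {slot} {real}"


class CachingClient:
    """Client that caches the slot map and refreshes all of it on MOVED."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = cluster.cluster_slots()

    def lookup(self, slot):
        for start, end, addr in self.cache:
            if start <= slot <= end:
                return addr
        raise KeyError(slot)

    def request(self, slot):
        reply = self.cluster.execute(self.lookup(slot), slot)
        if reply.startswith("MOVED "):
            # Stale cache: re-fetch the full topology, then retry once.
            self.cache = self.cluster.cluster_slots()
            reply = self.cluster.execute(self.lookup(slot), slot)
        return reply


cluster = FakeCluster([(0, 8191, "10.0.0.1:6379"), (8192, 16383, "10.0.0.2:6379")])
clients = [CachingClient(cluster) for _ in range(100)]

# Simulate a scale-out: slots 4096..8191 move to a new node.
cluster.topology = [
    (0, 4095, "10.0.0.1:6379"),
    (4096, 8191, "10.0.0.3:6379"),
    (8192, 16383, "10.0.0.2:6379"),
]
replies = [client.request(5000) for client in clients]
```

In this model every one of the 100 clients hits a MOVED reply after the migration and issues its own CLUSTER SLOTS call, so the server-side cost of that command is multiplied by the number of connected clients.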

To reduce the overhead, the author proposes a new algorithm that traverses the server.cluster->slots array directly and groups consecutive slots belonging to the same master node. This lowers the complexity to a single pass over the total slots, independent of the number of nodes.
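A minimal sketch of this single-pass grouping, again as a Python model instead of the C code merged into Redis. It assumes a slot table where entry `i` names the owner of slot `i`, or is `None` when the slot is unassigned; `build_slot_table` and `slots_reply_single_pass` are hypothetical names.

```python
CLUSTER_SLOTS = 16384  # total hash slots in a Redis cluster


def build_slot_table(assignments):
    """Build a slot -> owner table from (name, first, last) tuples."""
    table = [None] * CLUSTER_SLOTS
    for name, first, last in assignments:
        for slot in range(first, last + 1):
            table[slot] = name
    return table


def slots_reply_single_pass(slot_owner):
    """Walk the slot table once and emit a (start, end, owner) range
    each time the owner changes or the table ends."""
    ranges = []
    current, start = None, 0
    # One extra iteration (slot == CLUSTER_SLOTS) flushes the final range.
    for slot in range(CLUSTER_SLOTS + 1):
        owner = slot_owner[slot] if slot < CLUSTER_SLOTS else None
        if slot < CLUSTER_SLOTS and owner == current:
            continue  # still inside the same run of slots
        if current is not None:
            ranges.append((start, slot - 1, current))
        current, start = owner, slot
    return ranges
```

The loop body runs 16385 times whether the table describes 3 masters or 300, which is the property the optimization relies on.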

The original article provides the optimized implementation and measures its impact: CPU usage drops dramatically, and the CLUSTER SLOTS execution time falls from ~2000 µs to ~168 µs, which the article reports as about 8.2% of the pre‑optimization time.

Finally, the article concludes that the original CLUSTER SLOTS command exhibits a performance defect in large Redis clusters, and the submitted optimization has been merged into Redis 6.2.2, mitigating the issue.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
