
Why Redis Cluster’s CLUSTER SLOTS Command Slows Down Large Deployments and How to Fix It

This article analyzes latency spikes observed in large-scale Redis clusters after node expansion, traces them to the CLUSTER SLOTS command and to the way hiredis-vip and Jedis refresh their slot cache during slot migration, proposes a single-pass slot traversal, and reports a roughly twelve-fold reduction in CLUSTER SLOTS latency (2061 µs to 168 µs) in a 100-master benchmark.

dbaplus Community

Background

Vivo's internal Redis deployment has grown rapidly: per-cluster node counts have risen from fewer than 50 to more than 100, and the number of clusters has increased from tens to thousands. After expansion operations, services that issue MGET through the hiredis-vip client observed noticeable latency increases.

Problem Description

During a post-expansion incident, CPU usage on the Redis nodes spiked while bandwidth remained normal. Monitoring showed the function clusterReplyMultiBulkSlots consuming more than 50% of CPU time, which pointed to the CLUSTER SLOTS command as the bottleneck.

Root Cause Analysis

Slot migration triggers frequent CLUSTER SLOTS calls.

Both hiredis‑vip and Jedis refresh their slot topology cache by executing CLUSTER SLOTS when they receive a MOVED error.
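To make the trigger concrete, here is a minimal sketch of how a client might recognize a MOVED redirection before deciding to refresh its slot map. The error text format ("MOVED <slot> <host>:<port>") is the one Redis Cluster sends; the moved_info struct and the parse_moved function are hypothetical names used only for illustration and are not part of hiredis-vip or Jedis.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the fields of a MOVED redirection. */
typedef struct {
    int slot;       /* hash slot the key belongs to */
    char host[64];  /* address of the node that now owns the slot */
    int port;
} moved_info;

/* Parse an error such as "MOVED 3999 127.0.0.1:6381".
 * Returns 1 on success, 0 if the text is not a well-formed MOVED error.
 * Assumption: the host is an IPv4 address or a hostname without colons. */
int parse_moved(const char *err, moved_info *out) {
    assert(err != NULL && out != NULL);
    if (strncmp(err, "MOVED ", 6) != 0) return 0;
    /* %63[^:] reads the host up to the colon that precedes the port. */
    if (sscanf(err + 6, "%d %63[^:]:%d",
               &out->slot, out->host, &out->port) != 3) {
        return 0;
    }
    /* Redis Cluster has 16384 hash slots, numbered 0..16383. */
    return out->slot >= 0 && out->slot < 16384;
}
```

A client that refreshes on every such error sends one CLUSTER SLOTS request per redirection, so a migration that moves many slots can multiply the number of calls hitting the servers.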

The original implementation of clusterReplyMultiBulkSlots iterates over every master node and every slot, giving a time complexity of number_of_masters × total_slots (total_slots = 16384).

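As the number of master nodes grows, the CPU cost of each CLUSTER SLOTS call grows linearly with it. With 100 masters, a single call performs 100 × 16384 = 1,638,400 slot checks, and during migration many clients issue that call at once, which causes high latency.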

Optimization Proposal

Replace the double loop with a single pass over server.cluster->slots, grouping consecutive slots that belong to the same master. This reduces the cost to O(total_slots), independent of the number of masters.

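void clusterReplyMultiBulkSlots(client *c) {
    /* num_masters counts emitted slot ranges, which becomes the reply length;
     * a master that owns several separate ranges is counted once per range. */
    int num_masters = 0, start = -1;
    void *slot_replylen = addReplyDeferredLen(c);
    clusterNode *n = NULL;  /* owner of the range currently being scanned */
    /* i runs up to CLUSTER_SLOTS inclusive so the last range gets flushed. */
    for (int i = 0; i <= CLUSTER_SLOTS; i++) {
        /* No open range: try to start one at slot i (unassigned slots are NULL). */
        if (n == NULL) {
            if (i == CLUSTER_SLOTS) break;
            n = server.cluster->slots[i];
            start = i;
            continue;
        }
        /* End of table or owner changed: emit the range [start, i-1]. */
        if (i == CLUSTER_SLOTS || n != server.cluster->slots[i]) {
            addNodeReplyForClusterSlot(c, n, start, i-1);
            num_masters++;
            if (i == CLUSTER_SLOTS) break;
            n = server.cluster->slots[i];
            start = i;
        }
    }
    setDeferredArrayLen(c, slot_replylen, num_masters);
}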
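The same grouping idea can be tried outside Redis with a standalone sketch. It assumes a plain int array in place of server.cluster->slots (owner[s] is a master index, or -1 for an unassigned slot); slot_range and group_slot_ranges are hypothetical names, and the function collects ranges into an output array instead of writing a client reply.

```c
#include <assert.h>
#include <stddef.h>

#define TOTAL_SLOTS 16384

/* One contiguous block of slots owned by the same master. */
typedef struct { int owner, start, end; } slot_range;

/* Walk the slot table once and group consecutive slots with the same owner.
 * Stores at most max_out ranges in out (out may be NULL to only count).
 * Returns the total number of ranges found. */
int group_slot_ranges(const int *owner, slot_range *out, int max_out) {
    assert(owner != NULL);
    int count = 0;
    int start = -1;  /* first slot of the open range, or -1 if none is open */
    for (int s = 0; s <= TOTAL_SLOTS; s++) {
        /* An open range ends at the end of the table or when the owner changes. */
        if (start != -1 && (s == TOTAL_SLOTS || owner[s] != owner[start])) {
            if (out != NULL && count < max_out) {
                out[count] = (slot_range){ owner[start], start, s - 1 };
            }
            count++;
            start = -1;
        }
        /* Open a new range on the first assigned slot that follows. */
        if (s < TOTAL_SLOTS && start == -1 && owner[s] != -1) start = s;
    }
    return count;
}
```

Each slot is inspected a constant number of times, so the work is bounded by the table size no matter how many masters exist. A master whose slots are split by a migration or by an unassigned slot simply yields more than one range.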

Implementation Details

The new logic walks the slot table once and emits one reply entry per contiguous block of slots owned by the same master, so the cost of building the CLUSTER SLOTS reply no longer depends on the number of masters. The change was merged into Redis 6.2.2 (pull request #8541, https://github.com/redis/redis/pull/8541).

Benchmark Results

Test environment: Manjaro 20.2, AMD Ryzen 7 4800H, 16 GB RAM, and a cluster of 100 nodes, all of them masters. The benchmark repeatedly issued CLUSTER SLOTS against a single node.

CPU usage: after the optimization, the CPU share of CLUSTER SLOTS dropped from more than 50% to a negligible fraction.

Latency: the original version reported "cluster slots cost time:2061 µs" and the optimized version reported "cluster slots cost time:168 µs". On the 100-master cluster, latency fell to about 8.2% of the original, a speedup of roughly 12x.

Conclusion

The investigation showed that CLUSTER SLOTS is a performance hotspot in large Redis clusters, especially during slot migration. By simplifying the slot‑enumeration algorithm, both CPU consumption and latency were significantly reduced, confirming the effectiveness of the optimization.

References

Redis source: https://github.com/redis/redis
Jedis client: https://github.com/redis/jedis
hiredis-vip client: https://github.com/vipshop/hiredis-vip
perf tool: https://perf.wiki.kernel.org/index.php/Main_Page