
Performance Analysis and Optimization of Redis Cluster CLUSTER SLOTS Command

In large Redis clusters, the original CLUSTER SLOTS implementation traversed every master node and all 16,384 slots, an O(masters × slots) operation that consumed ~52% CPU and drove up MGET latency during slot migrations. Redesigning the command to iterate the pre-built slot array reduced complexity to O(total slots), cutting execution time from ~2 ms to ~0.17 ms and eliminating the CPU hotspot. The fix was merged into Redis 6.2.2.

vivo Internet Technology

Background: In production environments, Redis clusters with a large number of nodes (100+) are often expanded to meet growing traffic. After expansion, some services report increased latency, especially those that rely on real-time reads from Redis (e.g., model inference).

Problem description: A specific Redis cluster was expanded, and the business client (hiredis-vip) observed higher latency for MGET operations. The cluster runs Redis 3.x/4.x, with over 100 master nodes and many clients (Jedis, hiredis-vip).

Observed symptoms:

Bandwidth is not saturated.

CPU usage spikes dramatically during the issue.

OPS and CPU load are out of phase, suggesting an indirect relationship.

Perf‑top analysis identified the function clusterReplyMultiBulkSlots consuming ~52% CPU.

Root-cause investigation:

During slot migration, the CLUSTER SLOTS command is invoked frequently by clients handling MOVED errors.

The CLUSTER SLOTS implementation traverses every master node and every slot (16384), leading to O(number_of_masters × number_of_slots) complexity.

Both hiredis‑vip and Jedis update their slot topology by re‑executing CLUSTER SLOTS after a MOVED response, causing a cascade of heavy commands in large clusters.

Code excerpt (original implementation):

void clusterReplyMultiBulkSlots(client *c) {
    /* Format: 1) start slot 2) end slot 3) master IP/port/ID 4) replica IP/port/ID ... */
    int num_masters = 0;
    void *slot_replylen = addDeferredMultiBulkLength(c);
    dictEntry *de;
    dictIterator *di = dictGetSafeIterator(server.cluster->nodes);
    while ((de = dictNext(di)) != NULL) {
        clusterNode *node = dictGetVal(de);
        int start = -1;
        if (!nodeIsMaster(node) || node->numslots == 0) continue;
        for (int j = 0; j < CLUSTER_SLOTS; j++) {
            int bit;
            if ((bit = clusterNodeGetSlotBit(node,j)) != 0) {
                if (start == -1) start = j;
            }
            if (start != -1 && (!bit || j == CLUSTER_SLOTS-1)) {
                /* Build the reply for one continuous slot range.
                 * ... (omitted for brevity) ... */
                start = -1;
                num_masters++;
            }
        }
    }
    dictReleaseIterator(di);
    setDeferredMultiBulkLength(c, slot_replylen, num_masters);
}

Bitmap logic (used to test slot ownership):

int clusterNodeGetSlotBit(clusterNode *n, int slot) {
    return bitmapTestBit(n->slots, slot);
}

int bitmapTestBit(unsigned char *bitmap, int pos) {
    off_t byte = pos/8;
    int bit = pos & 7;
    return (bitmap[byte] & (1<<bit)) != 0;
}
Client handling of MOVED:

hiredis-vip calls cluster_update_route_by_addr, which triggers a fresh CLUSTER SLOTS request. Jedis invokes renewSlotCache, which also issues CLUSTER SLOTS to refresh topology. Both clients therefore generate additional CLUSTER SLOTS traffic during slot migration.
Optimization proposal:

Iterate directly over server.cluster->slots, which already stores a pointer to the owning node for each slot. Detect continuous ranges of slots belonging to the same node and emit a single reply for each range, reducing the algorithmic complexity to O(total_slots).

Optimized implementation (simplified):
void clusterReplyMultiBulkSlots(client *c) {
    int num_masters = 0, start = -1;
    void *slot_replylen = addReplyDeferredLen(c);
    clusterNode *n = NULL;
    for (int i = 0; i <= CLUSTER_SLOTS; i++) {
        if (n == NULL) {
            if (i == CLUSTER_SLOTS) break;
            n = server.cluster->slots[i];
            start = i;
            continue;
        }
        if (i == CLUSTER_SLOTS || n != server.cluster->slots[i]) {
            addNodeReplyForClusterSlot(c, n, start, i-1);
            num_masters++;
            if (i == CLUSTER_SLOTS) break;
            n = server.cluster->slots[i];
            start = i;
        }
    }
    setDeferredArrayLen(c, slot_replylen, num_masters);
}
Results:

CPU usage of the CLUSTER SLOTS command dropped from ~52% to a negligible level.

Execution time decreased from ~2061 µs to ~168 µs (≈8.2% of the original).

Performance gains were validated with perf flame graphs and benchmark logs.

Conclusion: The original clusterReplyMultiBulkSlots implementation has a performance defect in large Redis clusters. By leveraging the existing server.cluster->slots array, the command's complexity is reduced, yielding significant CPU and latency improvements. The fix has been merged into Redis 6.2.2.