
Performance Analysis and Optimization of Redis Cluster CLUSTER SLOTS Command

In large Redis clusters, the original CLUSTER SLOTS implementation traversed every master node and all 16,384 slots, an O(masters × slots) operation that consumed ~52% CPU and drove up MGET latency during slot migrations. Redesigning the command to iterate the pre-built slot array reduced complexity to O(total slots), cutting execution time from ~2 ms to ~0.17 ms and eliminating the CPU hotspot. The fix was merged into Redis 6.2.2.

vivo Internet Technology

Background: In production environments, Redis clusters with a large number of nodes (100+) are often expanded to meet growing traffic. After expansion, some services report increased latency, especially those that rely on real-time reads from Redis (e.g., model inference).

Problem description: A specific Redis cluster was expanded, and the business client (hiredis-vip) observed higher latency for MGET operations. The cluster runs Redis 3.x/4.x, with over 100 master nodes and many clients (Jedis, hiredis-vip).

Observed symptoms:

Bandwidth is not saturated.

CPU usage spikes dramatically during the issue.

OPS and CPU load are out of phase, suggesting an indirect relationship.

Perf‑top analysis identified the function clusterReplyMultiBulkSlots consuming ~52% CPU.

Root-cause investigation:

During slot migration, the CLUSTER SLOTS command is invoked frequently by clients handling MOVED errors.

The CLUSTER SLOTS implementation traverses every master node and every slot (16384), leading to O(number_of_masters × number_of_slots) complexity.

Both hiredis‑vip and Jedis update their slot topology by re‑executing CLUSTER SLOTS after a MOVED response, causing a cascade of heavy commands in large clusters.

Code excerpt (original implementation):

void clusterReplyMultiBulkSlots(client *c) {
    /* Format: 1) start slot 2) end slot 3) master IP/port/ID 4) replica IP/port/ID ... */
    int num_masters = 0;
    void *slot_replylen = addDeferredMultiBulkLength(c);
    dictEntry *de;
    dictIterator *di = dictGetSafeIterator(server.cluster->nodes);
    while ((de = dictNext(di)) != NULL) {
        clusterNode *node = dictGetVal(de);
        int start = -1;
        if (!nodeIsMaster(node) || node->numslots == 0) continue;
        for (int j = 0; j < CLUSTER_SLOTS; j++) {
            int bit;
            if ((bit = clusterNodeGetSlotBit(node,j)) != 0) {
                if (start == -1) start = j;
            }
            if (start != -1 && (!bit || j == CLUSTER_SLOTS-1)) {
                /* Build the reply for one continuous slot range.
                 * ... (omitted for brevity) ... */
                start = -1;
                num_masters++;
            }
        }
    }
    dictReleaseIterator(di);
    setDeferredMultiBulkLength(c, slot_replylen, num_masters);
}

Bitmap logic (used to test slot ownership):

int clusterNodeGetSlotBit(clusterNode *n, int slot) {
    return bitmapTestBit(n->slots, slot);
}

int bitmapTestBit(unsigned char *bitmap, int pos) {
    off_t byte = pos/8;
    int bit = pos & 7;
    return (bitmap[byte] & (1<<bit)) != 0;
}
Client handling of MOVED:

hiredis-vip calls cluster_update_route_by_addr, which triggers a fresh CLUSTER SLOTS request. Jedis invokes renewSlotCache, which also issues CLUSTER SLOTS to refresh topology. Both clients therefore generate additional CLUSTER SLOTS traffic during slot migration.
Optimization proposal:

Iterate directly over server.cluster->slots, which already stores a pointer to the owning node for each slot. Detect continuous ranges of slots belonging to the same node and emit a single reply for each range, reducing the algorithmic complexity to O(total_slots).

Optimized implementation (simplified):
void clusterReplyMultiBulkSlots(client *c) {
    int num_masters = 0, start = -1;
    void *slot_replylen = addReplyDeferredLen(c);
    clusterNode *n = NULL;
    for (int i = 0; i <= CLUSTER_SLOTS; i++) {
        if (n == NULL) {
            if (i == CLUSTER_SLOTS) break;
            n = server.cluster->slots[i];
            start = i;
            continue;
        }
        if (i == CLUSTER_SLOTS || n != server.cluster->slots[i]) {
            addNodeReplyForClusterSlot(c, n, start, i-1);
            num_masters++;
            if (i == CLUSTER_SLOTS) break;
            n = server.cluster->slots[i];
            start = i;
        }
    }
    setDeferredArrayLen(c, slot_replylen, num_masters);
}
Results:

CPU usage of the CLUSTER SLOTS command dropped from ~52% to a negligible level.

Execution time decreased from ~2061 µs to ~168 µs (≈8.2% of the original).

Performance gains were validated with perf flame graphs and benchmark logs.

Conclusion: The original clusterReplyMultiBulkSlots implementation has a performance defect in large Redis clusters. By leveraging the existing server.cluster->slots array, the command's complexity is reduced, yielding significant CPU and latency improvements. The fix has been merged into Redis 6.2.2.