
How to Detect Redis Big Keys in Real Time with Zero Code Changes

This article presents a lightweight, non‑intrusive eBPF‑based method for instantly identifying Redis big‑key operations, explains the underlying kernel and user‑space implementation, walks through the key code, and evaluates performance before and after optimization.

Architect

Feb 4, 2025


Background

Redis performance degrades when large keys (big‑keys) are frequently read or written, causing network saturation, memory pressure, master node blockage, and slow client responses. Traditional approaches rely on periodic RDB snapshots, which miss short‑lived big‑keys and lack real‑time visibility.

Why Manage Redis Big Keys?

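Large responses saturate network bandwidth.

Client output buffers holding large replies count toward Redis memory usage and can push the instance past maxmemory, triggering eviction.

Redis executes commands on a single main thread, so a slow big-key operation blocks the master; if it stalls long enough, failure detection may mark it as down and trigger an unintended failover.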

Client buffers grow, slowing down processing.

Traditional Detection Methods

Common solutions include scheduled RDB analysis or custom Redis modules, both of which suffer from latency, missed short‑lived keys, or high overhead.

Conclusion: Periodic RDB analysis is not real‑time and cannot catch sudden big‑key queries as they happen.

eBPF‑Based Real‑Time Detection (Zero Code Change)

The solution uses eBPF uprobes to attach to Redis internal functions without modifying Redis source code. It captures three key pieces of information for each command:

Client IP address

Command arguments

Response byte size

If the response size exceeds a configurable threshold (BIGKEY_THRESHOLD_BYTES), the event is reported to a user‑space program.

Kernel‑Space eBPF Program

Three probes are used:

call_entry (uprobe on call): resets the byte counter and stores the client pointer in a map.

_addReplyToBufferOrList_entry (uprobe on _addReplyToBufferOrList): adds the len argument of each call to the running byte count.

call_exit (uretprobe on call): reads the accumulated byte count and, if it exceeds the threshold, reads the client's socket fd, extracts the command arguments, and pushes a bigkey_log event to a perf‑event map. The fd is resolved to a client IP later, in user space.

SEC("uprobe//root/workspace/redis-6.2.13/src/redis-server:call")
int BPF_UPROBE(callEntry) {
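    u32 key = 0;
    u64 zero = 0;
    // Reset the byte counter. An update works for both hash and array maps
    // (deleting from an array map fails with -EINVAL).
    bpf_map_update_elem(&reply_bytes_map, &key, &zero, BPF_ANY);
    // call(client *c, int flags): the first argument is the client
    struct client_t *client = (struct client_t *)PT_REGS_PARM1(ctx);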
    if (client) {
        bpf_map_update_elem(&call_args_map, &key, &client, BPF_ANY);
    }
    return BPF_OK;
}

SEC("uprobe//root/workspace/redis-6.2.13/src/redis-server:_addReplyToBufferOrList")
int BPF_UPROBE(_addReplyToBufferOrList_entry) {
    size_t len = (size_t)PT_REGS_PARM3(ctx);
    u32 key = 0;
    u64 *reply_bytes = bpf_map_lookup_elem(&reply_bytes_map, &key);
    u64 sum = len;
    if (reply_bytes) sum += *reply_bytes;
    bpf_map_update_elem(&reply_bytes_map, &key, &sum, BPF_ANY);
    return BPF_OK;
}

SEC("uretprobe//root/workspace/redis-6.2.13/src/redis-server:call")
int BPF_URETPROBE(call_exit) {
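    u32 key = 0;
    u64 *p_bytes = bpf_map_lookup_elem(&reply_bytes_map, &key);
    if (!p_bytes)  // the verifier rejects dereferencing a possibly-NULL map value
        return BPF_OK;
    u64 bytes = *p_bytes;
    if (bytes > BIGKEY_THRESHOLD_BYTES) {
        struct client_t **p_client = bpf_map_lookup_elem(&call_args_map, &key);
        if (!p_client || !*p_client)
            return BPF_OK;
        struct client_t *client = *p_client;
        int fd = 0;
        get_client_fd(client, &fd);  // helper (not shown): reads the socket fd, e.g. via client->conn
        // extract argv and argc, truncate if needed
        // fill a struct bigkey_log evt (declaration not shown) with fd, bytes and args, then emit
        bpf_perf_event_output(ctx, &bigkey_log_map, BPF_F_CURRENT_CPU, &evt, sizeof(evt));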
    }
    return BPF_OK;
}
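To make the first version's bookkeeping concrete, here is a small Go model of the three probes. It is not eBPF code and not the author's implementation: the single-slot maps become struct fields, and the threshold of 1024 bytes is only an illustrative value. The probeHits counter shows why this design is expensive: the number of probe hits grows with the number of _addReplyToBufferOrList calls, not with the number of commands.

```go
package main

import "fmt"

// thresholdBytes plays the role of BIGKEY_THRESHOLD_BYTES (illustrative value).
const thresholdBytes = 1024

// naiveTracer models the first version's single-slot maps (key 0):
// call_args_map holds the current client, reply_bytes_map the running total.
type naiveTracer struct {
	client     string // stands in for the client pointer
	replyBytes uint64 // stands in for reply_bytes_map[0]
	probeHits  int    // total number of probe invocations
}

// callEntry models the uprobe on call(): reset the counter, remember the client.
func (t *naiveTracer) callEntry(client string) {
	t.probeHits++
	t.replyBytes = 0
	t.client = client
}

// addReply models the uprobe on _addReplyToBufferOrList: add its len argument.
func (t *naiveTracer) addReply(n int) {
	t.probeHits++
	t.replyBytes += uint64(n)
}

// callExit models the uretprobe on call(): report whether the reply was big.
func (t *naiveTracer) callExit() (uint64, bool) {
	t.probeHits++
	return t.replyBytes, t.replyBytes > thresholdBytes
}

func main() {
	tr := &naiveTracer{}
	tr.callEntry("127.0.0.1:56762")
	for i := 0; i < 300; i++ { // a reply assembled from 300 chunks of 50 bytes
		tr.addReply(50)
	}
	n, big := tr.callExit()
	fmt.Printf("client=%s bytes=%d big=%v probeHits=%d\n", tr.client, n, big, tr.probeHits)
}
```

One command produces 302 probe hits in this model; in the kernel, each hit is a trap, which is the cost discussed in the performance evaluation.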

User‑Space Collector (Go)

The Go program uses github.com/cilium/ebpf to load the eBPF object, read events from bigkey_log_map, resolve the client IP from the socket fd, format the command arguments, and print a line such as:

ip: 127.0.0.1:56762, args: COMMAND DOCS , bytes: 213589
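The decoding and formatting steps can be sketched as follows. The real bigkey_log layout is not shown in this article, so the struct below (an fd, an argument count, a byte count, and four 16-byte argument slots) is a hypothetical layout chosen for illustration, and little-endian byte order is assumed. encodeForDemo only stands in for the kernel side so the example runs without eBPF; in the real collector the raw bytes would come from the perf reader.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"strings"
)

// Hypothetical layout; the repository's struct bigkey_log may differ.
// C equivalent: struct bigkey_log { __s32 fd; __s32 argc; __u64 bytes; char argv[4][16]; };
const (
	maxArgs   = 4
	maxArgLen = 16
)

type BigkeyLog struct {
	Fd    int32
	Argc  int32
	Bytes uint64
	Argv  [maxArgs][maxArgLen]byte
}

// decodeBigkeyLog parses one perf-event payload (little-endian, as on x86-64).
func decodeBigkeyLog(raw []byte) (BigkeyLog, error) {
	var ev BigkeyLog
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ev)
	return ev, err
}

// argsString joins the NUL-terminated arguments. Each one is followed by a
// space, which reproduces the "COMMAND DOCS ," spacing of the sample line.
func argsString(ev BigkeyLog) string {
	var sb strings.Builder
	for i := 0; i < int(ev.Argc) && i < maxArgs; i++ {
		arg := ev.Argv[i][:]
		if n := bytes.IndexByte(arg, 0); n >= 0 {
			arg = arg[:n]
		}
		sb.Write(arg)
		sb.WriteByte(' ')
	}
	return sb.String()
}

func formatLine(ip string, ev BigkeyLog) string {
	return fmt.Sprintf("ip: %s, args: %s, bytes: %d", ip, argsString(ev), ev.Bytes)
}

// encodeForDemo builds a payload the way the kernel side might (demo only).
func encodeForDemo(fd int32, n uint64, args ...string) []byte {
	ev := BigkeyLog{Fd: fd, Argc: int32(len(args)), Bytes: n}
	for i, a := range args {
		if i >= maxArgs {
			break
		}
		copy(ev.Argv[i][:maxArgLen-1], a) // truncate, keeping room for a NUL
	}
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, &ev); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	ev, err := decodeBigkeyLog(encodeForDemo(9, 213589, "COMMAND", "DOCS"))
	if err != nil {
		panic(err)
	}
	fmt.Println(formatLine("127.0.0.1:56762", ev))
}
```

Running it prints the same line as the sample above. A fixed-size layout keeps decoding to a single binary.Read call, at the cost of truncating long arguments.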

Key steps:

Deserialize the perf‑event payload into a BigkeyLog struct.

Convert the argument list to a space‑separated string.

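Map the socket fd to a remote IP: read the /proc/<pid>/fd/<fd> link of the redis-server process to get the socket inode, then find that inode in /proc/net/tcp.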

Log the result to stdout.
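Resolving the socket fd to a client address might look like the sketch below. It assumes IPv4 (IPv6 sockets appear in /proc/net/tcp6 instead) and a little-endian host, because /proc/net/tcp prints the IPv4 address in host byte order. The function names are hypothetical, and the /proc/net/tcp line in main is synthetic.

```go
package main

import (
	"bufio"
	"encoding/hex"
	"fmt"
	"io"
	"net"
	"os"
	"strconv"
	"strings"
)

// inodeFromLink extracts the inode from a /proc/<pid>/fd/<fd> link target
// such as "socket:[12345]".
func inodeFromLink(link string) (uint64, error) {
	if !strings.HasPrefix(link, "socket:[") || !strings.HasSuffix(link, "]") {
		return 0, fmt.Errorf("not a socket: %q", link)
	}
	return strconv.ParseUint(link[len("socket:["):len(link)-1], 10, 64)
}

// parseHexAddr converts "0100007F:DDBA" to "127.0.0.1:56762" (little-endian host assumed).
func parseHexAddr(s string) (string, error) {
	host, port, ok := strings.Cut(s, ":")
	if !ok || len(host) != 8 {
		return "", fmt.Errorf("unexpected address %q", s)
	}
	b, err := hex.DecodeString(host)
	if err != nil {
		return "", err
	}
	p, err := strconv.ParseUint(port, 16, 16)
	if err != nil {
		return "", err
	}
	ip := net.IPv4(b[3], b[2], b[1], b[0])
	return net.JoinHostPort(ip.String(), strconv.FormatUint(p, 10)), nil
}

// remoteAddrForInode scans /proc/net/tcp content for the socket with the given inode.
func remoteAddrForInode(r io.Reader, inode uint64) (string, error) {
	sc := bufio.NewScanner(r)
	sc.Scan() // skip the header line
	want := strconv.FormatUint(inode, 10)
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) >= 10 && f[9] == want { // f[9] is the inode column
			return parseHexAddr(f[2]) // f[2] is rem_address
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("inode %d not found", inode)
}

// resolveClient ties the steps together for a live redis-server process.
func resolveClient(pid, fd int) (string, error) {
	link, err := os.Readlink(fmt.Sprintf("/proc/%d/fd/%d", pid, fd))
	if err != nil {
		return "", err
	}
	inode, err := inodeFromLink(link)
	if err != nil {
		return "", err
	}
	f, err := os.Open("/proc/net/tcp")
	if err != nil {
		return "", err
	}
	defer f.Close()
	return remoteAddrForInode(f, inode)
}

func main() {
	sample := `  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:18EB 0100007F:DDBA 01 00000000:00000000 00:00000000 00000000     0        0 12345 1
`
	inode, err := inodeFromLink("socket:[12345]")
	if err != nil {
		panic(err)
	}
	addr, err := remoteAddrForInode(strings.NewReader(sample), inode)
	if err != nil {
		panic(err)
	}
	fmt.Println(addr) // 127.0.0.1:56762
}
```

The fd alone is not enough because /proc/net/tcp is keyed by inode, not by process or fd; the /proc/<pid>/fd link supplies that missing mapping.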

Performance Evaluation

Running the detector with a low threshold (1024 bytes) correctly identifies large responses, but the initial implementation incurs a heavy performance penalty: throughput drops from ~27 k ops/s to ~269 ops/s, and average latency rises from ~1 ms to ~185 ms. The cause is the uprobe on _addReplyToBufferOrList: every uprobe hit traps into the kernel, and this function runs for every piece of reply data Redis appends, so it can fire hundreds of times for a single command.

Optimized Implementation

Instead of counting bytes on every _addReplyToBufferOrList call, the second version records the client's reply position at call_entry: the static buffer position (bufpos) plus the reply list's length and the tail node's used bytes. At call_exit it reads the final state and computes the total response size once, by subtracting the initial state from the final one. This removes the high‑frequency uprobe, leaving a fixed number of probe hits per command regardless of reply size.

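// Snapshot of the client's reply state: static-buffer position plus reply-list position.
static __always_inline void fillPosData(struct client_data_pos *arg, struct client_t *client) {
    int bufpos = 0;
    // BUFPOS_OFFSET: byte offset of the bufpos field in Redis's struct client for this build
    bpf_probe_read_user(&bufpos, sizeof(bufpos), (void *)client + BUFPOS_OFFSET);
    arg->buf_bos = bufpos;
    // read reply list length and tail used fields
    // store in arg->list_idx and arg->list_offset
}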
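The position-difference idea can be modeled in Go as below. This is a sketch, not the repository's code: replyState and entryPos are hypothetical stand-ins for Redis's bufpos field, the reply list nodes' used counters, and struct client_data_pos. It assumes nothing is written to the socket while call() runs, so every byte between the entry snapshot and the exit state belongs to the current command.

```go
package main

import "fmt"

// replyState is a hypothetical snapshot of a client's pending reply:
// the static buffer position (bufpos) and the used bytes of each
// reply-list node, head to tail.
type replyState struct {
	BufPos   int
	ListUsed []int
}

// entryPos mirrors the idea of struct client_data_pos (field names illustrative).
type entryPos struct {
	BufPos     int
	ListIdx    int // index of the tail node at entry (0 if the list was empty)
	ListOffset int // tail node's used bytes at entry (0 if the list was empty)
}

// snapshotAtEntry records where the reply ends when call() starts.
func snapshotAtEntry(s replyState) entryPos {
	p := entryPos{BufPos: s.BufPos}
	if n := len(s.ListUsed); n > 0 {
		p.ListIdx = n - 1
		p.ListOffset = s.ListUsed[n-1]
	}
	return p
}

// replyBytes computes bytes appended during call() as final minus initial state.
func replyBytes(start entryPos, end replyState) int {
	total := end.BufPos - start.BufPos
	for i := start.ListIdx; i < len(end.ListUsed); i++ {
		used := end.ListUsed[i]
		if i == start.ListIdx {
			used -= start.ListOffset // count only what was added to the old tail
		}
		total += used
	}
	return total
}

func main() {
	// Illustrative sizes: the reply fills the static buffer, then spills into two list nodes.
	start := snapshotAtEntry(replyState{BufPos: 100})
	end := replyState{BufPos: 16384, ListUsed: []int{16384, 5000}}
	fmt.Println(replyBytes(start, end)) // 37668
}
```

Only the starting position is saved at entry and the summing happens once at exit, so the cost no longer depends on how many chunks the reply was built from.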

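After the optimization, throughput recovers to ~23 k ops/s and average latency drops to ~1.8 ms, a 99% latency reduction compared with the initial implementation, though throughput remains somewhat below the ~27 k ops/s baseline.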
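As a quick check of what the percentages refer to, the quoted figures work out as follows. The inputs are the approximate values reported in this article, so the results are approximate too.

```go
package main

import "fmt"

// Approximate figures quoted in the performance evaluation.
const (
	baselineOps    = 27000.0 // ops/s without the detector
	naiveOps       = 269.0   // ops/s with the per-call byte counter
	optimizedOps   = 23000.0 // ops/s with the bufpos-based version
	naiveLatencyMs = 185.0
	optLatencyMs   = 1.8
)

// pct converts a ratio to a percentage.
func pct(x float64) float64 { return 100 * x }

func main() {
	fmt.Printf("naive slowdown: %.1fx\n", baselineOps/naiveOps)
	fmt.Printf("optimized throughput vs baseline: %.1f%%\n", pct(optimizedOps/baselineOps))
	fmt.Printf("latency reduction vs naive: %.1f%%\n", pct(1-optLatencyMs/naiveLatencyMs))
}
```

This prints a slowdown of about 100x for the first version, optimized throughput at about 85% of the baseline, and a 99.0% latency reduction relative to the first version.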

Conclusion

eBPF uprobe tracing can detect Redis big‑key operations at millisecond granularity with zero code changes, but probe placement must be chosen carefully to avoid severe performance degradation. Further gains may be possible with a user‑space eBPF runtime such as bpftime, which runs uprobe handlers inside the traced process and so avoids a kernel trap on every probe hit.

https://github.com/hengyoush/redis-bigkey-detector

Additional references:

https://ebpf.io/what-is-ebpf/

https://eunomia.dev/zh/bpftime/


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
