How to Detect and Resolve Redis Big‑Key Issues That Cause Service Timeouts

This article walks through a real‑world incident where a Redis cache big‑key caused service timeouts, explains how to identify the problem using cluster metrics, outlines detection tools and commands, and provides practical steps to delete big keys and prevent future occurrences.

dbaplus Community

Problem Description

At 19:44, an alert reported that the TP99 latency of an online API had spiked to over 300 ms (normally 8 ms). Monitoring showed intermittent timeouts on random machines, and the degradation switch of the cache-dependent service was off. By 20:13 the root cause was identified: a single Redis big key larger than 5 MB that was being requested continuously.

Observation (望)

High-concurrency traffic stresses distributed caches such as Redis. When a large key is accessed repeatedly, the outbound traffic of the shard that holds it surges while the other shards stay normal. Redis Cluster maps every key to exactly one of 16,384 hash slots, and each slot is served by a single shard, so a big key lives on one shard and that shard's traffic pattern is a clear indicator.
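As a rough illustration of that traffic pattern, the sketch below flags shards whose outbound rate is far above the cluster median. The shard names, the sample numbers, and the factor of 5 are hypothetical; in practice the per-shard rates might come from the instantaneous_output_kbps field of INFO stats or from an existing monitoring system.

```python
from statistics import median


def find_hot_shards(outbound_kbps, factor=5.0):
    """Return shard names whose outbound rate exceeds factor times the median.

    outbound_kbps: dict of shard name -> outbound KB/s (hypothetical input).
    factor: assumed multiplier; tune it to your own traffic.
    """
    if not outbound_kbps:
        return []
    baseline = median(outbound_kbps.values())
    if baseline <= 0:
        return []
    return sorted(
        name for name, rate in outbound_kbps.items() if rate > factor * baseline
    )


# Made-up sample: one shard serves a big key, the rest look normal.
sample = {
    "shard-1": 900.0,
    "shard-2": 1100.0,
    "shard-3": 48000.0,
    "shard-4": 1000.0,
}
hot = find_hot_shards(sample)
print(hot)
```

Comparing against the median rather than the mean keeps a single extreme shard from inflating the baseline it is measured against.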

Investigation (闻)

Key symptoms include:

One shard receives modest inbound traffic but massive outbound traffic.

Only a specific shard experiences timeouts while others are normal.

Understanding Redis Cluster’s slot allocation helps quickly pinpoint the problematic shard.
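To make the slot allocation concrete, here is a minimal sketch of how a key name maps to a hash slot: CRC16 (the XMODEM variant) of the key, modulo 16,384, with only the hash-tag portion hashed when the key contains a non-empty {...} section. The function names are illustrative; on a live cluster, CLUSTER KEYSLOT returns the same number.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: bytes) -> int:
    """Return the Redis Cluster hash slot (0..16383) for a key."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


slot = key_slot(b"foo")
print(slot)
```

Once the slot is known, CLUSTER SLOTS or CLUSTER NODES shows which shard owns it, which can then be compared against the shard with the abnormal traffic.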

Detection Methods (问)

Several tools can scan for big keys:

redis-rdb-tools: Run bgsave on the instance, then analyze the generated dump.rdb with rdb -c memory dump.rdb to list large keys.

redis-cli --bigkeys: Shows the biggest key of each data type (string, hash, list, set, zset). Strings are ranked by byte length and collections by element count, so a collection with few but very large elements can be missed.

Custom Python scripts that iterate over keys similarly to --bigkeys.
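A custom scanner along those lines might look like the following sketch. It assumes a client object that exposes scan_iter() and memory_usage() the way redis-py's redis.Redis does, and a server that supports MEMORY USAGE (Redis 4.0 or later). The 5 MB default threshold mirrors the incident above and is only an example.

```python
def find_big_keys(client, threshold_bytes=5 * 1024 * 1024, scan_count=1000):
    """Return (key, size_in_bytes) pairs at or above threshold_bytes, largest first.

    client: any object with scan_iter() and memory_usage() methods, such as
    a redis-py redis.Redis instance (an assumption of this sketch).
    """
    big = []
    # SCAN walks the keyspace incrementally instead of blocking like KEYS.
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)
        # A key can expire between SCAN and MEMORY USAGE, which yields None.
        if size is not None and size >= threshold_bytes:
            big.append((key, size))
    return sorted(big, key=lambda pair: pair[1], reverse=True)
```

With redis-py installed, it could be called as find_big_keys(redis.Redis(host="127.0.0.1", port=6379)), where the host and port are placeholders. Scanning adds load, so it is safer to run it against one node at a time and outside peak hours.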

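After locating the big key, delete it. The DEL command works, but it frees memory synchronously and can block the server while it reclaims a large collection; on Redis 4.0 and later, UNLINK removes the key and reclaims its memory in a background thread.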
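When the big key is a hash and blocking the server is a concern, one common approach is to empty it in small batches before removing the key itself. The sketch below assumes a redis-py style client with hscan(), hdel(), and delete(); the function name and batch size are illustrative, and lists, sets, and sorted sets would need their own commands.

```python
def delete_big_hash(client, key, batch_size=500):
    """Remove a large hash a batch of fields at a time, then drop the key.

    client: object with hscan(), hdel() and delete() as in redis-py
    (an assumption of this sketch). Returns the number of fields removed.
    """
    removed = 0
    cursor = 0
    while True:
        # HSCAN returns the next cursor and a dict of field -> value.
        cursor, fields = client.hscan(key, cursor=cursor, count=batch_size)
        if fields:
            removed += client.hdel(key, *fields.keys())
        if cursor == 0:
            break
    # The hash is empty or nearly empty by now, so this call is cheap.
    client.delete(key)
    return removed
```

Each round trip touches only a bounded number of fields, so other clients get served between batches.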

Resolution (切)

To prevent recurrence:

Avoid using Redis as a primary store for complex data structures; split large objects at design time.

Introduce validation layers before caching to reject keys exceeding a size threshold and raise alerts.
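A validation layer of this kind can be as small as a wrapper around the cache write. The sketch below is one possible shape, not a prescribed design: the 1 MB limit, the logger name, and the exception class are all assumptions, and the client is expected to offer a redis-py style set(key, value, ex=...) method.

```python
import logging

logger = logging.getLogger("cache_guard")  # hypothetical logger name

MAX_VALUE_BYTES = 1024 * 1024  # assumed 1 MB threshold; choose your own


class ValueTooLargeError(ValueError):
    """Raised when a value is rejected before it reaches the cache."""


def guarded_set(client, key, value: bytes, ttl_seconds, max_bytes=MAX_VALUE_BYTES):
    """Write value to the cache only if it is within max_bytes.

    Oversized values are logged (a hook for alerting) and rejected.
    Returns the size written in bytes.
    """
    size = len(value)
    if size > max_bytes:
        logger.warning(
            "rejecting cache write: key=%s size=%d limit=%d", key, size, max_bytes
        )
        raise ValueTooLargeError(f"{key}: {size} bytes exceeds {max_bytes}")
    client.set(key, value, ex=ttl_seconds)
    return size
```

Raising instead of silently truncating forces the caller to decide how to split or skip the object, which is where the design-time fix belongs.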

Redis maintains several buffers: an input (query) buffer and an output buffer for each client, plus buffers for replication and AOF. A big-key response accumulates in the client's output buffer, and when that buffer exceeds its configured limit Redis closes the connection.

Output buffer overflow can occur due to:

Large responses from big‑key requests.

Running the MONITOR command.

Improper buffer size settings.

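Adjust the client-output-buffer limits. The directive takes a client class, a hard limit, a soft limit, and a soft-limit duration in seconds, for example:

client-output-buffer-limit normal 0 0 0 (no limit for normal clients).

client-output-buffer-limit pubsub 8mb 2mb 60 (close the connection immediately once the buffer reaches 8 MB, or if it stays above 2 MB continuously for 60 seconds).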
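The three numbers in client-output-buffer-limit are a hard limit, a soft limit, and a soft-limit duration in seconds: the connection is dropped as soon as the buffer reaches the hard limit, or when it stays over the soft limit continuously for the given duration, and 0 disables a limit. The following is a simplified model of that rule, not Redis's actual implementation; it assumes one buffer-size sample per second, and the function name is made up for illustration.

```python
MB = 1024 * 1024


def second_of_disconnect(samples, hard_limit, soft_limit, soft_seconds):
    """Return the index (second) at which the client would be dropped, or None.

    samples: output-buffer size in bytes, one sample per second (assumed).
    A limit of 0 means that limit is disabled.
    """
    over_since = None
    for t, size in enumerate(samples):
        if hard_limit and size >= hard_limit:
            return t  # hard limit: closed immediately
        if soft_limit and size >= soft_limit:
            if over_since is None:
                over_since = t  # start timing the soft-limit breach
            elif t - over_since >= soft_seconds:
                return t  # stayed over the soft limit long enough
        else:
            over_since = None  # dropped back below: the timer resets
    return None


# pubsub 8mb 2mb 60: a single 9 MB burst is cut off at once.
burst = second_of_disconnect([9 * MB], 8 * MB, 2 * MB, 60)
# A steady 3 MB backlog is only cut off after 60 seconds.
steady = second_of_disconnect([3 * MB] * 61, 8 * MB, 2 * MB, 60)
```

The model also shows why normal 0 0 0 never disconnects a slow reader: with both limits disabled the buffer can grow without bound, so memory on the server becomes the only ceiling.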

Additional preventive steps:

Avoid storing big keys.

Do not use MONITOR in production.

Set reasonable client-output-buffer-limit values.

These measures address why large keys cause client‑server link interruptions and help maintain stable Redis performance.
