Why Did Redis Keys Vanish at 2 AM Despite No Memory Alerts?

A production incident saw Redis keys disappear at 2 AM without any memory alert firing. Analysis traced the loss to a short-lived memory spike: a surge in GET requests inflated the client output buffers, memory reached its limit, and the LRU policy evicted keys. The article closes with practical mitigation steps.

dbaplus Community

Problem Description

At 02:00 the application began reporting errors because keys it expected were missing from Redis. No memory alert fired: memory usage spiked only briefly, and the Redis memory usage chart looked low overall.

Initial Hypotheses

Key expiration was suspected because many keys had similar TTLs (~5 days).

The eviction policy (volatile-lru) and the TTL values were checked, but neither explained the sudden loss.
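One way to tell expiration apart from eviction is to compare the `expired_keys` and `evicted_keys` counters that Redis reports in `INFO stats`, sampled before and after the window of interest. The sketch below only parses two text snapshots; the snapshot values are invented for illustration, and how the snapshots are collected is left as an assumption.

```python
def parse_info(text):
    """Parse the 'field:value' lines of Redis INFO output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Stats".
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        fields[name] = value
    return fields


def key_loss_deltas(info_before, info_after):
    """Return how many keys expired and how many were evicted between two snapshots."""
    before = parse_info(info_before)
    after = parse_info(info_after)
    return {
        name: int(after[name]) - int(before[name])
        for name in ("expired_keys", "evicted_keys")
    }


# Hypothetical snapshots, invented for illustration only.
BEFORE = "# Stats\r\nexpired_keys:1200\r\nevicted_keys:0\r\n"
AFTER = "# Stats\r\nexpired_keys:1215\r\nevicted_keys:48210\r\n"

deltas = key_loss_deltas(BEFORE, AFTER)
print(deltas)
```

A large jump in `evicted_keys` with a nearly flat `expired_keys` would point at memory pressure rather than TTLs.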

Investigation Steps

Verified that keys were being evicted, not simply expired.

Confirmed no large data writes during the incident.

Captured packets 20 minutes before and after the incident.

Packet analysis showed a sharp increase in GET requests (from ~270 k/min to ~700 k/min) while SET requests grew only slightly.

IP‑source statistics indicated a handful of client IPs generating the bulk of the traffic.
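The per-minute and per-IP statistics described above can be sketched as a simple aggregation. This assumes the capture has already been decoded into `(timestamp, source IP, command)` records by some other tool; the record layout, the function names, and the sample data are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def per_minute_counts(records):
    """Count commands per (minute, command) from (timestamp, src_ip, command) records."""
    counts = Counter()
    for ts, _src_ip, command in records:
        minute = ts.replace(second=0, microsecond=0)
        counts[(minute, command.upper())] += 1
    return counts


def top_sources(records, n=3):
    """Return the n client IPs that sent the most commands."""
    return Counter(src_ip for _ts, src_ip, _cmd in records).most_common(n)


# Hypothetical records; a real run would load these from the decoded capture.
RECORDS = [
    (datetime(2024, 1, 1, 1, 59, 10), "10.0.0.5", "GET"),
    (datetime(2024, 1, 1, 1, 59, 40), "10.0.0.6", "SET"),
    (datetime(2024, 1, 1, 2, 0, 1), "10.0.0.5", "get"),
    (datetime(2024, 1, 1, 2, 0, 2), "10.0.0.5", "GET"),
    (datetime(2024, 1, 1, 2, 0, 3), "10.0.0.7", "GET"),
]

by_minute = per_minute_counts(RECORDS)
busiest = top_sources(RECORDS, n=1)
```

Comparing the GET and SET counts minute by minute shows whether the spike is read-driven, and the per-IP ranking shows which clients to look at first.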

Root Cause Analysis

The memory spike pushed usage to the maxmemory limit, which triggered the configured volatile-lru eviction policy and removed keys. The underlying driver, however, was a surge in client output buffer usage (the memory governed by the client-output-buffer-limit setting), indicating that Redis was buffering a massive amount of reply data.

Redis allocates a 16 KB static buffer per client and a linked list for larger replies. Under normal conditions, GET replies fit in the static buffer. When many GET commands are pipelined, Redis queues the replies that do not fit in the linked list until they can be sent. If the network or the client cannot drain the replies fast enough, the reply list grows. This memory is held by the Redis server for each client connection, so it counts toward the instance's memory usage and can eventually push it to the maxmemory limit.
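To see which connections are holding reply data, the `CLIENT LIST` command reports per-client fields including `obl` (bytes in the fixed buffer), `oll` (entries in the reply list), and `omem` (total output buffer memory). The sketch below ranks clients by `omem` from a text dump of that output; the sample lines are hypothetical and trimmed to the fields used.

```python
def parse_client_list(text):
    """Parse CLIENT LIST output into one dict of 'key=value' fields per client."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = {}
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep:
                fields[key] = value
        clients.append(fields)
    return clients


def largest_output_buffers(text, n=5):
    """Return (addr, omem_bytes) for the n clients holding the most reply data."""
    clients = parse_client_list(text)
    ranked = sorted(clients, key=lambda c: int(c.get("omem", 0)), reverse=True)
    return [(c.get("addr", "?"), int(c.get("omem", 0))) for c in ranked[:n]]


# Hypothetical CLIENT LIST excerpt, trimmed to the fields used here.
SAMPLE = (
    "id=11 addr=10.0.0.5:40112 name= obl=0 oll=812 omem=13312000 cmd=get\n"
    "id=12 addr=10.0.0.6:40388 name= obl=0 oll=0 omem=0 cmd=set\n"
    "id=13 addr=10.0.0.7:51200 name= obl=16384 oll=95 omem=1556480 cmd=get\n"
)

top = largest_output_buffers(SAMPLE, n=2)
total_omem = sum(int(c["omem"]) for c in parse_client_list(SAMPLE))
```

A non-zero `oll` means replies have spilled out of the fixed buffer into the list, which is the growth pattern described above.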

Redis Memory Composition

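Redis memory usage is more than the stored keys, and client buffers count against maxmemory just as key data does. Usage includes: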

Business data stored in Redis.

Client input/output buffers.

Replication backlog.

AOF buffer.

Other internal overhead.
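The composition above can be estimated from `INFO memory`. The sketch below assumes the field names `used_memory`, `mem_clients_normal`, `mem_clients_slaves`, `mem_replication_backlog`, and `mem_aof_buffer` as reported by recent Redis versions; older versions may not expose all of them, so missing fields default to zero. The byte counts in the sample are invented.

```python
def memory_breakdown(info):
    """Split used_memory into the components listed above.

    `info` is a dict of integer byte counts taken from INFO memory.
    """
    components = {
        "client_buffers": info.get("mem_clients_normal", 0),
        "replica_buffers": info.get("mem_clients_slaves", 0),
        "replication_backlog": info.get("mem_replication_backlog", 0),
        "aof_buffer": info.get("mem_aof_buffer", 0),
    }
    # Whatever is left is key data plus other internal overhead.
    components["data_and_other"] = info["used_memory"] - sum(components.values())
    return components


def client_buffer_share(info):
    """Fraction of used_memory held by normal client buffers."""
    return memory_breakdown(info)["client_buffers"] / info["used_memory"]


# Hypothetical snapshot taken during a spike (values invented).
SNAPSHOT = {
    "used_memory": 4_000_000_000,
    "mem_clients_normal": 1_200_000_000,
    "mem_clients_slaves": 0,
    "mem_replication_backlog": 1_048_576,
    "mem_aof_buffer": 0,
}

breakdown = memory_breakdown(SNAPSHOT)
share = client_buffer_share(SNAPSHOT)
```

Alerting on the client buffer share, not only on total memory, is one way to catch this kind of spike even when the overall chart looks low.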

Mitigation Measures

Temporarily increase the cluster's memory so that short spikes no longer reach the maxmemory limit.

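Set a lower client-output-buffer-limit so that Redis disconnects a client whose reply buffer grows past the limit instead of letting it consume instance memory.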

Apply rate‑limiting and traffic‑shaping to smooth request spikes.

Fall back to MySQL when a key is missing from Redis, so that an eviction does not surface as a business error.
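The MySQL fallback can be sketched as a cache-aside read. The function below is a minimal illustration, not the original team's code: `cache_get`, `db_get`, and `cache_set` are hypothetical callables standing in for the Redis and MySQL clients, and plain dicts play both roles in the usage lines.

```python
def get_with_fallback(key, cache_get, db_get, cache_set=None):
    """Read from the cache first; on a miss, read the database and refill the cache."""
    value = cache_get(key)
    if value is not None:
        return value
    # Cache miss (expired or evicted key): fall back to the database.
    value = db_get(key)
    if value is not None and cache_set is not None:
        cache_set(key, value)
    return value


# Dict stand-ins for Redis and MySQL, for illustration only.
cache = {}
database = {"user:1": "alice"}

first = get_with_fallback("user:1", cache.get, database.get, cache.__setitem__)
second = get_with_fallback("user:1", cache.get, database.get, cache.__setitem__)
missing = get_with_fallback("user:2", cache.get, database.get, cache.__setitem__)
```

During a burst, a fallback like this shifts the misses onto the database, so it works best alongside the rate limiting listed above rather than in place of it.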

Root Solution

Optimize business logic to disperse requests over time, avoiding concentrated bursts that overwhelm Redis.
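One common way to disperse a concentrated burst is to give each caller a random start offset within a window instead of letting them all fire at the same instant. The sketch below assumes the burst comes from scheduled jobs that can tolerate a delay; the job names and the 10-minute window are hypothetical.

```python
import random


def spread_start_times(client_ids, window_seconds, rng=None):
    """Assign each client a start offset spread uniformly across the window."""
    if rng is None:
        rng = random.Random()
    return {cid: rng.uniform(0, window_seconds) for cid in client_ids}


# Hypothetical jobs that previously all started at 02:00:00.
offsets = spread_start_times(
    ["job-a", "job-b", "job-c"], window_seconds=600, rng=random.Random(42)
)
for job, delay in sorted(offsets.items()):
    print(f"{job}: start {delay:.0f}s after 02:00")
```

Passing a seeded `random.Random` keeps the schedule reproducible in tests; in production the default unseeded generator would be the natural choice.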

Takeaways

Even when Redis itself performs well, it is essential to distribute load over time, monitor client output buffers, and configure eviction policies and TTLs appropriately to maintain service availability.
