Why Did Redis Keys Vanish at 2 AM Despite No Memory Alerts?
A production incident saw Redis keys disappear at 2 AM without any memory alarm. Analysis traced the loss to a short-lived memory spike: a surge in GET requests inflated the client output buffers, which pushed the instance to its maxmemory limit and triggered LRU eviction. The article closes with practical mitigation steps.
Problem Description
At 02:00 the application reported errors because it could not retrieve Redis keys. Monitoring raised no memory alarm: memory usage spiked only briefly, and the Redis memory usage chart otherwise stayed low.
Initial Hypotheses
Key expiration was suspected because many keys had similar TTLs (~5 days).
The eviction policy (volatile‑lru) and the TTL values were checked; neither explained the sudden loss.
Investigation Steps
Verified that keys were being evicted, not simply expired.
Confirmed no large data writes during the incident.
Captured packets 20 minutes before and after the incident.
Packet analysis showed a sharp increase in GET requests (from ~270 k/min to ~700 k/min) while SET requests grew only slightly.
IP‑source statistics indicated a handful of client IPs generating the bulk of the traffic.
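One way to tell eviction from expiry, as in the first step above, is to compare the evicted_keys and expired_keys counters from the Stats section of INFO before and after the incident window. The sketch below is illustrative only: it parses INFO text that has already been captured, and the two snapshots are made-up values, not figures from this incident.

```python
def parse_info(text):
    """Parse the text returned by the Redis INFO command into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Stats"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def removal_breakdown(before_text, after_text):
    """Return how many keys were evicted and expired between two INFO snapshots."""
    before, after = parse_info(before_text), parse_info(after_text)
    return {
        name: int(after[name]) - int(before[name])
        for name in ("evicted_keys", "expired_keys")
    }


# Hypothetical snapshots taken around the incident window.
before = "# Stats\r\nexpired_keys:1200\r\nevicted_keys:0\r\n"
after = "# Stats\r\nexpired_keys:1210\r\nevicted_keys:48000\r\n"

delta = removal_breakdown(before, after)
print(delta)
```

A large jump in evicted_keys with an almost flat expired_keys points at the eviction policy rather than at TTLs.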
Root Cause Analysis
The memory spike pushed the instance to maxmemory, and the configured volatile‑lru policy then evicted keys that had a TTL set. The real driver, however, was client output buffer memory soaring (the usage that client-output-buffer-limit is meant to cap), which indicated that Redis was buffering a massive amount of reply data.
Redis allocates a 16 KB static reply buffer per client, plus a linked list for replies that do not fit. Under normal conditions, GET replies fit in the static buffer. When many GET commands are pipelined, replies that overflow the static buffer are queued in the linked list until they can be sent. If the network cannot drain the replies fast enough, the list keeps growing. That memory is held on the Redis server, not on the client, and it counts toward used memory, so it can eventually push the instance to its maxmemory limit.
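The reply buffers described above are visible per connection in the output of CLIENT LIST: obl is the length of the static buffer in use, oll is the number of entries in the reply list, and omem is the memory used by the output buffer. As a rough sketch, the function below ranks clients by omem from captured CLIENT LIST text. The addresses and sizes in the sample are invented for illustration.

```python
def parse_client_list(text):
    """Turn CLIENT LIST output (one client per line, key=value pairs) into dicts."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = dict(pair.split("=", 1) for pair in line.split() if "=" in pair)
        clients.append(fields)
    return clients


def top_output_buffers(text, n=3):
    """Return the n clients holding the most output buffer memory as (addr, omem)."""
    clients = parse_client_list(text)
    ranked = sorted(clients, key=lambda c: int(c.get("omem", 0)), reverse=True)
    return [(c.get("addr", "?"), int(c.get("omem", 0))) for c in ranked[:n]]


# Hypothetical CLIENT LIST capture; real output carries more fields per line.
sample = (
    "id=11 addr=10.0.0.5:50112 name= age=120 idle=0 obl=0 oll=0 omem=0 cmd=get\n"
    "id=12 addr=10.0.0.6:50113 name= age=95 idle=0 obl=0 oll=3200 omem=52428800 cmd=get\n"
    "id=13 addr=10.0.0.7:50114 name= age=90 idle=0 obl=0 oll=64 omem=1048576 cmd=get\n"
)

worst = top_output_buffers(sample, n=2)
print(worst)
```

Sampling this periodically, or alerting on the sum of omem, would surface a buffer build-up that an overall memory chart can miss.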
Redis Memory Composition
Memory used by a Redis instance is made up of:
Business data stored in Redis.
Client input/output buffers.
Replication backlog.
AOF buffer.
Other internal overhead.
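To see how these components add up on a live instance, the Memory section of INFO can be broken down field by field. The sketch below assumes a Redis version that reports mem_clients_normal, mem_clients_slaves, mem_replication_backlog, and mem_aof_buffer (older versions may not expose all of them, in which case they are treated as zero). The sample numbers are hypothetical.

```python
COMPONENTS = (
    "mem_clients_normal",       # buffers of ordinary clients
    "mem_clients_slaves",       # buffers of replica connections
    "mem_replication_backlog",  # replication backlog
    "mem_aof_buffer",           # AOF buffer
)


def memory_breakdown(info_text):
    """Split used_memory into the buffer components above plus everything else."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and not key.startswith("#"):
            fields[key] = value
    used = int(fields["used_memory"])
    parts = {name: int(fields.get(name, 0)) for name in COMPONENTS}
    # Whatever remains is business data plus other internal overhead.
    parts["everything_else"] = used - sum(parts.values())
    return used, parts


# Hypothetical INFO memory excerpt captured during a spike.
sample_info = (
    "# Memory\r\n"
    "used_memory:1000000000\r\n"
    "mem_clients_normal:400000000\r\n"
    "mem_clients_slaves:0\r\n"
    "mem_replication_backlog:1048576\r\n"
    "mem_aof_buffer:0\r\n"
)

used, parts = memory_breakdown(sample_info)
client_share = parts["mem_clients_normal"] / used
print(parts, client_share)
```

In this made-up sample, ordinary client buffers hold a large share of used memory even though the stored data has not grown, which is the pattern the incident describes.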
Mitigation Measures
Temporarily increase cluster memory to avoid immediate out‑of‑memory.
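Set a finite client-output-buffer-limit for normal clients (the default for this class is unlimited) so that Redis disconnects a client whose reply buffer exceeds the limit instead of letting the buffer blow up.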
Apply rate‑limiting and traffic‑shaping to smooth request spikes.
Introduce a fallback to MySQL when a key is evicted, preventing business errors.
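The MySQL fallback in the last item follows the usual cache-aside pattern: read Redis first, fall back to the database on a miss, then repopulate the cache. The sketch below is a minimal illustration, not the authors' implementation. The cache_get, db_get, and cache_set callables are hypothetical stand-ins for real Redis and MySQL client calls, and the default TTL simply mirrors the roughly 5-day TTL mentioned earlier.

```python
FIVE_DAYS_SECONDS = 5 * 24 * 60 * 60


def get_with_fallback(key, cache_get, db_get, cache_set, ttl_seconds=FIVE_DAYS_SECONDS):
    """Read from the cache; on a miss, read the database and refill the cache."""
    value = cache_get(key)
    if value is not None:
        return value
    value = db_get(key)  # key was evicted, expired, or never cached
    if value is not None:
        cache_set(key, value, ttl_seconds)
    return value


# In-memory stand-ins so the example runs without Redis or MySQL.
fake_cache = {}
fake_db = {"user:42": "alice"}


def fake_cache_set(key, value, ttl_seconds):
    fake_cache[key] = value  # the TTL is ignored by this stand-in


first = get_with_fallback("user:42", fake_cache.get, fake_db.get, fake_cache_set)
second = get_with_fallback("user:42", fake_cache.get, fake_db.get, fake_cache_set)
missing = get_with_fallback("user:99", fake_cache.get, fake_db.get, fake_cache_set)
print(first, second, missing)
```

During a mass eviction this shifts load onto the database, so the fallback works best alongside the rate limiting from the previous item.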
Root Solution
Optimize business logic to disperse requests over time, avoiding concentrated bursts that overwhelm Redis.
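One common way to disperse a scheduled burst is to give each caller a stable offset inside a time window, so that jobs configured for 02:00 start at different seconds. The sketch below is one possible approach under that assumption, not the change the authors made. The function name, client IDs, and 10-minute window are hypothetical.

```python
import hashlib
from collections import Counter


def spread_offset(client_id, window_seconds):
    """Map a client ID to a stable delay in [0, window_seconds)."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


# Hypothetical fleet of 1,000 workers that would otherwise all start at 02:00:00.
window = 600  # spread the start times over 10 minutes
offsets = [spread_offset(f"worker-{i}", window) for i in range(1000)]

# Number of workers starting in the busiest single second after spreading.
busiest = max(Counter(offsets).values())
print(busiest)
```

Because the offset is derived from a hash of the ID, each worker keeps the same slot from run to run, which keeps the load pattern predictable.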
Takeaways
Even with good Redis performance, it is essential to distribute load, monitor client output buffers, and configure eviction policies and TTLs appropriately to maintain service availability.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
