Why Did Redis Keys Suddenly Disappear? A Deep Dive into Memory Exhaustion and Client Buffer Overflow
This article analyzes a production incident where Redis failed to retrieve keys at 2 AM, tracing the root cause to a short‑term memory write‑full condition caused by massive GET request bursts that overflowed client output buffers, and outlines both emergency fixes and long‑term mitigations.
Problem Description
The service reported errors at 2 AM because Redis could not return certain keys. Initial checks ruled out eviction caused by a memory-full condition, since no memory-usage alarms had fired.
Monitoring graphs showed a slight increase in Redis memory usage but overall low utilization, while a spike in key expirations suggested many keys were set with TTLs that expired simultaneously.
Further investigation revealed no abnormal Redis error logs.
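As a first cross-check against the dashboards, the relevant counters can be pulled straight from INFO. The sketch below uses the redis-py client; the hostname is a placeholder rather than the incident's cluster, and it simply reads memory usage alongside the expiration and eviction counters that framed the investigation.

# Minimal sketch with redis-py; host/port are placeholders.
import redis

r = redis.Redis(host="redis.internal.example", port=6379, decode_responses=True)

mem = r.info("memory")
stats = r.info("stats")

# used_memory vs. maxmemory shows how close the instance is to its eviction threshold.
print("used_memory_human:", mem["used_memory_human"])
print("maxmemory_human:  ", mem.get("maxmemory_human", "0"))

# expired_keys rises when TTLs fire; evicted_keys rises when the maxmemory policy kicks in.
print("expired_keys:", stats["expired_keys"])
print("evicted_keys:", stats["evicted_keys"])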
Problem Localization
Based on monitoring, the team first suspected expired keys, then considered recent feature releases, but business feedback indicated the issue was not caused by TTL expiration. Subsequent analysis focused on memory policies.
Key observations:
TTL values were still far from expiration (≈5 days).
Keys were being evicted, indicating a memory pressure scenario.
Running CONFIG GET maxmemory-policy confirmed the policy as volatile-lru, and TTL checks returned large remaining values, confirming that expiration was not the trigger (these checks are reproduced in the sketch after this list).
Memory write‑full events lasted about 10 minutes, during which the client_longest_output_list metric spiked, hinting at output buffer buildup.
Packet captures showed a surge in GET requests (from ~270 k to ~700 k per minute) with a modest increase in SET commands.
IP analysis identified several internal IPs generating the bulk of the traffic.
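The checks above can be reproduced with a few client calls. The following sketch (redis-py; the key name and host are placeholders, not values from the incident) reads the eviction policy, the remaining TTL of an affected key, and the output-buffer counter; note that client_longest_output_list exists on older Redis versions and was replaced by client_recent_max_output_buffer in Redis 5.0+.

# Illustrative checks; "some:affected:key" is a placeholder.
import redis

r = redis.Redis(host="redis.internal.example", port=6379, decode_responses=True)

# 1. Confirm the eviction policy (the incident cluster reported volatile-lru).
print(r.config_get("maxmemory-policy"))    # e.g. {'maxmemory-policy': 'volatile-lru'}

# 2. Confirm the key's remaining TTL is nowhere near expiry (-2 means the key is already gone).
print(r.ttl("some:affected:key"))

# 3. Watch for output buffer buildup; the field name depends on the Redis version.
clients = r.info("clients")
print(clients.get("client_longest_output_list",
                  clients.get("client_recent_max_output_buffer")))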
Mechanism Analysis
Redis allocates a static 16 KB output buffer per client (#define REDIS_REPLY_CHUNK_BYTES (16*1024)) plus a dynamic reply list. Replies to ordinary commands fit within the static buffer; larger results are appended to the reply list.
When many commands are pipelined, Redis temporarily stores their results in the reply list before sending them out. If the network cannot keep up, the reply list grows, consuming server memory attributed to that client connection.
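To make the buffering behavior concrete, here is a small sketch (redis-py, with illustrative key names) that pipelines a large batch of GETs. While the batch is being answered, Redis fills the per-client 16 KB static buffer and spills the remaining replies into the dynamic reply list; if many clients do this at once, or the network drains slowly, that reply memory adds to the server's footprint.

# Sketch only: pipelining many GETs so replies accumulate in the client's output buffer.
import redis

r = redis.Redis(host="redis.internal.example", port=6379)

pipe = r.pipeline(transaction=False)   # plain pipelining, no MULTI/EXEC
for i in range(10000):
    pipe.get(f"user:profile:{i}")      # commands are buffered client-side until execute()
results = pipe.execute()               # Redis answers the batch; replies sit in its output
                                       # buffer / reply list until this client drains them

print(sum(1 for v in results if v is not None), "keys found")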
The observed spike in client_longest_output_list confirmed that client output buffers were filling with the responses to the massive GET burst; the extra reply memory pushed the instance toward its maxmemory limit and triggered the volatile-lru eviction policy.
Solution
Emergency fixes
Temporarily increase Redis cluster memory to avoid immediate write‑full conditions.
Tighten client-output-buffer-limit for normal clients so that a client whose output buffer grows out of control is disconnected before it can exhaust memory (see the sketch after this list).
Apply rate limiting and traffic shaping to smooth request spikes.
Introduce a fallback to MySQL when Redis keys are evicted, preventing business errors.
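One way to apply the buffer limit without a restart is CONFIG SET. The values below are illustrative, not the ones used during the incident; once a normal client's output buffer exceeds the hard limit (or stays above the soft limit for the soft-seconds window), Redis closes that connection instead of letting the buffer grow without bound.

# Illustrative limits for "normal" (non-replica, non-pubsub) clients; tune to your traffic.
import redis

r = redis.Redis(host="redis.internal.example", port=6379, decode_responses=True)

# hard limit 256 MB, soft limit 128 MB sustained for 60 seconds (example values only)
r.config_set("client-output-buffer-limit", "normal 256mb 128mb 60")
print(r.config_get("client-output-buffer-limit"))

The MySQL fallback is the usual cache-aside pattern: on a miss, including a miss caused by eviction, read from the database and repopulate the cache. The function and key scheme below are hypothetical.

# Hypothetical fallback path; load_user_from_mysql stands in for the real database read.
import json

def load_user_from_mysql(user_id: int) -> dict:
    return {"id": user_id, "name": "example"}   # placeholder

def get_user(user_id: int) -> dict:
    key = f"user:profile:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # Cache miss (possibly an evicted key): fall back to MySQL, then repopulate Redis.
    user = load_user_from_mysql(user_id)
    r.set(key, json.dumps(user), ex=5 * 24 * 3600)   # TTL roughly matching the ~5 days observed
    return user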
Root cause mitigation
Optimized business logic to disperse requests, avoiding concentrated load on Redis.
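How the dispersal might look in code is necessarily an assumption, since the team's actual change was in business logic. A common approach is to add jitter so that TTLs and batched reads do not line up on the same instant:

# Sketch: spread load by jittering TTLs and pacing batch reads (all values illustrative).
import random
import time
import redis

r = redis.Redis(host="redis.internal.example", port=6379)

BASE_TTL = 5 * 24 * 3600    # ~5 days, in line with the TTLs seen in the incident
JITTER = 6 * 3600           # up to 6 hours of spread (illustrative)

def set_with_jitter(key: str, value: str) -> None:
    # Randomizing the TTL keeps a large cohort of keys from expiring together.
    r.set(key, value, ex=BASE_TTL + random.randint(0, JITTER))

def read_batch(keys) -> list:
    """Read a batch of keys with light pacing instead of one tight burst."""
    values = []
    for key in keys:
        values.append(r.get(key))
        time.sleep(random.uniform(0.0, 0.01))   # small delay smooths the GET spike
    return values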
Conclusion
The incident was caused by a sudden surge in read requests that filled Redis client output buffers, pushing memory usage to the maxmemory limit and triggering key eviction. Properly sizing memory, configuring client output buffer limits, and smoothing traffic are essential to prevent similar outages.
vivo Internet Technology