Why Did Redis Suddenly Evict Keys? A Deep Dive into Memory, Pipelines, and Client Buffers
This article walks through a production incident where Redis began returning missing keys, detailing the step‑by‑step diagnosis—from monitoring logs and TTL checks to discovering memory spikes caused by client‑output‑buffer‑limit overflow and pipeline traffic—followed by emergency and permanent remediation measures.
Problem Description
In production, an application service started failing at 02:00 AM because Redis could not retrieve certain keys. Initial alerts showed no memory warnings and Redis memory usage appeared low, yet logs indicated a massive number of keys being removed, initially flagged as expirations.
Problem Location
Checked the eviction policy: maxmemory-policy was volatile-lru, which evicts only keys that carry a TTL.
Verified the TTL of the affected keys (≈5 days remaining), ruling out natural expiration.
Observed a short-lived memory-full event (≈10 minutes) during which used memory briefly hit maxmemory; it was too brief to trip the alarm, which requires three consecutive failed checks.
Noted a sharp rise in the client_longest_output_list metric, suggesting replies piling up in client output buffers.
Analyzed request logs: a surge in GET requests (from ~270 k/min to ~700 k/min) with a smaller rise in SET requests. The key checks are sketched below.
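A minimal sketch of those checks using redis-py (the host and key name are illustrative, and the INFO field name for output-buffer pressure varies by Redis version):

```python
import redis

r = redis.Redis(host="redis.internal", port=6379, decode_responses=True)  # hypothetical host

# 1. Eviction policy: volatile-lru evicts only keys that carry a TTL.
print(r.config_get("maxmemory-policy"))

# 2. TTL of an affected key (-2 = key gone, -1 = no TTL, else seconds left).
print(r.ttl("user:12345"))  # hypothetical key

# 3. Output-buffer pressure; older versions report client_longest_output_list,
#    newer ones client_recent_max_output_buffer.
clients = r.info("clients")
print(clients.get("client_longest_output_list",
                  clients.get("client_recent_max_output_buffer")))
```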
Mechanism Analysis
Redis allocates a static 16 KB buffer (#define REDIS_REPLY_CHUNK_BYTES (16*1024)) per client, plus a linked list (list *reply) for replies that exceed that size. Under normal conditions a GET reply fits in the static buffer. When many commands are pipelined, however, Redis queues each reply in the reply list until it can flush them to the network. If the network (or the client) cannot keep up, the reply list grows, consuming memory on the Redis server that is attributed to that client's output buffer.
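To make that concrete, here is a hedged redis-py sketch: every reply to the pipelined GETs below is held in the per-client output buffer on the server, first in the 16 KB static chunk and then on the reply list, until the client drains the socket on execute(). The host, key pattern, and burst size are illustrative:

```python
import redis

r = redis.Redis(host="redis.internal", port=6379)  # hypothetical host

# Queue many GETs in one pipeline; nothing is read back until execute().
pipe = r.pipeline(transaction=False)
for i in range(10_000):                  # illustrative burst size
    pipe.get(f"user:{i}")                # hypothetical key pattern
replies = pipe.execute()                 # 10k replies flushed in one round trip
```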
The measured output-buffer usage (the client_longest_output_list metric, along with per-connection omem values from CLIENT LIST) spiked, confirming that output buffers were filling faster than they could be drained; note that client-output-buffer-limit itself is a configuration ceiling, not a metric. The buffered replies pushed used memory past maxmemory, triggering the volatile-lru eviction policy.
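One way to observe this per connection is CLIENT LIST, whose obl (static-buffer bytes), oll (reply-list length), and omem (total output-buffer bytes) fields expose buffer usage. A sketch with an illustrative threshold:

```python
import redis

r = redis.Redis(host="redis.internal", port=6379, decode_responses=True)  # hypothetical host

# Flag connections whose output buffer has grown past ~1 MB (illustrative threshold).
for c in r.client_list():
    if int(c.get("omem", 0)) > 1_000_000:
        print(c["addr"], "omem=", c["omem"], "oll=", c["oll"], "obl=", c["obl"])
```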
Root Cause
A sudden increase in read traffic caused a massive number of GET commands to be pipelined. The server could not drain the output buffers quickly enough, so client_longest_output_list climbed and the buffered replies eventually pushed the instance past its maxmemory limit, forcing volatile-lru to evict keys.
Emergency Fixes
Temporarily raise the Redis cluster's maxmemory limit to stop the immediate memory-full failures.
Tighten the client-output-buffer-limit for normal clients so a runaway output buffer disconnects the offending client instead of exhausting server memory.
Apply rate limiting and traffic shaping on the business side to smooth the request spike.
Introduce a fallback: if a key has been evicted, query MySQL so the application does not error. The configuration changes and the fallback are both sketched after this list.
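The first two fixes map onto CONFIG SET; a hedged sketch, with sizes that are purely illustrative rather than recommendations:

```python
import redis

r = redis.Redis(host="redis.internal", port=6379)  # hypothetical host

# Buy headroom: raise maxmemory above the spike's working set (8 GB here is illustrative).
r.config_set("maxmemory", str(8 * 1024**3))

# Cap output buffers for normal clients (unlimited by default for this class):
# hard limit 256 MB, or soft limit 64 MB sustained for 60 seconds, after which
# Redis closes the offending connection.
r.config_set("client-output-buffer-limit", "normal 268435456 67108864 60")
```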
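And a minimal sketch of the read-through fallback to MySQL; pymysql, the connection details, the table layout, and the key pattern are all assumptions for illustration:

```python
import pymysql
import redis

r = redis.Redis(host="redis.internal", port=6379)                    # hypothetical host
db = pymysql.connect(host="mysql.internal", user="app",
                     password="secret", database="app")              # hypothetical DSN

def get_user_payload(user_id: int):
    """Serve from Redis; on a miss (e.g. after eviction) repopulate from MySQL."""
    key = f"user:{user_id}"                                          # hypothetical key pattern
    val = r.get(key)
    if val is not None:
        return val
    with db.cursor() as cur:                                         # fall back to the source of truth
        cur.execute("SELECT payload FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
    if row is None:
        return None
    r.set(key, row[0], ex=5 * 24 * 3600)                             # restore the ~5-day TTL
    return row[0]
```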
Permanent Solution
Optimise the business logic to distribute requests over time, avoiding a concentrated burst of Redis reads. For critical data, adjust the eviction policy (or use a non-evicting configuration such as noeviction) and set appropriate TTLs to keep memory usage within safe bounds. A sketch of both techniques follows.
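A hedged sketch of both ideas: bounded, chunked pipelines to spread the read burst, and jittered TTLs so a cohort of keys does not expire in lockstep. The chunk size, pause, and TTLs are illustrative:

```python
import random
import time

import redis

r = redis.Redis(host="redis.internal", port=6379)   # hypothetical host

CHUNK = 500  # bounded pipeline size, illustrative

def fetch_spread(keys):
    """Fetch in bounded chunks with a short pause so replies never pile up."""
    result = {}
    for i in range(0, len(keys), CHUNK):
        batch = keys[i:i + CHUNK]
        pipe = r.pipeline(transaction=False)
        for k in batch:
            pipe.get(k)
        result.update(zip(batch, pipe.execute()))
        time.sleep(0.01)                             # smooth the burst between batches
    return result

def set_with_jitter(key, value, base_ttl=5 * 24 * 3600):
    """Jitter TTLs by ±10% so related keys do not expire simultaneously."""
    r.set(key, value, ex=int(base_ttl * random.uniform(0.9, 1.1)))
```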
Conclusion
The incident was caused by a burst of read traffic that filled Redis’s client output buffers, leading to memory exhaustion and key eviction. Proper monitoring of client_longest_output_list, careful configuration of client-output-buffer-limit, and traffic throttling are essential to prevent similar issues.