Why Did Redis Memory Spike 10×? Uncover the Hidden Config Mistake
A sudden Redis memory surge from 2 GB to 20 GB was traced to a misconfigured list-compress-depth parameter, revealing how uncompressed lists and queue backlogs can cause ten‑fold memory growth, and outlining step‑by‑step diagnostics, compression fixes, and long‑term optimization strategies.
Introduction
On a Monday morning a monitoring alert reported Redis memory usage at 95% and nearing OOM. The memory, normally around 2 GB, had exploded to 20 GB over the weekend—a ten‑fold increase that threatened server stability. After intensive investigation the root cause turned out to be a single overlooked configuration parameter.
Redis Memory Management Mechanism
Components of Redis Memory Usage
Dataset: actual key‑value data stored.
Replication buffer: data buffered during master‑slave replication.
Client buffer: I/O buffers for client connections.
AOF buffer: write buffer for AOF persistence.
Memory fragmentation: fragmentation caused by allocation algorithms.
Redis internal overhead: metadata, expiration dictionaries, etc.
Example of memory statistics:
redis-cli info memory
Key metrics:
# Memory
used_memory:2147483648 # 2 GB allocated
used_memory_human:2.00G
used_memory_rss:2684354560 # 2.5 GB RSS
used_memory_peak:3221225472
used_memory_peak_human:3.00G
used_memory_overhead:134217728 # internal overhead
used_memory_dataset:2013265920 # dataset size
mem_fragmentation_ratio:1.25
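The same figures can be read programmatically; a minimal sketch with redis-py (connection details are illustrative) that splits used memory into dataset and overhead:
# Minimal sketch (assumes redis-py); reads INFO MEMORY and prints the split
import redis

r = redis.Redis(host='127.0.0.1', port=6379)
mem = r.info('memory')  # same fields as the INFO MEMORY output above

total = mem['used_memory']
dataset = mem['used_memory_dataset']       # actual key-value data
overhead = mem['used_memory_overhead']     # buffers, dictionaries, metadata
frag = mem['mem_fragmentation_ratio']      # RSS / used_memory

print(f"total:    {total / 2**30:.2f} GiB")
print(f"dataset:  {dataset / 2**30:.2f} GiB ({dataset / total:.0%})")
print(f"overhead: {overhead / 2**30:.2f} GiB")
print(f"fragmentation ratio: {frag:.2f}")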
Redis Eviction Policies
noeviction (default): no eviction; writes return an error.
allkeys-lru : evict least recently used keys.
allkeys-lfu : evict least frequently used keys (Redis 4.0+).
allkeys-random : random eviction.
volatile-lru : evict LRU among keys with TTL.
volatile-lfu : evict LFU among keys with TTL.
volatile-random : random eviction among keys with TTL.
volatile-ttl : evict keys with shortest TTL.
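Both the memory cap and the eviction policy can be changed at runtime without a restart; a small redis-py sketch (the 16 GB value is only illustrative):
import redis

r = redis.Redis(host='127.0.0.1', port=6379)

# Cap memory and switch from the default noeviction to LRU eviction
r.config_set('maxmemory', '16gb')
r.config_set('maxmemory-policy', 'allkeys-lru')

# Verify, then persist to redis.conf so a restart keeps the settings
print(r.config_get('maxmemory'), r.config_get('maxmemory-policy'))
r.config_rewrite()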
Common Causes of Memory Surge
Massive data writes (business growth, imports, cache warm‑up).
Large‑key problem (single key consumes too much memory).
Expiration policy failure (keys missing TTL).
Persistence configuration issues (fork causing copy‑on‑write memory duplication).
Client buffer overflow (slow clients, pub/sub backlog).
High memory fragmentation (frequent add/delete operations).
Misconfigured parameters (the focus of this case).
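One of these causes, keys missing a TTL, is easy to spot-check before deeper digging; a rough redis-py sketch that samples the keyspace (sample size is arbitrary):
import redis

r = redis.Redis(host='127.0.0.1', port=6379)

sampled = no_ttl = 0
for key in r.scan_iter(count=1000):        # SCAN-based, non-blocking iteration
    sampled += 1
    if r.ttl(key) == -1:                   # -1 means the key never expires
        no_ttl += 1
    if sampled >= 10000:                   # cap the sample on a busy instance
        break

print(f"{no_ttl}/{sampled} sampled keys have no TTL ({no_ttl / max(sampled, 1):.0%})")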
Full Troubleshooting Process
Step 1: Quickly Verify Memory Usage
Check Redis Memory Stats
# Connect to Redis
redis-cli -h 127.0.0.1 -p 6379
# Show memory info
INFO MEMORY
# Sample output (problem instance)
used_memory:21474836480 # 20 GB
used_memory_human:20.00G
used_memory_rss:25769803776 # 24 GB
maxmemory:0 # ❌ No maxmemory limit set
maxmemory_policy:noeviction
mem_fragmentation_ratio:1.20
Key Findings:
maxmemory:0 – no memory limit is set, so Redis can consume unlimited memory.
Actual usage reached 20 GB, far beyond the normal 2 GB.
Check System Memory
free -h
# total used free shared buff/cache available
# Mem: 31Gi 28Gi 500Mi 100Mi 2.5Gi 2.0Gi
# Swap: 4.0Gi 2.0Gi 2.0Gi
System memory was nearly exhausted and swap was already in use.
Check Number of Keys
# Total keys
INFO keyspace
# db0:keys=1234567,expires=23456,avg_ttl=3600000
# Compared with the normal baseline (≈1 M keys), the key count grew ~23% while memory grew ~900%.
Step 2: Analyze Large Keys
First suspect large keys.
Scan for big keys
# Scan entire DB for big keys
redis-cli --bigkeys
# Sample output
-------- summary -------
Sampled 1234567 keys in the keyspace!
Total key length in bytes is 98765432
Biggest string found 'user:detail:12345' has 10485760 bytes
Biggest list found 'queue:messages' has 123456 items
Biggest hash found 'session:data:abc' has 234567 fields
Biggest zset found 'rank:global' has 100000 members
Note: --bigkeys samples the keyspace and may miss some keys; running it in production can affect performance.
Custom script for precise scanning
# Scan with SCAN and check serialized length
redis-cli --scan --pattern '*' | while read key; do
  size=$(redis-cli DEBUG OBJECT "$key" | grep -oP '(?<=serializedlength:)\d+')
  if [ "$size" -gt 10485760 ]; then
    echo "$key: $size bytes"
  fi
done
Result: no single key was larger than 10 MB – not enough to explain a ten-fold increase.
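DEBUG OBJECT is often disabled in production, and its serializedlength is the serialized size rather than the in-memory footprint. On Redis 4.0+ the MEMORY USAGE command is a gentler alternative; a sketch with redis-py using the same 10 MB threshold:
import redis

r = redis.Redis(host='127.0.0.1', port=6379)
THRESHOLD = 10 * 1024 * 1024  # 10 MB

for key in r.scan_iter(count=500):
    # samples=0 asks Redis to account for every nested element, not a sample
    size = r.memory_usage(key, samples=0)
    if size and size > THRESHOLD:
        print(f"{key.decode()}: {size} bytes")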
Step 3: Examine Key‑Type Distribution
Analyze RDB file with rdb‑tools
# Generate RDB snapshot
redis-cli BGSAVE
# Wait for snapshot to finish
redis-cli INFO persistence | grep rdb_bgsave_in_progress
# Analyze snapshot
rdb --command memory /var/lib/redis/dump.rdb > memory_report.csv
# Sample of memory_report.csv
database,type,key,size_in_bytes,encoding,num_elements,len_largest_element
0,list,queue:messages,52428800,linkedlist,1000000,52
0,hash,session:*,10485760,hashtable,50000,200
0,string,cache:*,5242880,raw,N/A,5242880
Key Findings:
Large number of list keys, some with over 1 M elements.
Lists consume about 15 GB, roughly 75 % of total memory.
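The per-type totals quoted above can be computed straight from memory_report.csv; a short Python sketch using the column names shown in the sample:
import csv
from collections import defaultdict

totals = defaultdict(int)
with open('memory_report.csv', newline='') as f:
    for row in csv.DictReader(f):
        totals[row['type']] += int(row['size_in_bytes'])

grand_total = sum(totals.values())
for t, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{t:8s} {size / 2**30:6.2f} GiB ({size / grand_total:.0%})")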
Step 4: Locate Problematic Keys
Find problematic list keys
# Find all list keys and their lengths
redis-cli --scan --pattern '*' | while read key; do
  type=$(redis-cli TYPE "$key" | tr -d '\r')
  if [ "$type" = "list" ]; then
    len=$(redis-cli LLEN "$key")
    if [ "$len" -gt 100000 ]; then
      echo "$key: $len items"
    fi
  fi
done
# Sample output
# queue:email:pending: 2345678 items
# queue:sms:pending: 1876543 items
# queue:notification:pending: 987654 items
The three queue keys together hold over 5 million messages.
Analyze queue backlog
# Sample queue content
redis-cli LRANGE queue:email:pending 0 10
# Findings: these are business message queues
The consumer is a Python script launched by cron every minute; each run was too slow to keep up, so overlapping instances piled up and the queues kept backing up.
Step 5: Identify Fatal Configuration
Inspect Redis configuration
# Get all config
redis-cli CONFIG GET '*'
# Focus on list‑related config
redis-cli CONFIG GET '*list*'
# Output (relevant part)
1) "list-max-ziplist-entries"
2) "512" # ✅ default
3) "list-max-ziplist-value"
4) "64" # ✅ default
5) "list-compress-depth"
6) "0" # ❌ critical issue!Explanation of parameters: list-max-ziplist-entries: use ziplist encoding when list length < value. list-max-ziplist-value: use ziplist when element size < value. list-compress-depth (Redis 3.2+): depth of compression for quicklist nodes. 0 disables compression; 1 compresses all nodes except the head and tail.
With list-compress-depth=0 all list nodes remain uncompressed, causing huge memory consumption.
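The effect is easy to reproduce in isolation: build the same list under both settings and compare MEMORY USAGE. A rough redis-py sketch (key names, element count and payload are illustrative; the depth must be set before the list is created, since existing lists are not recompressed):
import redis

r = redis.Redis(host='127.0.0.1', port=6379)
payload = 'x' * 50  # compressible 50-byte message, similar to the queue entries

def build(key, depth):
    r.config_set('list-compress-depth', depth)  # applies to lists created afterwards
    r.delete(key)
    pipe = r.pipeline(transaction=False)
    for _ in range(100_000):
        pipe.rpush(key, payload)
    pipe.execute()
    return r.memory_usage(key, samples=0)

print('depth 0 (no compression):', build('demo:list:raw', 0), 'bytes')
print('depth 1 (LZF-compressed):', build('demo:list:lzf', 1), 'bytes')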
Memory calculation
Assuming 5,000,000 messages at an average of 50 bytes each:
Uncompressed: 5,000,000 × 50 B ≈ 250 MB of payload; with roughly 4× quicklist/node overhead that is ≈ 1 GB.
Compressed: roughly 30–40% of that, i.e. ~300–400 MB.
Actual usage was 15 GB, indicating both lack of compression and massive queue size.
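A quick sanity check of that estimate in code (all numbers are rough):
messages = 5_000_000
avg_size = 50                      # bytes per message
overhead = 4                       # rough multiplier for quicklist/node overhead

payload = messages * avg_size      # ≈ 250 MB of raw payload
uncompressed = payload * overhead  # ≈ 1 GB with structure overhead
compressed = uncompressed * 0.35   # ~30-40% of that once nodes are LZF-compressed

print(f"payload ≈ {payload / 1e6:.0f} MB, uncompressed ≈ {uncompressed / 1e9:.1f} GB, "
      f"compressed ≈ {compressed / 1e6:.0f} MB")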
Check quicklist internals
# Check encoding of a list key
redis-cli OBJECT ENCODING queue:email:pending
# "quicklist"
# Detailed quicklist info
redis-cli DEBUG OBJECT queue:email:pending
# ... ql_nodes:2300 ql_compressed:0
The quicklist has 2,300 nodes and none of them are compressed.
Historical configuration change
# List config file modification history
ls -lt /etc/redis/redis.conf*
# ...
# diff between current and backup
diff /etc/redis/redis.conf /etc/redis/redis.conf.bak
# Shows:
# - list-compress-depth 0
# + list-compress-depth 1
On Sep 27 the parameter had been changed from 1 to 0 and Redis restarted; the weekend traffic then triggered the memory explosion.
Solution and Optimization
Emergency Steps
1. Enable list compression immediately
# Online config change (no restart needed)
redis-cli CONFIG SET list-compress-depth 1
redis-cli CONFIG REWRITE # persist the change to the config file
Note: existing lists are not automatically recompressed.
2. Trigger recompression of existing lists
Option A – rewrite each list in place with RPOPLPUSH (preserves order and needs no extra client memory, but is slow – one round trip per element – and should only run while producers and consumers are paused):
#!/usr/bin/env python3
import redis

r = redis.Redis(host='127.0.0.1', port=6379, db=0)
for key in r.scan_iter(match='*', count=100):
    if r.type(key) == b'list':
        length = r.llen(key)
        if length > 1000:
            temp = f"_temp_{key.decode()}"
            r.rename(key, temp)
            # RPOPLPUSH moves the tail of temp to the head of key, so a full
            # drain rebuilds the list – now compressed – in its original order
            for _ in range(length):
                r.rpoplpush(temp, key)
            r.delete(temp)
Option B – read the whole list into the client and rewrite it (simpler, but the entire list is held in client memory and the delete/rewrite is not atomic):
for key in r.scan_iter(match='queue:*'):
    if r.type(key) == b'list':
        length = r.llen(key)
        if length > 1000:
            items = r.lrange(key, 0, -1)
            r.delete(key)
            r.rpush(key, *items)
3. Resolve queue backlog
# Stop all consumer processes
pkill -f email_worker.py
# Investigate slow consumer (e.g., email server latency)
# Optimize by increasing concurrency, batching, adding retries
# Temporary measure: launch more consumer instances
for i in {1..10}; do nohup /opt/scripts/email_worker.py & done
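Batching is usually the cheapest way to raise consumer throughput; a rough sketch of a batched drain loop (queue name taken from this case, send_email is a placeholder, and LPOP with a count argument needs Redis 6.2+; note that popping before processing is not crash-safe, which is one more argument for the Streams approach below):
#!/usr/bin/env python3
import redis

r = redis.Redis(host='127.0.0.1', port=6379)
BATCH = 500

def send_email(raw_message):
    pass  # placeholder: parse the message and talk to the mail server

def drain_once(queue='queue:email:pending'):
    messages = r.lpop(queue, BATCH)    # pop up to BATCH messages in one round trip
    if not messages:
        return 0
    for raw in messages:
        send_email(raw)
    return len(messages)

if __name__ == '__main__':
    while drain_once():
        pass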
Long‑Term Optimization
1. Redis configuration tuning
# /etc/redis/redis.conf
maxmemory 16gb # cap Redis below physical RAM, leaving headroom for the OS and fork copy-on-write
maxmemory-policy allkeys-lru # eviction when limit reached
list-compress-depth 1 # enable compression
list-max-ziplist-entries 512
list-max-ziplist-value 64
# Other structures
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
# Persistence (avoid fork memory blow‑up)
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
no-appendfsync-on-rewrite yes
# Client output buffer limits
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
2. Monitoring and alerting
# Prometheus alert rules
- alert: RedisMemoryHigh
  expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis memory usage exceeds 80%"
- alert: RedisMemoryGrowthFast
  expr: rate(redis_memory_used_bytes[10m]) > 10485760
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis memory growing too fast"
- alert: RedisListTooLong
  expr: redis_list_length{key=~"queue:.*"} > 100000
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Queue backlog detected"
3. Queue architecture improvement
Consider moving heavy asynchronous workloads to dedicated message‑queue systems such as RabbitMQ, Kafka, or Redis Streams.
# Redis Stream example (Python)
import redis
r = redis.Redis(host='localhost', port=6379, db=0)
# Producer
r.xadd('stream:email', {'to': 'user@example.com', 'subject': 'Hello', 'body': 'World'})
# Consumer group
r.xgroup_create('stream:email', 'email_workers', mkstream=True)
while True:
    msgs = r.xreadgroup('email_workers', 'worker1', {'stream:email': '>'}, count=10, block=1000)
    for stream, entries in msgs:
        for msg_id, data in entries:
            # process email
            r.xack('stream:email', 'email_workers', msg_id)
Key Takeaways
Configuration matters: a single list-compress-depth setting that disabled compression caused massive memory waste.
Effective monitoring catches abnormal memory growth early.
Root‑cause analysis should combine config review, data‑structure inspection, and business‑logic profiling.
Optimization must address both immediate fixes and long‑term architectural improvements.
By correcting the misconfiguration, recompressing existing lists, and improving consumer throughput, the Redis instance returned to stable memory usage and the risk of OOM was eliminated.