Why Did Redis Memory Spike 10×? Uncover the Hidden Config Mistake
A sudden Redis memory surge from 2 GB to 20 GB was traced to a misconfigured list-compress-depth parameter, revealing how uncompressed lists and queue backlogs can cause ten‑fold memory growth, and outlining step‑by‑step diagnostics, compression fixes, and long‑term optimization strategies.
Introduction
On a Monday morning a monitoring alert reported Redis memory usage at 95% and nearing OOM. The memory, normally around 2 GB, had exploded to 20 GB over the weekend—a ten‑fold increase that threatened server stability. After intensive investigation the root cause turned out to be a single overlooked configuration parameter.
Redis Memory Management Mechanism
Components of Redis Memory Usage
Dataset: actual key‑value data stored.
Replication buffer: data buffered during master‑slave replication.
Client buffer: I/O buffers for client connections.
AOF buffer: write buffer for AOF persistence.
Memory fragmentation: fragmentation caused by allocation algorithms.
Redis internal overhead: metadata, expiration dictionaries, etc.
Example of memory statistics:
redis-cli info memory
Key metrics:
# Memory
used_memory:2147483648 # 2 GB allocated
used_memory_human:2.00G
used_memory_rss:2684354560 # 2.5 GB RSS
used_memory_peak:3221225472
used_memory_peak_human:3.00G
used_memory_overhead:134217728 # internal overhead
used_memory_dataset:2013265920 # dataset size
mem_fragmentation_ratio:1.25
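The same figures can be read programmatically; a minimal sketch with redis-py (connection details are illustrative) that splits used memory into dataset and overhead:
# Minimal sketch (assumes redis-py); reads INFO MEMORY and prints the split
import redis

r = redis.Redis(host='127.0.0.1', port=6379)
mem = r.info('memory')  # same fields as the INFO MEMORY output above

total = mem['used_memory']
dataset = mem['used_memory_dataset']       # actual key-value data
overhead = mem['used_memory_overhead']     # buffers, dictionaries, metadata
frag = mem['mem_fragmentation_ratio']      # RSS / used_memory

print(f"total:    {total / 2**30:.2f} GiB")
print(f"dataset:  {dataset / 2**30:.2f} GiB ({dataset / total:.0%})")
print(f"overhead: {overhead / 2**30:.2f} GiB")
print(f"fragmentation ratio: {frag:.2f}")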
Redis Eviction Policies
noeviction (default): no eviction; writes return an error.
allkeys-lru : evict least recently used keys.
allkeys-lfu : evict least frequently used keys (Redis 4.0+).
allkeys-random : random eviction.
volatile-lru : evict LRU among keys with TTL.
volatile-lfu : evict LFU among keys with TTL.
volatile-random : random eviction among keys with TTL.
volatile-ttl : evict keys with shortest TTL.
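Both the memory cap and the eviction policy can be changed at runtime without a restart; a small redis-py sketch (the 16 GB value is only illustrative):
import redis

r = redis.Redis(host='127.0.0.1', port=6379)

# Cap memory and switch from the default noeviction to LRU eviction
r.config_set('maxmemory', '16gb')
r.config_set('maxmemory-policy', 'allkeys-lru')

# Verify, then persist to redis.conf so a restart keeps the settings
print(r.config_get('maxmemory'), r.config_get('maxmemory-policy'))
r.config_rewrite()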
Common Causes of Memory Surge
Massive data writes (business growth, imports, cache warm‑up).
Large‑key problem (single key consumes too much memory).
Expiration policy failure (keys missing TTL).
Persistence configuration issues (fork causing copy‑on‑write memory duplication).
Client buffer overflow (slow clients, pub/sub backlog).
High memory fragmentation (frequent add/delete operations).
Misconfigured parameters (the focus of this case).
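One of these causes, keys missing a TTL, is easy to spot-check before deeper digging; a rough redis-py sketch that samples the keyspace (sample size is arbitrary):
import redis

r = redis.Redis(host='127.0.0.1', port=6379)

sampled = no_ttl = 0
for key in r.scan_iter(count=1000):        # SCAN-based, non-blocking iteration
    sampled += 1
    if r.ttl(key) == -1:                   # -1 means the key never expires
        no_ttl += 1
    if sampled >= 10000:                   # cap the sample on a busy instance
        break

print(f"{no_ttl}/{sampled} sampled keys have no TTL ({no_ttl / max(sampled, 1):.0%})")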
Full Troubleshooting Process
Step 1: Quickly Verify Memory Usage
Check Redis Memory Stats
# Connect to Redis
redis-cli -h 127.0.0.1 -p 6379
# Show memory info
INFO MEMORY
# Sample output (problem instance)
used_memory:21474836480 # 20 GB
used_memory_human:20.00G
used_memory_rss:25769803776 # 24 GB
maxmemory:0 # ❌ No maxmemory limit set
maxmemory_policy:noeviction
mem_fragmentation_ratio:1.20
Key Findings:
maxmemory:0 – no memory limit is set, so Redis can consume unlimited memory.
Actual usage reached 20 GB, far beyond the normal 2 GB.
Check System Memory
free -h
# total used free shared buff/cache available
# Mem: 31Gi 28Gi 500Mi 100Mi 2.5Gi 2.0Gi
# Swap: 4.0Gi 2.0Gi 2.0Gi
System memory was nearly exhausted and swap was already in use.
Check Number of Keys
# Total keys
INFO keyspace
# db0:keys=1234567,expires=23456,avg_ttl=3600000
# Compared with the normal baseline (≈1 M keys), the key count grew ~23% while memory grew ~900%.
Step 2: Analyze Large Keys
First suspect large keys.
Scan for big keys
# Scan entire DB for big keys
redis-cli --bigkeys
# Sample output
-------- summary -------
Sampled 1234567 keys in the keyspace!
Total key length in bytes is 98765432
Biggest string found 'user:detail:12345' has 10485760 bytes
Biggest list found 'queue:messages' has 123456 items
Biggest hash found 'session:data:abc' has 234567 fields
Biggest zset found 'rank:global' has 100000 members
Note: --bigkeys samples the keyspace and may miss some keys; running it in production can affect performance.
Custom script for precise scanning
# Scan with SCAN and check serialized length
redis-cli --scan --pattern '*' | while read key; do
  size=$(redis-cli DEBUG OBJECT "$key" | grep -oP '(?<=serializedlength:)\d+')
  if [ "$size" -gt 10485760 ]; then
    echo "$key: $size bytes"
  fi
done
Result: no single key was larger than 10 MB – not enough to explain a ten-fold increase.
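DEBUG OBJECT is often disabled in production, and its serializedlength is the serialized size rather than the in-memory footprint. On Redis 4.0+ the MEMORY USAGE command is a gentler alternative; a sketch with redis-py using the same 10 MB threshold:
import redis

r = redis.Redis(host='127.0.0.1', port=6379)
THRESHOLD = 10 * 1024 * 1024  # 10 MB

for key in r.scan_iter(count=500):
    # samples=0 asks Redis to account for every nested element, not a sample
    size = r.memory_usage(key, samples=0)
    if size and size > THRESHOLD:
        print(f"{key.decode()}: {size} bytes")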
Step 3: Examine Key‑Type Distribution
Analyze RDB file with rdb‑tools
# Generate RDB snapshot
redis-cli BGSAVE
# Wait for snapshot to finish
redis-cli INFO persistence | grep rdb_bgsave_in_progress
# Analyze snapshot
rdb --command memory /var/lib/redis/dump.rdb > memory_report.csv
# Sample of memory_report.csv
database,type,key,size_in_bytes,encoding,num_elements,len_largest_element
0,list,queue:messages,52428800,linkedlist,1000000,52
0,hash,session:*,10485760,hashtable,50000,200
0,string,cache:*,5242880,raw,N/A,5242880
Key Findings:
Large number of list keys, some with over 1 M elements.
Lists consume about 15 GB, roughly 75 % of total memory.
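The per-type totals quoted above can be computed straight from memory_report.csv; a short Python sketch using the column names shown in the sample:
import csv
from collections import defaultdict

totals = defaultdict(int)
with open('memory_report.csv', newline='') as f:
    for row in csv.DictReader(f):
        totals[row['type']] += int(row['size_in_bytes'])

grand_total = sum(totals.values())
for t, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{t:8s} {size / 2**30:6.2f} GiB ({size / grand_total:.0%})")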
Step 4: Locate Problematic Keys
Find problematic list keys
# Find all list keys and their lengths
redis-cli --scan --pattern '*' | while read key; do
  type=$(redis-cli TYPE "$key" | tr -d '\r')
  if [ "$type" = "list" ]; then
    len=$(redis-cli LLEN "$key")
    if [ "$len" -gt 100000 ]; then
      echo "$key: $len items"
    fi
  fi
done
# Sample output
# queue:email:pending: 2345678 items
# queue:sms:pending: 1876543 items
# queue:notification:pending: 987654 items
The three queue keys together hold over 5 million messages.
Analyze queue backlog
# Sample queue content
redis-cli LRANGE queue:email:pending 0 10
# Findings: these are business message queues
The consumer is a Python script launched by cron every minute; each run was too slow to keep up, so overlapping instances piled up and the queues kept backing up.
Step 5: Identify Fatal Configuration
Inspect Redis configuration
# Get all config
redis-cli CONFIG GET '*'
# Focus on list‑related config
redis-cli CONFIG GET '*list*'
# Output (relevant part)
1) "list-max-ziplist-entries"
2) "512" # ✅ default
3) "list-max-ziplist-value"
4) "64" # ✅ default
5) "list-compress-depth"
6) "0" # ❌ critical issue!Explanation of parameters: list-max-ziplist-entries: use ziplist encoding when list length < value. list-max-ziplist-value: use ziplist when element size < value. list-compress-depth (Redis 3.2+): depth of compression for quicklist nodes. 0 disables compression; 1 compresses all nodes except the head and tail.
With list-compress-depth=0 all list nodes remain uncompressed, causing huge memory consumption.
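The effect is easy to reproduce in isolation: build the same list under both settings and compare MEMORY USAGE. A rough redis-py sketch (key names, element count and payload are illustrative; the depth must be set before the list is created, since existing lists are not recompressed):
import redis

r = redis.Redis(host='127.0.0.1', port=6379)
payload = 'x' * 50  # compressible 50-byte message, similar to the queue entries

def build(key, depth):
    r.config_set('list-compress-depth', depth)  # applies to lists created afterwards
    r.delete(key)
    pipe = r.pipeline(transaction=False)
    for _ in range(100_000):
        pipe.rpush(key, payload)
    pipe.execute()
    return r.memory_usage(key, samples=0)

print('depth 0 (no compression):', build('demo:list:raw', 0), 'bytes')
print('depth 1 (LZF-compressed):', build('demo:list:lzf', 1), 'bytes')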
Memory calculation
Assuming 5,000,000 messages at an average of 50 bytes each:
Uncompressed: 5,000,000 × 50 B ≈ 250 MB of payload; with roughly 4× quicklist/node overhead that is ≈ 1 GB.
Compressed: roughly 30–40% of that, i.e. ~300–400 MB.
Actual usage was 15 GB, indicating both lack of compression and massive queue size.
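A quick sanity check of that estimate in code (all numbers are rough):
messages = 5_000_000
avg_size = 50                      # bytes per message
overhead = 4                       # rough multiplier for quicklist/node overhead

payload = messages * avg_size      # ≈ 250 MB of raw payload
uncompressed = payload * overhead  # ≈ 1 GB with structure overhead
compressed = uncompressed * 0.35   # ~30-40% of that once nodes are LZF-compressed

print(f"payload ≈ {payload / 1e6:.0f} MB, uncompressed ≈ {uncompressed / 1e9:.1f} GB, "
      f"compressed ≈ {compressed / 1e6:.0f} MB")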
Check quicklist internals
# Check encoding of a list key
redis-cli OBJECT ENCODING queue:email:pending
# "quicklist"
# Detailed quicklist info
redis-cli DEBUG OBJECT queue:email:pending
# ... ql_nodes:2300 ql_compressed:0
The quicklist has 2,300 nodes and none of them are compressed.
Historical configuration change
# List config file modification history
ls -lt /etc/redis/redis.conf*
# ...
# diff between current and backup
diff /etc/redis/redis.conf /etc/redis/redis.conf.bak
# Shows:
# - list-compress-depth 0
# + list-compress-depth 1
On Sep 27 the parameter had been changed from 1 to 0 and Redis restarted; the weekend traffic then triggered the memory explosion.
Solution and Optimization
Emergency Steps
1. Enable list compression immediately
# Online config change (no restart needed)
redis-cli CONFIG SET list-compress-depth 1
redis-cli CONFIG REWRITE # persist the change to the config file
Note: existing lists are not automatically recompressed.
2. Trigger recompression of existing lists
Option A – rewrite each list in place with RPOPLPUSH (preserves order and needs no extra client memory, but is slow – one round trip per element – and should only run while producers and consumers are paused):
#!/usr/bin/env python3
import redis

r = redis.Redis(host='127.0.0.1', port=6379, db=0)
for key in r.scan_iter(match='*', count=100):
    if r.type(key) == b'list':
        length = r.llen(key)
        if length > 1000:
            temp = f"_temp_{key.decode()}"
            r.rename(key, temp)
            # RPOPLPUSH moves the tail of temp to the head of key, so a full
            # drain rebuilds the list – now compressed – in its original order
            for _ in range(length):
                r.rpoplpush(temp, key)
            r.delete(temp)
Option B – read the whole list into the client and rewrite it (simpler, but the entire list is held in client memory and the delete/rewrite is not atomic):
for key in r.scan_iter(match='queue:*'):
    if r.type(key) == b'list':
        length = r.llen(key)
        if length > 1000:
            items = r.lrange(key, 0, -1)
            r.delete(key)
            r.rpush(key, *items)
3. Resolve queue backlog
# Stop all consumer processes
pkill -f email_worker.py
# Investigate slow consumer (e.g., email server latency)
# Optimize by increasing concurrency, batching, adding retries
# Temporary measure: launch more consumer instances
for i in {1..10}; do nohup /opt/scripts/email_worker.py & done
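Batching is usually the cheapest way to raise consumer throughput; a rough sketch of a batched drain loop (queue name taken from this case, send_email is a placeholder, and LPOP with a count argument needs Redis 6.2+; note that popping before processing is not crash-safe, which is one more argument for the Streams approach below):
#!/usr/bin/env python3
import redis

r = redis.Redis(host='127.0.0.1', port=6379)
BATCH = 500

def send_email(raw_message):
    pass  # placeholder: parse the message and talk to the mail server

def drain_once(queue='queue:email:pending'):
    messages = r.lpop(queue, BATCH)    # pop up to BATCH messages in one round trip
    if not messages:
        return 0
    for raw in messages:
        send_email(raw)
    return len(messages)

if __name__ == '__main__':
    while drain_once():
        pass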
Long‑Term Optimization
1. Redis configuration tuning
# /etc/redis/redis.conf
maxmemory 16gb # cap Redis below physical RAM, leaving headroom for the OS and fork copy-on-write
maxmemory-policy allkeys-lru # eviction when limit reached
list-compress-depth 1 # enable compression
list-max-ziplist-entries 512
list-max-ziplist-value 64
# Other structures
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
# Persistence (avoid fork memory blow‑up)
save 900 1
save 300 10
save 60 10000
appendonly yes
appendfsync everysec
no-appendfsync-on-rewrite yes
# Client output buffer limits
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
2. Monitoring and alerting
# Prometheus alert rules
- alert: RedisMemoryHigh
  expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.8
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis memory usage exceeds 80%"
- alert: RedisMemoryGrowthFast
  expr: rate(redis_memory_used_bytes[10m]) > 10485760
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "Redis memory growing too fast"
- alert: RedisListTooLong
  expr: redis_list_length{key=~"queue:.*"} > 100000
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Queue backlog detected"
3. Queue architecture improvement
Consider moving heavy asynchronous workloads to dedicated message‑queue systems such as RabbitMQ, Kafka, or Redis Streams.
# Redis Stream example (Python)
import redis
r = redis.Redis(host='localhost', port=6379, db=0)
# Producer
r.xadd('stream:email', {'to': 'user@example.com', 'subject': 'Hello', 'body': 'World'})
# Consumer group
r.xgroup_create('stream:email', 'email_workers', mkstream=True)
while True:
    msgs = r.xreadgroup('email_workers', 'worker1', {'stream:email': '>'}, count=10, block=1000)
    for stream, entries in msgs:
        for msg_id, data in entries:
            # process email
            r.xack('stream:email', 'email_workers', msg_id)
Key Takeaways
Configuration matters: a single list-compress-depth setting that disabled compression caused massive memory waste.
Effective monitoring catches abnormal memory growth early.
Root‑cause analysis should combine config review, data‑structure inspection, and business‑logic profiling.
Optimization must address both immediate fixes and long‑term architectural improvements.
By correcting the misconfiguration, recompressing existing lists, and improving consumer throughput, the Redis instance returned to stable memory usage and the risk of OOM was eliminated.