
Mastering Redis Memory: From Basics to Advanced Troubleshooting

This comprehensive guide walks you through the real costs of Redis memory problems, explains the three‑layer memory architecture and five major consumers, provides a toolbox of INFO and MEMORY commands plus monitoring scripts, and offers step‑by‑step solutions for seven common issues, best‑practice optimizations, real‑world case studies, a daily checklist, and advanced techniques such as Lua scripts and smart cache warm‑up.

Raymond Ops

1. Real Cost of Redis Memory Issues

Last year an e‑commerce platform suffered a cache crash due to a Redis memory leak, losing $3 million, while a fintech company spent an extra ¥80 000 per month because of misconfigured Redis memory settings. These examples illustrate that Redis memory problems can have severe financial impact.

2. Deep Dive into Redis Memory Architecture

2.1 Three‑layer Memory Allocation

Redis memory is divided into three layers: the operating‑system memory available to the process, the Redis process memory (including data, buffers, and fragmentation), and the actual data‑structure memory where the stored values reside.
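These layers map directly onto INFO memory fields: used_memory_rss is what the OS granted the process, used_memory is what the allocator handed to Redis, and used_memory_dataset (Redis 4+) approximates the data layer. A minimal sketch decomposing those three numbers:

```python
def memory_layers(used_memory_rss, used_memory, used_memory_dataset):
    """Decompose memory into the three layers using INFO memory fields.
    used_memory_rss: OS-level process memory; used_memory: allocator bytes;
    used_memory_dataset: bytes holding the actual data (Redis 4+)."""
    return {
        'fragmentation_and_os_overhead': used_memory_rss - used_memory,
        'redis_overhead': used_memory - used_memory_dataset,  # buffers, dict metadata, etc.
        'dataset': used_memory_dataset,
    }
```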

2.2 Five Major Memory Consumers

Data memory : the memory used to store key‑value pairs.

Buffer memory : client output buffers, AOF buffers, and replication buffers.

Memory fragmentation : overhead created by the allocator.

Child‑process memory : memory used during RDB or AOF rewrite.

Shared object memory : pre‑allocated integer object pools.

Understanding these components is the foundation for effective troubleshooting.
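As a rough sketch, the consumers can be grouped from a parsed INFO memory dict. Field names vary by Redis version (mem_clients_normal and mem_clients_slaves appear in newer releases), and child-process memory is not reported in INFO at all, so treat this as an approximation:

```python
def memory_consumers(info):
    """Group INFO memory fields by consumer. Fields missing on older
    Redis versions default to 0; child-process memory must be observed
    at the OS level instead."""
    buffers = (info.get('mem_clients_normal', 0)
               + info.get('mem_clients_slaves', 0)
               + info.get('mem_replication_backlog', 0)
               + info.get('mem_aof_buffer', 0))
    fragmentation = max(info.get('used_memory_rss', 0) - info.get('used_memory', 0), 0)
    return {
        'data': info.get('used_memory_dataset', 0),
        'buffers': buffers,
        'fragmentation': fragmentation,
    }
```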

3. Diagnostic Toolbox

3.1 INFO command – First line of defense

# Get detailed memory information
redis-cli INFO memory

# Key metrics
used_memory:1073741824   # Memory allocated by Redis allocator
used_memory_human:1.00G   # Human‑readable format
used_memory_rss:1288490188   # Memory allocated by the OS
used_memory_peak:1073741824   # Peak memory usage
mem_fragmentation_ratio:1.20   # Fragmentation ratio

Expert tip: When mem_fragmentation_ratio exceeds 1.5, fragmentation is severe and requires attention.
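The tip above can be encoded as a tiny classifier. A ratio below 1.0 is its own warning sign: RSS lower than allocated bytes usually means the OS has swapped part of Redis's memory to disk.

```python
def classify_fragmentation(ratio):
    """Interpret mem_fragmentation_ratio using common rules of thumb."""
    if ratio < 1.0:
        return 'swapping'    # RSS < allocated: OS likely swapped Redis memory
    if ratio <= 1.5:
        return 'healthy'
    return 'fragmented'      # severe fragmentation; consider active defrag
```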

3.2 MEMORY command – Precise pinpointing

# Check memory usage of a specific key
redis-cli MEMORY USAGE mykey

# General memory statistics
redis-cli MEMORY STATS

# Memory doctor for diagnostics
redis-cli MEMORY DOCTOR

3.3 Monitoring scripts – Automated inspection

#!/usr/bin/env python3
import redis, time, json

class RedisMemoryMonitor:
    def __init__(self, host='localhost', port=6379):
        self.r = redis.Redis(host=host, port=port)
        self.threshold = {
            'memory_usage': 0.8,   # 80% usage alert
            'fragmentation': 1.5   # Fragmentation alert threshold
        }
    def check_memory(self):
        info = self.r.info('memory')
        # maxmemory=0 means "no limit"; fall back to system memory to avoid division by zero
        maxmemory = info.get('maxmemory', 0) or info.get('total_system_memory', 0)
        metrics = {
            'used_memory': info['used_memory'],
            'used_memory_rss': info['used_memory_rss'],
            'fragmentation_ratio': info['mem_fragmentation_ratio'],
            'usage_ratio': info['used_memory'] / maxmemory if maxmemory else 0.0
        }
        alerts = []
        if metrics['usage_ratio'] > self.threshold['memory_usage']:
            alerts.append(f"[ALERT] Memory usage high: {metrics['usage_ratio']:.1%}")
        if metrics['fragmentation_ratio'] > self.threshold['fragmentation']:
            alerts.append(f"[ALERT] Fragmentation ratio high: {metrics['fragmentation_ratio']:.2f}")
        return metrics, alerts
    def run(self, interval=60):
        while True:
            metrics, alerts = self.check_memory()
            for a in alerts:
                print(a)
            print(f"[INFO] {json.dumps(metrics)}")
            time.sleep(interval)

if __name__ == "__main__":
    monitor = RedisMemoryMonitor()
    monitor.run()

4. Seven Common Memory Problems and Solutions

4.1 Memory Leak – The hidden killer

Symptoms :

Continuous growth of memory usage.

No corresponding business growth.

Problem disappears after a restart.

Diagnostic steps :

# Identify large keys
redis-cli --bigkeys

# Scan key space
redis-cli --scan --pattern "*" | head -100

# Check expired keys
redis-cli DBSIZE
redis-cli INFO keyspace

Solution (example script that removes temporary keys created without a TTL — a TTL of -1 means the key exists but has no expiration set):

import redis

def clean_orphan_temp_keys(r, batch_size=100):
    """Delete 'temp:' keys that were never given an expiration."""
    cursor = 0
    cleaned = 0
    while True:
        # Scan only the temp: namespace instead of filtering the whole key space
        cursor, keys = r.scan(cursor, match='temp:*', count=batch_size)
        for key in keys:
            if r.ttl(key) == -1:  # key exists but has no TTL
                r.delete(key)
                cleaned += 1
        if cursor == 0:
            break
    return cleaned

r = redis.Redis()
print(f"Cleaned {clean_orphan_temp_keys(r)} keys")

4.2 Big‑Key Problem – Performance killer

Impact :

Single operation blocks other requests.

Network transmission pressure spikes.

Master‑slave replication delay.

Detection :

# Sample the key space for the largest key of each type (runs SCAN internally;
# --bigkeys has no size-threshold option — use the Lua scan below for that)
redis-cli --bigkeys

# Custom Lua scan for keys over 1 MB (caution: a long-running script blocks the server)
redis-cli eval "
local result = {}
local cursor = '0'
repeat
    local scan_result = redis.call('SCAN', cursor, 'COUNT', 100)
    cursor = scan_result[1]
    for _, key in ipairs(scan_result[2]) do
        local size = redis.call('MEMORY', 'USAGE', key)
        if size and size > 1048576 then
            table.insert(result, {key, size})
        end
    end
until cursor == '0'
return result" 0

Optimization – Split large hashes into smaller chunks:

def split_large_hash(r, key, chunk_size=1000):
    """Split a large hash into multiple small hashes"""
    data = r.hgetall(key)
    items = list(data.items())
    chunks = []
    for i in range(0, len(items), chunk_size):
        chunk_key = f"{key}:chunk:{i//chunk_size}"
        chunk_data = dict(items[i:i+chunk_size])
        r.hset(chunk_key, mapping=chunk_data)  # hmset is deprecated in redis-py
        chunks.append(chunk_key)
    r.sadd(f"{key}:chunks", *chunks)
    r.delete(key)
    return chunks

4.3 Memory Fragmentation – Hidden cost

Causes :

Frequent insert/delete operations.

Large swings in data size.

Allocator characteristics.

Diagnosis :

# View fragmentation ratio
redis-cli INFO memory | grep fragmentation

# Analyze allocator distribution
redis-cli MEMORY STATS | grep allocator

Remediation :

# Online defragmentation (Redis 4.0+)
redis-cli CONFIG SET activedefrag yes
redis-cli CONFIG SET active-defrag-ignore-bytes 100mb
redis-cli CONFIG SET active-defrag-threshold-lower 10

# Periodic restart (for master‑slave setups)
# Switch traffic to replica, restart master, then switch back

4.4 Buffer Overflow – Sudden crisis

Typical scenarios :

Client output buffer overflow.

Replication buffer overflow.

AOF rewrite buffer overflow.

Monitoring script :

def monitor_client_buffers(r):
    """Flag clients whose output buffer exceeds 10 MB."""
    dangerous = []
    for client in r.client_list():  # redis-py already returns a list of dicts
        omem = int(client.get('omem', 0))
        if omem > 10 * 1024 * 1024:  # 10 MB
            dangerous.append({'addr': client.get('addr'),
                              'omem': omem,
                              'cmd': client.get('cmd')})
    return dangerous

# Adjust limits
redis-cli CONFIG SET client-output-buffer-limit "normal 0 0 0"
redis-cli CONFIG SET client-output-buffer-limit "replica 256mb 64mb 60"
redis-cli CONFIG SET client-output-buffer-limit "pubsub 32mb 8mb 60"
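Each limit value is a triple: a hard limit (disconnect immediately when exceeded), a soft limit, and the number of seconds the soft limit may be exceeded before disconnecting. A small hypothetical parser makes the format explicit (assumes Redis's size units, where 1mb = 1024 * 1024 bytes):

```python
def parse_output_buffer_limit(setting):
    """Parse a client-output-buffer-limit value like 'replica 256mb 64mb 60'."""
    units = {'kb': 1024, 'mb': 1024 ** 2, 'gb': 1024 ** 3}
    def to_bytes(v):
        v = v.lower()
        for suffix, mult in units.items():
            if v.endswith(suffix):
                return int(v[:-len(suffix)]) * mult
        return int(v)  # bare number: bytes
    cls, hard, soft, secs = setting.split()
    return {'class': cls,
            'hard_limit': to_bytes(hard),   # disconnect immediately above this
            'soft_limit': to_bytes(soft),   # disconnect if exceeded for soft_seconds
            'soft_seconds': int(secs)}
```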

4.5 Expired‑Key Backlog – Time bomb

Symptoms :

Massive keys expire simultaneously.

CPU usage spikes.

Response time increases.

Optimization – Randomize TTL to spread expirations:

import random, time

def set_key_with_random_expire(r, key, value, base_ttl=3600):
    jitter = random.randint(-int(base_ttl*0.1), int(base_ttl*0.1))
    actual_ttl = base_ttl + jitter
    r.setex(key, actual_ttl, value)
    return actual_ttl

def batch_set_with_scattered_expire(r, data_dict, base_ttl=3600):
    pipe = r.pipeline()
    for k, v in data_dict.items():
        set_key_with_random_expire(pipe, k, v, base_ttl)  # setex calls are queued on the pipeline
    pipe.execute()

4.6 Fork Memory – Overlooked overhead

Problem scenarios :

RDB persistence.

AOF rewrite.

Full sync of master‑slave.

Recommendations :

# Disable RDB if not needed
redis-cli CONFIG SET save ""
# Enable AOF only
redis-cli CONFIG SET appendonly yes
# Reduce fsync overhead during rewrite
redis-cli CONFIG SET no-appendfsync-on-rewrite yes
# Limit rewrite frequency
redis-cli CONFIG SET auto-aof-rewrite-percentage 100
redis-cli CONFIG SET auto-aof-rewrite-min-size 64mb
# Use diskless replication to avoid extra fork
redis-cli CONFIG SET repl-diskless-sync yes
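The fork's cost comes from copy-on-write: only pages written while the child runs are duplicated, so the extra memory ranges from near zero (read-mostly workloads) up to the full dataset (write-heavy ones). A back-of-the-envelope estimator, where the dirty-page ratio is an assumed workload parameter rather than a Redis metric:

```python
def estimate_fork_overhead(used_memory_bytes, dirty_page_ratio=0.3):
    """Rough copy-on-write cost of an RDB save or AOF rewrite.
    dirty_page_ratio is an assumption about the workload, not a Redis metric."""
    return {
        'worst_case_extra': used_memory_bytes,                        # every page touched
        'expected_extra': int(used_memory_bytes * dirty_page_ratio),  # typical write rate
    }
```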

4.7 Hot Key – Local hotspot

Detection :

from collections import Counter
import redis

def find_hot_keys(r, sample_size=10000):
    """Sample live commands via MONITOR and return the top hot keys.
    Caution: MONITOR adds noticeable overhead; sample briefly in production."""
    hot = Counter()
    count = 0
    with r.monitor() as m:  # redis-py's Monitor is a context manager
        for cmd in m.listen():
            if count >= sample_size:
                break
            count += 1
            parts = cmd.get('command', '').split()
            if len(parts) > 1:
                hot[parts[1]] += 1
    return hot.most_common(10)

r = redis.Redis()
print(find_hot_keys(r))

5. Memory Optimization Best Practices

5.1 Data‑structure optimization

# Wrong: three separate strings
r.set('user:1:name', 'Alice')
r.set('user:1:age', '25')
r.set('user:1:email', '[email protected]')
# Correct: a single hash
r.hset('user:1', mapping={'name':'Alice','age':'25','email':'[email protected]'})

5.2 Compression strategy

import zlib, pickle

class CompressedRedis:
    def __init__(self, client):
        self.r = client
    def set_compressed(self, key, value):
        serialized = pickle.dumps(value)
        compressed = zlib.compress(serialized)
        return self.r.set(key, compressed)
    def get_compressed(self, key):
        data = self.r.get(key)
        if data:
            return pickle.loads(zlib.decompress(data))
        return None
# Compression can reduce large JSON payloads by 60‑80%

5.3 Memory eviction policy

# Set max memory to 2 GB
redis-cli CONFIG SET maxmemory 2gb
# Choose an appropriate eviction policy
# volatile-lru, allkeys-lru, volatile-lfu, allkeys-lfu, etc.
redis-cli CONFIG SET maxmemory-policy allkeys-lfu
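To see why allkeys-lfu suits skewed access patterns, here is a toy in-memory model of LFU eviction. It is only a sketch: Redis's real implementation uses an 8-bit probabilistic counter with decay, tunable via lfu-log-factor and lfu-decay-time.

```python
from collections import Counter

class TinyLFU:
    """Toy allkeys-lfu model: on overflow, evict the least-frequently-used key."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.freq = Counter()

    def get(self, key):
        if key in self.data:
            self.freq[key] += 1
            return self.data[key]
        return None

    def set(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the key with the lowest access count
            victim = min(self.data, key=lambda k: self.freq[k])
            del self.data[victim]
            del self.freq[victim]
        self.data[key] = value
        self.freq[key] += 1
```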

5.4 Monitoring & alerting system

class RedisAlertSystem:
    def __init__(self, client, webhook_url):
        self.r = client
        self.webhook = webhook_url
        self.rules = [
            {'metric':'memory_usage','threshold':0.8,'severity':'warning'},
            {'metric':'memory_usage','threshold':0.9,'severity':'critical'},
            {'metric':'fragmentation','threshold':1.5,'severity':'warning'},
            {'metric':'evicted_keys','threshold':100,'severity':'warning'}
        ]
    def check_and_alert(self):
        info = self.r.info('memory')
        stats = self.r.info('stats')
        alerts = []
        if info.get('maxmemory',0) > 0:
            usage = info['used_memory'] / info['maxmemory']
            for rule in self.rules:
                if rule['metric']=='memory_usage' and usage>rule['threshold']:
                    alerts.append({'severity':rule['severity'],'message':f"Memory usage {usage:.1%}"})
        frag = info.get('mem_fragmentation_ratio')
        if frag and frag>1.5:
            alerts.append({'severity':'warning','message':f"Fragmentation ratio {frag:.2f}"})
        evicted = stats.get('evicted_keys',0)
        if evicted>100:
            alerts.append({'severity':'warning','message':f"Evicted {evicted} keys"})
        for a in alerts:
            print(f"[{a['severity'].upper()}] {a['message']}")
        return alerts

6. Real‑World Case Studies

Case 1: E‑commerce cache avalanche

Background : During a major promotion, Redis memory jumped from 40 % to 95 % of the quota, causing massive timeouts.

Analysis :

Hot product data cached repeatedly.

Shopping‑cart keys lacked expiration.

Session data stored as plain strings.

Solution (layered cache, periodic cart cleanup, hash‑based session storage):

import time

class LayeredCache:
    def __init__(self, client):
        self.r = client
        self.hot_threshold = 100
    def get(self, key):
        self.r.zincrby('key:access:count',1,key)
        val = self.r.get(key)
        cnt = self.r.zscore('key:access:count',key)
        if cnt and cnt > self.hot_threshold:
            self.r.expire(key,7200)  # extend hot key TTL
        return val

def clean_abandoned_carts(r):
    cursor = 0
    cleaned = 0
    while True:
        cursor, keys = r.scan(cursor, match='cart:*', count=100)
        for k in keys:
            last = r.hget(k,'last_update')
            if last and time.time() - float(last) > 86400:
                r.delete(k)
                cleaned += 1
        if cursor == 0:
            break
    return cleaned

Result : Memory usage dropped to 55 %, average response time fell from 200 ms to 50 ms.

Case 2: Game leaderboard memory optimization

Background : A sorted‑set leaderboard grew to 8 GB.

Optimization – Keep only top N entries and prune excess:

class OptimizedLeaderboard:
    def __init__(self, client, max_size=10000):
        self.r = client
        self.max = max_size
    def add_score(self, user_id, score):
        self.r.zadd('leaderboard',{user_id:score})
        if self.r.zcard('leaderboard') > self.max:
            self.r.zpopmin('leaderboard', self.r.zcard('leaderboard')-self.max)
    def get_rank(self, user_id):
        rank = self.r.zrevrank('leaderboard',user_id)
        return rank+1 if rank is not None else None
    def get_top(self, n=100):
        return self.r.zrevrange('leaderboard',0,n-1,withscores=True)
# Memory reduced from 8 GB to ~200 MB

7. Checklist for Ongoing Operations

7.1 Daily health checks

Is memory usage below 70 %?

Is fragmentation ratio below 1.5?

Any keys larger than 10 MB?

Are client connections normal?

Any new slow‑query logs?

Are there keys without expiration?

Is replication lag acceptable?

Does AOF need rewriting?
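Several of these checks can be automated over a parsed INFO memory dict. A minimal sketch (big-key, replication, and slow-log checks need their own scans and are omitted):

```python
def daily_health_checks(memory_info, largest_key_bytes=0):
    """Evaluate the checklist items that INFO memory alone can answer."""
    checks = {}
    maxmemory = memory_info.get('maxmemory', 0)
    if maxmemory:  # 0 means no limit is configured
        checks['memory_below_70pct'] = memory_info['used_memory'] / maxmemory < 0.70
    checks['fragmentation_below_1_5'] = memory_info.get('mem_fragmentation_ratio', 1.0) < 1.5
    checks['no_keys_over_10mb'] = largest_key_bytes <= 10 * 1024 ** 2
    return checks
```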

7.2 Optimization items

Correct data‑structure selection.

Potential compression.

Appropriate eviction policy.

Need for sharding.

Use pipeline to reduce network overhead.

Adjust persistence strategy.

Consider Redis version upgrade.

7.3 Emergency response flow

# Quick diagnosis
redis-cli INFO memory
redis-cli CLIENT LIST
redis-cli SLOWLOG GET 10

# Emergency stop‑gap
redis-cli FLUSHDB   # extreme case only
redis-cli CLIENT KILL TYPE normal
redis-cli CONFIG SET maxmemory 4gb

# Problem location
redis-cli --bigkeys
redis-cli MEMORY DOCTOR
redis-cli MONITOR

# Recovery steps – execute based on identified issue

8. Advanced Optimization Techniques

8.1 Lua script for bulk expiration

-- Atomically set expiration for a pattern of keys
local expire_time = ARGV[1]
local key_pattern = ARGV[2]
local cursor = '0'
local count = 0
repeat
    local result = redis.call('SCAN', cursor, 'MATCH', key_pattern, 'COUNT', 100)
    cursor = result[1]
    local keys = result[2]
    for _, key in ipairs(keys) do
        redis.call('EXPIRE', key, expire_time)
        count = count + 1
    end
until cursor == '0'
return count

8.2 Memory pre‑allocation strategy

def preallocate_memory(r, estimated_keys=1000000):
    """Tune compact-encoding thresholds and estimate capacity up front.
    Redis does not truly pre-allocate data memory; keeping small hashes
    and lists in their compact encodings is what cuts per-entry overhead."""
    r.config_set('hash-max-ziplist-entries', 512)  # alias of hash-max-listpack-entries on Redis 7+
    r.config_set('hash-max-ziplist-value', 64)
    r.config_set('list-max-ziplist-size', -2)      # cap each list node at 8 KB
    r.config_set('list-compress-depth', 0)         # do not compress list nodes
    avg_key, avg_val = 50, 200                     # assumed average sizes in bytes
    est = estimated_keys * (avg_key + avg_val)
    print(f"Estimated data memory: {est / 1024 ** 3:.2f} GB (excludes overhead)")
    return est

8.3 Smart cache warm‑up

import asyncio
import redis.asyncio as aredis  # redis-py >= 4.2; replaces the deprecated aioredis

async def smart_cache_warmup(redis_url, data_source):
    """Warm up cache in batches to avoid sudden memory and traffic spikes"""
    r = aredis.from_url(redis_url)
    batch = 1000
    total = len(data_source)
    for i in range(0, total, batch):
        tasks = [r.setex(item['key'], 3600, item['value'])
                 for item in data_source[i:i + batch]]
        await asyncio.gather(*tasks)
        await asyncio.sleep(0.1)  # pause between batches to smooth the load
        print(f"Warm-up progress: {min(i + batch, total)}/{total}")
    await r.aclose()  # use close() on redis-py < 5

9. Conclusion and Outlook

This article established a systematic methodology for diagnosing Redis memory issues, covered seven common problems with concrete code solutions, presented best‑practice optimizations, and demonstrated real‑world case studies that reduced memory usage dramatically.

Future trends point toward smarter automatic memory management, finer‑grained metrics, seamless horizontal scaling, and AI‑assisted diagnostics.

Action items :

Run the health‑check scripts immediately.

Deploy automated monitoring and alerting.

Build an optimization plan based on the checklist findings.

Stay updated with Redis releases and continuously refine your practices.

Tags: backend, memory management, Redis, performance tuning, databases
Written by Raymond Ops

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
