
How to Diagnose and Optimize Redis Memory Usage: Real‑World Cases

This article dissects Redis's memory architecture, explains each memory component, walks through object encoding rules, buffer handling, fragmentation, and child‑process memory, then presents two production incidents with step‑by‑step analysis and concrete optimization actions to prevent future memory alarms.

Architect

Mar 15, 2024

Background

Redis stores all data in memory, so the memory subsystem determines performance and stability. Operators often need to reduce memory usage, improve throughput, or diagnose memory alerts.

Redis Memory Management

The memory model consists of several components:

used_memory : total bytes Redis has requested from its allocator (jemalloc by default), covering object data, internal dictionaries, buffers, and Lua scripts.

used_memory_rss : resident set size reported by the OS (e.g., via top or ps).

Memory fragmentation : the gap between used_memory_rss and used_memory, that is, memory the process still holds from the OS but that does not back live allocations.

Runtime memory : memory used by the Redis process itself (typically <10 MiB).

Child‑process memory : memory used by forked processes during AOF rewrite or RDB generation.

Object Memory and Encodings

All key‑value data are stored as RedisObject structures. The type field selects an encoding that balances space and speed.

String : int (values that fit a 64‑bit signed integer), embstr (object header and string in one contiguous allocation, for strings ≤44 bytes), raw (separately allocated dynamic strings >44 bytes, up to 512 MiB).

List : before Redis 3.2, ziplist if entry count ≤ list-max-ziplist-entries and each entry ≤ list-max-ziplist-value (64 bytes by default), otherwise linkedlist; from Redis 3.2, quicklist (a linked list of ziplist nodes) is the default.

Set : intset when all members are integers and count < set-max-intset-entries; otherwise hashtable.

Hash : ziplist if entry count ≤ hash-max-ziplist-entries and every field and value ≤ hash-max-ziplist-value (64 bytes by default); otherwise hashtable.

Zset : ziplist if entry count ≤ zset-max-ziplist-entries and each member ≤ zset-max-ziplist-value (64 bytes by default); otherwise skiplist.
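The encoding rules above can be approximated in a few lines of Python. This is a simplified sketch, not Redis's actual implementation: the function names are hypothetical, the thresholds (44 bytes for embstr, 512 entries and 64 bytes for hashes) are assumed defaults, and real instances should be checked with OBJECT ENCODING.

```python
EMBSTR_MAX_BYTES = 44          # embstr limit quoted in the article
HASH_MAX_ZIPLIST_ENTRIES = 512 # assumed default of hash-max-ziplist-entries
HASH_MAX_ZIPLIST_VALUE = 64    # assumed default of hash-max-ziplist-value


def predict_string_encoding(value):
    """Approximate the encoding chosen for a string value (bytes)."""
    try:
        text = value.decode("ascii")
        number = int(text)
        # Only canonical decimal forms ("42", not "042" or " 42") that fit
        # a signed 64-bit integer are treated as int-encoded.
        if str(number) == text and -2**63 <= number < 2**63:
            return "int"
    except (UnicodeDecodeError, ValueError):
        pass
    return "embstr" if len(value) <= EMBSTR_MAX_BYTES else "raw"


def predict_hash_encoding(fields,
                          max_entries=HASH_MAX_ZIPLIST_ENTRIES,
                          max_value=HASH_MAX_ZIPLIST_VALUE):
    """Approximate the encoding of a hash given as a dict of bytes -> bytes."""
    if len(fields) > max_entries:
        return "hashtable"
    for field, value in fields.items():
        if len(field) > max_value or len(value) > max_value:
            return "hashtable"
    return "ziplist"


if __name__ == "__main__":
    print(predict_string_encoding(b"12345"))
    print(predict_string_encoding(b"x" * 45))
    print(predict_hash_encoding({b"name": b"redis"}))
```

A helper like this is mainly useful when planning a schema: it shows quickly whether a proposed value size or field count would push an object out of its compact encoding.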

Buffer Memory

Client output buffer : controlled by client-output-buffer-limit for normal, slave, and pubsub clients. Exceeding the limit closes the connection.

Replication backlog : circular buffer (default 1 MiB) that retains recently propagated write commands; its size determines how long a slave can be disconnected and still catch up with a partial resynchronization instead of a full one.

AOF buffer : holds commands before they are flushed to disk according to appendfsync policy (always, everysec, no). The buffer lives only seconds and usually consumes little memory.

Memory Fragmentation

Jemalloc allocates memory in fixed size classes (8 B, 16 B, …, 4 KiB, 8 KiB, …). Requests that do not match a class are rounded up, creating internal fragmentation. Freed memory is not returned to the OS immediately, causing external fragmentation. The metric mem_fragmentation_ratio = used_memory_rss / used_memory is reported by INFO; values close to 1 indicate low fragmentation, while >1.5 signals serious waste.
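As an illustration of the ratio described above, the following sketch parses the text returned by INFO memory and computes mem_fragmentation_ratio from used_memory_rss and used_memory. The function names and the sample figures are made up for the example; the 1.5 cut-off is the one quoted in the text.

```python
def parse_info(text):
    """Turn INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fragmentation_ratio(info):
    """mem_fragmentation_ratio = used_memory_rss / used_memory."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])


def is_heavily_fragmented(info, threshold=1.5):
    """True when the ratio exceeds the threshold quoted in the article."""
    return fragmentation_ratio(info) > threshold


# Hypothetical sample: 1 GiB allocated, roughly 1.6 GiB resident.
SAMPLE_INFO = """# Memory
used_memory:1073741824
used_memory_rss:1717986918
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE_INFO)
    print(round(fragmentation_ratio(info), 2), is_heavily_fragmented(info))
```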

Child‑Process Memory

Forked processes for AOF rewrite or RDB generation share memory with the parent via copy‑on‑write (COW). Only pages modified after the fork are duplicated, which in practice means pages the parent writes to while it keeps serving traffic. Enabling Transparent Huge Pages (THP) expands page size from 4 KiB to 2 MiB, so each write copies a far larger page and memory can spike.
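A rough worst-case estimate makes the THP effect concrete. The sketch below assumes every write during the fork lands on a different page, which is pessimistic, and caps the result at the dataset size; the function name and the example numbers are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def worst_case_cow_bytes(pages_written, page_size, dataset_bytes):
    """Upper bound on memory duplicated by copy-on-write during a fork.

    Assumes each written page is distinct; the copy can never exceed
    the size of the dataset itself.
    """
    return min(pages_written * page_size, dataset_bytes)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 scattered writes against an 8 GiB dataset.
    regular = worst_case_cow_bytes(10_000, 4 * KIB, 8 * GIB)
    huge = worst_case_cow_bytes(10_000, 2 * MIB, 8 * GIB)
    print(f"4 KiB pages: {regular / MIB:.1f} MiB copied")
    print(f"2 MiB pages: {huge / GIB:.1f} GiB copied")
```

Under these assumptions the same write pattern copies tens of megabytes with regular pages but can duplicate the entire dataset with huge pages.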

Memory Optimization Strategies

Object Memory Optimization

Keep string values ≤44 bytes to stay in embstr (avoids an extra allocation).

Group many small strings into a hash so the hash can use ziplist compression.

Avoid large numbers of elements in lists, sets, or hashes that force a switch to linkedlist or hashtable, which consume more memory.

Do not use ziplist for frequently updated data; each modification may trigger realloc and memcpy, degrading performance.

Monitor the load factor load_factor = dict->used / dict->size; when it reaches the resize threshold (1 in normal operation, higher while a child process is running) Redis expands the table and performs incremental rehashing.

Identify “bigkey” objects (e.g., >10 MiB) and redesign the schema.

Client Buffer Optimization

Typical causes of abnormal buffer growth include large keys, the MONITOR command, massive pipelining, and slow slave replication. Mitigation steps:

Limit key size and avoid large keys.

Set client-output-buffer-limit to 5‑15 % of instance memory (never exceed 20 %). Example:

CONFIG SET client-output-buffer-limit "normal 4096mb 2048mb 120"

Avoid MONITOR in production or rename the command.

Restrict the number of commands per pipeline, especially those returning large result sets.

Size the replication backlog according to write rate and network bandwidth:

backlog = (write_rate * cmd_size - replication_rate * cmd_size) * 2
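The two sizing rules in this list can be sketched in Python. Both helpers are hypothetical: the first derives a client-output-buffer-limit value from a percentage of instance memory and rejects anything above the 20 % ceiling mentioned above; the second applies the backlog formula literally, assuming rates in commands per second and cmd_size in bytes (the article does not state the units).

```python
MIB = 1024 * 1024


def output_buffer_limit_command(maxmemory_bytes, hard_percent=10,
                                soft_percent=5, soft_seconds=120,
                                client_class="normal"):
    """Build a CONFIG SET line with hard/soft limits as a share of memory."""
    if not 0 < soft_percent <= hard_percent <= 20:
        raise ValueError("expected 0 < soft <= hard <= 20 percent")
    maxmemory_mb = maxmemory_bytes // MIB
    hard_mb = maxmemory_mb * hard_percent // 100
    soft_mb = maxmemory_mb * soft_percent // 100
    # The whole limit is passed as one quoted argument.
    return ('CONFIG SET client-output-buffer-limit '
            f'"{client_class} {hard_mb}mb {soft_mb}mb {soft_seconds}"')


def backlog_bytes(write_rate, replication_rate, cmd_size):
    """Literal form of the article's formula (units are an assumption)."""
    return (write_rate * cmd_size - replication_rate * cmd_size) * 2


if __name__ == "__main__":
    # Hypothetical 40 GiB instance.
    print(output_buffer_limit_command(40 * 1024 * MIB))
    # Hypothetical rates: 5000 writes/s, 4000 replicated/s, 200-byte commands.
    print(backlog_bytes(5000, 4000, 200))
```

If the computed backlog comes out smaller than the configured repl-backlog-size, the existing setting already covers the gap described by the formula.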

Fragmentation Optimization

Redis <4.0 requires a restart to reclaim fragmented memory. From Redis 4.0 onward, use built‑in defragmentation:

Manual: MEMORY PURGE (asks jemalloc to release dirty pages it is holding).

Automatic: enable with activedefrag yes and tune thresholds:

active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 25
active-defrag-cycle-max 75

Child‑Process Memory Optimization

Keep write intensity low during AOF rewrite or RDB generation, and disable THP to avoid large COW copies.

Case Study 1 – Client Buffer Overflow

A production Redis cluster showed memory usage climbing to 100 % after a scaling operation. Only a few instances exhibited used_memory growth while key count remained stable.

Inspection of INFO revealed that client_output_buffer grew in lockstep with used_memory. CLIENT LIST identified a client with omem=5259227504 (about 4.9 GiB) executing GET. Because a single string value is capped at 512 MiB, one reply could not account for a buffer of that size, so many replies must have been queued for that client at once.
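The CLIENT LIST inspection described above can be automated. The sketch below parses the space-separated key=value lines that CLIENT LIST returns and ranks clients by omem. The helper names, the addresses, and the second sample line are invented for illustration; only the omem value and the GET command come from the incident.

```python
def parse_client_list(text):
    """Parse CLIENT LIST output into one dict of string fields per client."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = dict(token.split("=", 1)
                      for token in line.split() if "=" in token)
        clients.append(fields)
    return clients


def output_memory(client):
    """Output buffer memory (omem) of a parsed client, 0 if missing."""
    return int(client.get("omem", 0))


def largest_output_buffers(clients, min_omem=1, top=5):
    """Clients whose output buffer is at least min_omem bytes, largest first."""
    heavy = [c for c in clients if output_memory(c) >= min_omem]
    return sorted(heavy, key=output_memory, reverse=True)[:top]


# Hypothetical excerpt of CLIENT LIST output.
SAMPLE_CLIENT_LIST = """\
id=11 addr=10.0.0.5:41000 fd=9 name= age=310 idle=0 flags=N db=0 obl=0 oll=4096 omem=5259227504 events=rw cmd=get
id=12 addr=10.0.0.6:41002 fd=10 name= age=15 idle=3 flags=N db=0 obl=0 oll=0 omem=0 events=r cmd=ping
"""

if __name__ == "__main__":
    for client in largest_output_buffers(parse_client_list(SAMPLE_CLIENT_LIST)):
        print(client["id"], client["addr"], client["cmd"], output_memory(client))
```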

Temporary mitigation:

Kill the offending connection.

Set a stricter output‑buffer limit (2‑4 GiB) using the command shown above.

The client was a C++ brpc Redis client that pipelines commands. Repository reference:

https://github.com/apache/incubator-brpc/blob/master/docs/cn/redis_client.md

Conclusion: Pipelining improves throughput but must be bounded; otherwise the output buffer can exhaust memory and cause connection termination.
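One way to bound a pipeline is to split the command stream into fixed-size batches and read all replies for a batch before sending the next. The generator below is a minimal, client-agnostic sketch; the name batched_commands and the batch size are assumptions, and the actual send and receive calls depend on the client library in use.

```python
from itertools import islice


def batched_commands(commands, max_batch):
    """Yield successive lists of at most max_batch commands."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    iterator = iter(commands)
    while True:
        batch = list(islice(iterator, max_batch))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Hypothetical workload: 7 GET commands, at most 3 in flight at a time.
    keys = [f"user:{i}" for i in range(7)]
    for batch in batched_commands((("GET", key) for key in keys), max_batch=3):
        # A real client would send this batch and drain its replies here.
        print(len(batch), batch[0])
```

Because the server only ever holds replies for one batch per connection, the output buffer stays proportional to max_batch rather than to the total number of commands.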

Case Study 2 – Replica Memory Growth

In a 190‑node cluster, three replica nodes exceeded 95 % memory usage while masters remained stable.

Master used_memory stayed constant; replicas’ used_memory grew gradually.

Ops metrics were low, indicating no traffic surge.

Master memory limit was 6 GiB, replica limit 5 GiB, creating a mismatch.

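The application performed frequent APPEND operations on large string keys. When the current SDS buffer is too small, Redis preallocates extra space: it doubles the allocation while the new length is below 1 MiB and adds 1 MiB of headroom beyond that.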

Verification with MEMORY USAGE on a representative large key showed replica keys consuming up to twice the memory of the master. The root cause was the replica reallocating larger SDS buffers during APPEND, while the master already held the larger allocation.
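The preallocation behavior behind this finding can be modeled in a few lines. The sketch below follows the growth rule used by SDS (double below 1 MiB, add 1 MiB above) but is a simplification: it ignores the SDS header, the trailing null byte, and jemalloc size-class rounding, and the function name is hypothetical.

```python
SDS_MAX_PREALLOC = 1024 * 1024  # 1 MiB preallocation cap used by SDS


def append_growth(length, alloc, addlen):
    """Return (new_length, new_alloc) after appending addlen bytes.

    length is the bytes in use, alloc the bytes currently reserved.
    """
    if alloc - length >= addlen:
        # Enough free space already: no reallocation needed.
        return length + addlen, alloc
    new_length = length + addlen
    if new_length < SDS_MAX_PREALLOC:
        new_alloc = new_length * 2
    else:
        new_alloc = new_length + SDS_MAX_PREALLOC
    return new_length, new_alloc


if __name__ == "__main__":
    length, alloc = 100, 100
    for chunk in (50, 100, 400):
        length, alloc = append_growth(length, alloc, chunk)
        print(f"len={length} alloc={alloc} overhead={alloc - length}")
```

The model shows why two copies of the same string can differ in memory: the reserved size depends on the sequence of appends that produced it, not only on the final length.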

Remediation:

Expand replica memory capacity and keep usage below 70 %.

Avoid excessive APPEND on large strings; consider using a list or stream instead.

Note: used_memory can exceed used_memory_rss because jemalloc counts virtual allocations, while RSS reflects only pages that have been touched. This behavior is confirmed by Redis developers (https://github.com/redis/redis/issues/946#issuecomment-13599772).

Overall Summary

Understanding Redis’s internal memory layout enables rapid diagnosis of abnormal memory usage. By correlating INFO metrics, CLIENT LIST data, and object‑encoding choices, engineers can pinpoint the exact cause—oversized client buffers, inefficient encodings, fragmentation, or replica‑specific allocation patterns—and apply targeted configuration changes or schema redesigns to keep the cluster stable.
