Essential Redis Health Metrics Every Engineer Should Monitor
This guide explains how to monitor critical Redis health indicators—including ping response, client connections, blocked clients, memory usage, fragmentation, cache hit rate, OPS, persistence status, expired keys, and slow logs—to ensure optimal performance and prevent failures.
Survival Check
Of all the metrics, the most important is whether Redis is alive: send the PING command and expect a PONG response.
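The liveness check can be sketched with nothing but Python's standard library. The function names, the default host and port, and the one-second timeout are assumptions made for illustration. The sketch relies on Redis accepting inline commands, so a bare PING line is answered with +PONG; an instance that requires authentication would typically answer with an error, which this sketch treats as "not alive".

```python
import socket


def is_pong(reply: bytes) -> bool:
    """True if the raw reply is the simple-string response +PONG."""
    return reply.strip() == b"+PONG"


def redis_alive(host: str = "127.0.0.1", port: int = 6379, timeout: float = 1.0) -> bool:
    """Send an inline PING and report whether the server answered PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return is_pong(sock.recv(64))
    except OSError:
        # Connection refused, timeout, or reset: treat all of them as "down".
        return False
```

Calling redis_alive() from a scheduled probe gives a simple up/down signal; anything other than True deserves an alert.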
Connection Count
The number of connected clients can be obtained with src/redis-cli info Clients | grep connected_clients. This value is closely tied to the connection pool configuration of the services using Redis; as a rule of thumb it should not exceed 5000. A high value may indicate that Redis is processing requests too slowly and warrants investigation.
Also monitor rejected_connections (reported in the Stats section of INFO), which should ideally be 0. A non-zero value means connections were refused because the maxclients limit was reached, so review pool settings or service load.
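As a rough sketch of how the two connection metrics could be checked automatically, the code below parses the key:value text that INFO prints and applies the thresholds mentioned above. The helper names and the sample values are hypothetical; only the field names come from Redis.

```python
def parse_info(text):
    """Turn the 'key:value' lines printed by INFO into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and section headers
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def connection_warnings(info, client_limit=5000):
    """Return human-readable warnings for the two connection metrics."""
    warnings = []
    connected = int(info.get("connected_clients", 0))
    rejected = int(info.get("rejected_connections", 0))
    if connected > client_limit:
        warnings.append(f"connected_clients={connected} exceeds {client_limit}")
    if rejected > 0:
        warnings.append(f"rejected_connections={rejected}, expected 0")
    return warnings


# Illustrative INFO excerpt; the values are made up for the example.
sample = """# Clients
connected_clients:6200
blocked_clients:0
# Stats
rejected_connections:3
"""
print(connection_warnings(parse_info(sample)))
```

The 5000 limit is passed as a parameter because the right ceiling depends on how the client pools are sized.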
Blocked Clients
The blocked_clients metric counts clients waiting on a blocking call, usually BLPOP or BRPOP on list types. It can be checked with src/redis-cli info Clients | grep blocked_clients. This value should be 0.
Memory Peak Usage
Monitor the peak memory used by Redis. The maximum allowed memory can be set with CONFIG SET maxmemory 10737418240 (10 GB in this example; it is recommended not to exceed 20 GB). To avoid swapping and performance drops, keep used_memory_peak below maxmemory by a safe margin (e.g., with maxmemory at 10 GB and a 1 GB margin, used_memory_peak should not exceed 9 GB).
Also ensure maxmemory is not set too low; a misconfiguration (e.g., 1 GB instead of 10 GB) can cause the server to have free memory but be unable to use it.
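The margin arithmetic is easy to get wrong by a unit, so here is a small sketch of it. The function name is hypothetical, and the sketch assumes all three values are in bytes, as INFO and CONFIG report them.

```python
GIB = 1024 ** 3  # bytes in one GiB; 10 * GIB == 10737418240


def peak_within_margin(used_memory_peak, maxmemory, margin):
    """True if the peak stays at least `margin` bytes below maxmemory."""
    if maxmemory == 0:
        # In Redis, maxmemory 0 means "no limit", so there is nothing to compare.
        raise ValueError("maxmemory is 0 (no limit configured)")
    return used_memory_peak <= maxmemory - margin


print(peak_within_margin(8 * GIB, 10 * GIB, 1 * GIB))         # True
print(peak_within_margin(int(9.5 * GIB), 10 * GIB, 1 * GIB))  # False
```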
Memory Fragmentation Ratio
The ratio mem_fragmentation_ratio = used_memory_rss / used_memory indicates memory fragmentation. Values > 1 mean the memory Redis holds from the operating system exceeds what its data actually uses; the larger the value, the more severe the fragmentation. Values < 1 suggest that part of Redis's memory has been swapped out because available memory is insufficient.
Redis 4.0 introduced active defragmentation (controlled by CONFIG SET activedefrag yes) to reduce fragmentation. This feature is disabled by default.
When used_memory is very small, the metric is less meaningful; it is recommended to monitor fragmentation only when used_memory is at least 1 GB.
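The interpretation rules for the fragmentation ratio can be sketched as a small classifier. The function name, the labels it returns, and the 1.5 cut-off for "fragmented" are assumptions chosen for illustration; the text above only says that larger values are worse.

```python
GIB = 1024 ** 3


def classify_fragmentation(used_memory, used_memory_rss, high_ratio=1.5, min_used=GIB):
    """Classify mem_fragmentation_ratio = used_memory_rss / used_memory."""
    if used_memory == 0 or used_memory < min_used:
        return "not-meaningful"  # too little data for the ratio to say much
    ratio = used_memory_rss / used_memory
    if ratio < 1:
        return "possible-swapping"
    if ratio > high_ratio:  # hypothetical alert threshold
        return "fragmented"
    return "ok"


print(classify_fragmentation(2 * GIB, 4 * GIB))  # fragmented
```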
Cache Hit Rate
Cache hit rate is calculated as keyspace_hits / (keyspace_hits + keyspace_misses). A healthy hit rate should be above 0.9 (90%). If the rate is low, investigate cache usage patterns.
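A minimal sketch of the hit-rate formula, with a guard for a server that has not yet served any lookups. The function name and the sample counts are illustrative.

```python
def hit_rate(keyspace_hits, keyspace_misses):
    """keyspace_hits / (keyspace_hits + keyspace_misses), or None if no lookups yet."""
    total = keyspace_hits + keyspace_misses
    if total == 0:
        return None
    return keyspace_hits / total


rate = hit_rate(9500, 500)
print(rate, rate >= 0.9)  # 0.95 True
```

Both counters are cumulative since the server started (or since the last CONFIG RESETSTAT), so a monitoring job that wants a recent hit rate would apply the same formula to the difference between two samples.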
OPS
The instantaneous_ops_per_sec metric shows operations per second. For stable workloads, this value should be relatively steady, but traffic patterns (e.g., low activity at night) may cause fluctuations that need to be interpreted in context.
Persistence
Check rdb_last_bgsave_status and aof_last_bgrewrite_status; both should be "ok". Also monitor latest_fork_usec (the duration of the most recent fork, in microseconds): persistence forks a child process, and the fork call itself blocks the server, so a large value can hurt performance or cause timeouts.
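The three persistence fields can be checked together, as in this sketch. It assumes the INFO output has already been parsed into a dict of strings; the function name and the one-second fork threshold are hypothetical and should be tuned per deployment.

```python
def persistence_problems(info, max_fork_usec=1_000_000):
    """Return a list of persistence issues found in a parsed INFO dict."""
    problems = []
    for field in ("rdb_last_bgsave_status", "aof_last_bgrewrite_status"):
        status = info.get(field)
        if status != "ok":  # a missing field is reported as well
            problems.append(f"{field}={status}")
    fork_usec = int(info.get("latest_fork_usec", 0))
    if fork_usec > max_fork_usec:  # hypothetical threshold, in microseconds
        problems.append(f"latest_fork_usec={fork_usec}")
    return problems


healthy = {
    "rdb_last_bgsave_status": "ok",
    "aof_last_bgrewrite_status": "ok",
    "latest_fork_usec": "450",
}
print(persistence_problems(healthy))  # []
```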
Expired Keys
If Redis is used as a cache, ensure all keys have an expiration. Use src/redis-cli info Keyspace to see the number of keys and how many have an expire attribute; the counts of keys and expires should match.
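The keys-versus-expires comparison can be automated by parsing each database line of the Keyspace section. This is a sketch with hypothetical helper names; it assumes every field on the line is an integer, which holds for keys, expires, and avg_ttl.

```python
def parse_keyspace_line(line):
    """Parse 'db0:keys=30,expires=30,avg_ttl=0' into ('db0', {'keys': 30, ...})."""
    db, _, stats = line.strip().partition(":")
    fields = {}
    for pair in stats.split(","):
        name, _, value = pair.partition("=")
        fields[name] = int(value)
    return db, fields


def keys_without_expiry(line):
    """Number of keys in the database that have no expiration set."""
    _, fields = parse_keyspace_line(line)
    return fields["keys"] - fields["expires"]


print(keys_without_expiry("db0:keys=30,expires=30,avg_ttl=0"))  # 0
print(keys_without_expiry("db0:keys=23,expires=22,avg_ttl=0"))  # 1
```

A result greater than 0 on a cache-only instance means some keys will never expire on their own.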
# Keyspace
db0:keys=30,expires=30,avg_ttl=0
db0:keys=23,expires=22,avg_ttl=0
Slow Log
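Retrieve slow logs with SLOWLOG GET. Ideally the slow log is empty, but even simple commands like SET key value may occasionally appear, for example when the host is briefly stalled. Note that the slow log records only command execution time, not network or client I/O, so network latency alone does not create entries. Do not assume every entry requires optimization; investigate the context.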
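To see which commands dominate the slow log before investigating individual entries, the entries can be grouped by command name. This sketch assumes each entry follows the shape of the SLOWLOG GET reply: an id, a Unix timestamp, the execution time in microseconds, and the argument list (newer servers append client details, which are ignored here). The function name and the sample entries are made up.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Map command name -> (count, max duration in microseconds)."""
    summary = defaultdict(lambda: [0, 0])
    for entry in entries:
        duration_usec, args = entry[2], entry[3]
        name = str(args[0]).upper()
        summary[name][0] += 1
        summary[name][1] = max(summary[name][1], duration_usec)
    return {name: tuple(stats) for name, stats in summary.items()}


# Illustrative entries: (id, timestamp, duration_usec, args)
entries = [
    (1, 1700000000, 12000, ["KEYS", "*"]),
    (2, 1700000005, 15000, ["keys", "user:*"]),
    (3, 1700000009, 11000, ["SET", "key", "value"]),
]
print(summarize_slowlog(entries))
```

A command that shows up repeatedly is a better optimization target than a one-off entry for a normally cheap command.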
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
