Why Is My Redis Slowing Down? A Complete Troubleshooting Guide
This article provides a systematic, step‑by‑step methodology for diagnosing Redis latency spikes, covering baseline performance testing, slow‑log analysis, high‑complexity commands, big‑key handling, expiration patterns, memory limits, fork overhead, huge‑page settings, AOF configurations, CPU binding, swap usage, memory fragmentation, network saturation, and practical monitoring tips.
Verify Whether Redis Is Slow
First isolate the latency increase to the Redis service. Use distributed tracing in the business service to measure each hop. Then benchmark the Redis instance directly on its host to obtain a baseline latency.
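$ redis-cli -h 127.0.0.1 -p 6379 --intrinsic-latency 60
This command reports the maximum latency observed during the 60‑second window (e.g., 72 µs). Run it on the Redis host itself: this mode measures the latency floor imposed by the operating system and hypervisor rather than a round trip to Redis. For a continuous view you can run: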
$ redis-cli -h 127.0.0.1 -p 6379 --latency-history -i 1
Typical average latency on a healthy instance is 0.08 ms–0.13 ms. Consider the instance slow if its latency is more than twice the baseline measured on a comparable server.
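The "more than twice the baseline" rule can be applied mechanically to the history output. The sketch below is only an illustration: the line format in `sample` is an assumption about what `redis-cli --latency-history` prints, and the 0.10 ms baseline is a hypothetical value, so adjust both to what you actually observe.

```python
import re

# Assumed shape of one `redis-cli --latency-history` line; adjust the
# pattern if your redis-cli version prints something different.
LINE = re.compile(r"min: (\d+), max: (\d+), avg: ([\d.]+) \((\d+) samples\)")


def parse_latency_line(line):
    """Return (min_ms, max_ms, avg_ms, samples), or None if the line does not match."""
    m = LINE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3)), int(m.group(4))


def is_slow(avg_ms, baseline_avg_ms, factor=2.0):
    """Rule of thumb from the text: slow if above `factor` times the baseline."""
    return avg_ms > factor * baseline_avg_ms


# Hypothetical sample line and baseline, for illustration only.
sample = "min: 0, max: 1, avg: 0.13 (1428 samples) -- 15.01 seconds range"
stats = parse_latency_line(sample)
print(stats, is_slow(stats[2], baseline_avg_ms=0.10))
```

Comparing averages keeps the check simple; if the spikes are short-lived, the `max` column is probably the more useful signal.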
Analyze Slow‑Log
Enable the slow‑log with a threshold that matches your latency tolerance (e.g., 5 ms) and keep enough entries for analysis:
# Record commands slower than 5 ms
CONFIG SET slowlog-log-slower-than 5000
# Keep the latest 500 entries
CONFIG SET slowlog-max-len 500
Query recent entries:
127.0.0.1:6379> SLOWLOG GET 5
Inspect the command name, arguments, and execution time (in microseconds) to spot expensive operations.
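When the slow‑log holds hundreds of entries, grouping them by command makes the expensive operations easier to see. The following is a minimal sketch: the `(id, unix_ts, duration_us, args)` tuple is an assumed, simplified view of a `SLOWLOG GET` entry, and the sample entries are made up.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow-log entries by command name.

    `entries` is a list of (id, unix_ts, duration_us, args) tuples, an
    assumed, simplified view of what SLOWLOG GET returns.
    Returns rows of (command, count, total_us, max_us), largest total first.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for _id, _ts, duration_us, args in entries:
        name = args[0].upper() if args else "<unknown>"
        row = totals[name]
        row[0] += 1                        # number of slow calls
        row[1] += duration_us              # total time spent
        row[2] = max(row[2], duration_us)  # worst single call
    rows = [(name, c, t, m) for name, (c, t, m) in totals.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)


# Hypothetical entries, for illustration only.
entries = [
    (3, 1700000002, 12000, ["SORT", "big:list"]),
    (2, 1700000001, 7000, ["sunion", "a", "b"]),
    (1, 1700000000, 9000, ["SORT", "big:list"]),
]
for row in summarize_slowlog(entries):
    print(row)
```

Sorting by total time rather than by count surfaces commands that are rare but very slow.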
High‑Complexity Commands
Commands whose algorithmic complexity is O(N) or higher (e.g., SORT, SUNION, ZUNIONSTORE) or commands with a very large N can consume excessive CPU and increase network payload.
CPU usage rises because the server must process more elements.
Large result sets increase protocol overhead.
Mitigation:
Avoid such commands in hot paths; move aggregation to the client.
If they must be used, keep N small (recommended N ≤ 300) and request only the needed fields.
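One way to keep N small is to read a large collection in bounded windows instead of in one call. The helper below is a sketch, not part of any Redis client: `index_windows` is a hypothetical name, and the inclusive bounds are chosen to match range commands such as LRANGE or ZRANGE.

```python
def index_windows(total, batch=300):
    """Yield inclusive (start, stop) index pairs covering `total` elements.

    Each pair spans at most `batch` elements, so a range command issued
    with these bounds never processes more than `batch` items at once.
    """
    if batch <= 0:
        raise ValueError("batch must be positive")
    for start in range(0, total, batch):
        yield start, min(start + batch, total) - 1


print(list(index_windows(700)))  # → [(0, 299), (300, 599), (600, 699)]
```

Each window would then become one range call from the client, trading a single long server-side operation for several short ones.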
Big‑Key Problem
A key whose value is very large (a “bigkey”) incurs high allocation cost on write and high deallocation cost on delete.
Detect bigkeys with the built‑in scanner:
$ redis-cli -h 127.0.0.1 -p 6379 --bigkeys -i 0.01
The output lists the largest key per data type, along with per‑type key counts and average sizes. For collection types the reported size is an element count, not bytes.
Remediation:
Split oversized values into multiple smaller keys.
For Redis ≥ 4.0 use UNLINK instead of DEL to free memory asynchronously.
For Redis ≥ 6.0 enable lazyfree-lazy-user-del yes to perform deletion in background threads.
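Splitting an oversized value usually means routing each field to one of several smaller keys. The sketch below shows one possible routing function under stated assumptions: the `base:<n>` naming scheme, the shard count of 16, and the key and field names are all illustrative choices, not a Redis convention.

```python
import zlib


def shard_key(base_key, field, shards=16):
    """Map a field of one logical hash to one of `shards` smaller keys.

    CRC32 is used only because it is stable across processes and runs;
    the `base:<n>` naming scheme and the shard count are illustrative.
    """
    if shards <= 0:
        raise ValueError("shards must be positive")
    n = zlib.crc32(field.encode("utf-8")) % shards
    return f"{base_key}:{n}"


# Hypothetical key and field names.
print(shard_key("user:profiles", "user:1001"))
```

Because the mapping is deterministic, readers and writers agree on the shard without any lookup, at the cost of no longer being able to fetch the whole structure in one command.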
Concentrated Expiration
When many keys expire at the same timestamp (e.g., via EXPIREAT ), Redis’s active expiration task runs on the main thread and can block client requests, causing latency spikes that do not appear in the slow‑log.
Mitigation:
Randomize expiration times so that deletions are spread over time.
For Redis ≥ 4.0 enable lazy‑free expiration with lazyfree-lazy-expire yes.
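Randomizing expiration can be as small as adding a bounded random offset to the TTL before it is set. A minimal sketch, where the one-hour base and five-minute spread are arbitrary example values:

```python
import random


def jittered_ttl(base_seconds, spread_seconds, rng=random):
    """Return base_seconds plus a random whole-second offset in [0, spread_seconds]."""
    if spread_seconds < 0:
        raise ValueError("spread_seconds must be non-negative")
    return base_seconds + rng.randint(0, spread_seconds)


# Five keys that would otherwise all expire at exactly 3600 s.
ttls = [jittered_ttl(3600, 300) for _ in range(5)]
print(ttls)
```

The spread should be wide enough to flatten the expiration burst but small enough that the extra lifetime does not matter to the application.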
Memory Limit and Eviction Policies
If maxmemory is reached, Redis evicts keys before accepting new writes, adding latency. Common policies:
allkeys-lru – evict the least‑recently‑used key regardless of expiration.
volatile-lru – evict the least‑recently‑used key that has an expiration.
Others: allkeys-random, volatile-random, volatile-ttl, noeviction, allkeys-lfu, volatile-lfu.
Select a policy that matches the workload; allkeys-lru and volatile-lru are the most frequently used.
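To see whether eviction is a plausible cause, compare `used_memory` with `maxmemory` and watch `evicted_keys` in the INFO output. The sketch below parses INFO text that you would obtain from your client; the sample values are hypothetical.

```python
def parse_info(text):
    """Parse INFO output into a dict of string values, skipping '# Section' lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def memory_pressure(info):
    """Return used_memory / maxmemory, or None when maxmemory is 0 (no limit)."""
    limit = int(info.get("maxmemory", 0))
    if limit == 0:
        return None
    return int(info["used_memory"]) / limit


# Hypothetical INFO excerpt, for illustration only.
sample = """# Memory
used_memory:900000000
maxmemory:1000000000
maxmemory_policy:allkeys-lru
# Stats
evicted_keys:1532
"""
info = parse_info(sample)
print(memory_pressure(info), info["maxmemory_policy"], int(info["evicted_keys"]))
```

A ratio close to 1.0 together with a growing `evicted_keys` counter suggests that writes are paying for eviction.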
Fork Overhead (RDB / AOF Rewrite)
Background persistence creates a child process via fork(). The parent must copy its page tables, which can take seconds for large instances and block the main thread.
Check the duration of the last fork in the latest_fork_usec field of the INFO output:
latest_fork_usec:59477 # microseconds
If the value is large, consider reducing instance size, scheduling persistence during off‑peak hours, or disabling AOF rewrite on replicas.
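A raw fork duration is easier to compare across instances when it is normalized by memory size. The sketch below uses 59477 µs, the value shown in the text; the 4 GiB resident size is a hypothetical figure, and `fork_ms_per_gib` is an illustrative helper rather than a Redis metric.

```python
def fork_ms_per_gib(latest_fork_usec, used_memory_rss_bytes):
    """Normalize the last fork duration to milliseconds per GiB of resident memory."""
    if used_memory_rss_bytes <= 0:
        raise ValueError("used_memory_rss_bytes must be positive")
    gib = used_memory_rss_bytes / (1024 ** 3)
    return (latest_fork_usec / 1000.0) / gib


# 59477 µs comes from the text; the 4 GiB resident size is hypothetical.
print(round(fork_ms_per_gib(59477, 4 * 1024 ** 3), 2))  # → 14.87
```

Tracking this number over time shows whether fork cost is growing with the dataset or with the host environment (virtualized hosts can fork noticeably slower).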
Transparent Huge Pages
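Linux transparent huge pages let the kernel back memory with 2 MiB pages instead of 4 KiB ones. This reduces the number of allocations, but each allocation takes longer. The cost is highest after a fork: copy‑on‑write then duplicates a whole 2 MiB page even when a write touches only a few bytes, which increases write latency.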
Check the setting:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
If the output shows [always], disable it:
$ echo never > /sys/kernel/mm/transparent_hugepage/enabled
AOF Persistence Impact
Redis supports three appendfsync policies:
always – every write is flushed to disk; highest durability but highest latency.
no – writes stay in OS buffers; lowest latency but risk of data loss.
everysec – a background thread fsyncs once per second; a balance between durability and latency.
During an AOF rewrite, you can temporarily disable fsync to avoid blocking the main thread:
# Disable AOF fsync during rewrite
no-appendfsync-on-rewrite yes
Be aware that this increases the risk of data loss if a crash occurs during the rewrite.
CPU Binding
Binding the Redis process to a single logical core can cause the forked persistence processes to compete for the same CPU, worsening latency.
For Redis 6.0+ you can bind different thread groups to separate core sets:
# Server and I/O threads on cores 0,2,4,6
server_cpulist 0-7:2
# Background I/O threads on cores 1,3
bio_cpulist 1,3
# AOF rewrite on cores 8‑11
aof_rewrite_cpulist 8-11
# RDB save on cores 1,10,11
bgsave_cpulist 1,10-11
Apply CPU binding only after understanding the server’s architecture and workload.
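Before applying a binding it can help to expand each list and check that the core sets do not overlap by accident. The function below is a sketch that mirrors the syntax used in the comments above (`N`, `A-B`, `A-B:STEP`); it is not Redis's own parser.

```python
def expand_cpulist(spec):
    """Expand a list such as '0-7:2' or '1,10-11' into sorted CPU indexes.

    Each comma-separated part is 'N', 'A-B', or 'A-B:STEP'. This follows
    the syntax shown in the config comments; it is not Redis's own parser.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        rng, _, step = part.partition(":")
        first, _, last = rng.partition("-")
        start = int(first)
        stop = int(last) if last else start
        stride = int(step) if step else 1
        if stride <= 0 or stop < start:
            raise ValueError(f"invalid cpulist part: {part!r}")
        cpus.update(range(start, stop + 1, stride))
    return sorted(cpus)


server = expand_cpulist("0-7:2")
bgsave = expand_cpulist("1,10-11")
print(server, bgsave, set(server) & set(bgsave))
```

An empty intersection between the server list and the persistence lists means the forked children will not compete with the main thread for the same cores.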
Swap Usage
If Redis starts swapping, latency can increase to hundreds of milliseconds because memory pages are read from disk.
Check swap usage for the Redis process:
# Find Redis PID
ps aux | grep redis-server
# Inspect swap per memory region (replace $pid with the actual PID)
cat /proc/$pid/smaps | egrep '^(Swap|Size)'
If a significant amount of memory is swapped, add RAM or free memory and restart the instance (preferably after a master‑slave switchover).
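The smaps output has one Size/Swap pair per memory region, so totals are easier to judge than the raw listing. A minimal sketch that sums both fields from smaps text; the sample excerpt is made up, and on a real host the text would come from reading `/proc/<pid>/smaps` with sufficient permissions.

```python
def swap_summary(smaps_text):
    """Sum the 'Size:' and 'Swap:' fields (in kB) across all mappings."""
    totals = {"Size": 0, "Swap": 0}
    for line in smaps_text.splitlines():
        parts = line.split()
        # Expected shape: ['Swap:', '64', 'kB']; other fields are ignored.
        if len(parts) == 3 and parts[2] == "kB":
            name = parts[0].rstrip(":")
            if name in totals:
                totals[name] += int(parts[1])
    return totals


# Hypothetical smaps excerpt, for illustration only.
sample = """Size:               1564 kB
Rss:                 900 kB
Swap:                  0 kB
Size:                132 kB
Swap:                 64 kB
SwapPss:              64 kB
"""
print(swap_summary(sample))  # → {'Size': 1696, 'Swap': 64}
```

A swap total of a few kilobytes is usually harmless; totals in the hundreds of megabytes are the case the text warns about.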
Memory Fragmentation
Fragmentation ratio = used_memory_rss / used_memory. A ratio > 1.5 indicates > 50 % fragmentation.
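The ratio can be computed directly from the two INFO fields named in the formula. A small sketch with hypothetical byte counts; the 1.5 threshold is the one stated above.

```python
def fragmentation_ratio(used_memory_rss, used_memory):
    """Return used_memory_rss / used_memory, as defined in the text."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def is_fragmented(ratio, threshold=1.5):
    """True when the ratio exceeds the threshold (1.5 in the text)."""
    return ratio > threshold


# Hypothetical values: 1.8 GB resident for 1.0 GB of data.
ratio = fragmentation_ratio(1_800_000_000, 1_000_000_000)
print(ratio, is_fragmented(ratio))
```

On very small instances the ratio can look alarming simply because fixed overhead dominates, so it is worth checking the absolute byte difference as well.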
Mitigation:
Redis < 4.0 – restart the instance.
Redis ≥ 4.0 – enable active defragmentation:
activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 1
active-defrag-cycle-max 25
active-defrag-max-scan-fields 1000
Test the impact before enabling in production because defragmentation consumes CPU.
Network Bandwidth Saturation
When a Redis instance consumes the entire network bandwidth of its host, TCP retransmissions and packet loss increase latency. Monitor network traffic and scale out or migrate heavy instances.
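One way to monitor this from Redis itself is to sample the `total_net_output_bytes` counter from INFO twice and convert the difference to a rate. The sketch below assumes you collect the two samples yourself; the byte counts, the 10‑second interval, and the 1000 Mbit/s link speed are hypothetical.

```python
def output_mbps(bytes_before, bytes_after, interval_seconds):
    """Average outbound rate in megabits per second between two counter samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if bytes_after < bytes_before:
        # The counter can reset, for example after a restart.
        raise ValueError("counter went backwards between samples")
    return (bytes_after - bytes_before) * 8 / interval_seconds / 1_000_000


def link_utilization(mbps, link_mbps):
    """Fraction of the link capacity in use."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    return mbps / link_mbps


# Hypothetical samples of total_net_output_bytes taken 10 s apart.
rate = output_mbps(1_000_000_000, 2_125_000_000, 10)
print(rate, link_utilization(rate, link_mbps=1000))
```

This only covers traffic generated by the Redis process; host-level counters are still needed to see what other workloads on the same NIC contribute.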
Practical Operational Tips
Prefer long‑lived connections; avoid frequent short connections that add TCP handshake overhead.
Collect comprehensive metrics via INFO (e.g., expired_keys, latency, CPU, memory, network) and set alerts.
Ensure monitoring tools themselves use persistent connections and reasonable polling intervals.
Run Redis on dedicated servers; avoid co‑locating unrelated workloads that compete for CPU, memory, or disk.
Conclusion
Redis latency is affected by command complexity, key size, expiration patterns, memory limits, persistence mechanisms, OS‑level features (transparent huge pages, swap, NUMA), CPU binding, and network conditions. By following the systematic checks above—baseline latency testing, slow‑log analysis, big‑key detection, expiration randomization, appropriate eviction policy, fork‑time monitoring, disabling huge pages, tuning AOF fsync, careful CPU binding, avoiding swap, and managing fragmentation and bandwidth—you can quickly identify the root cause of latency spikes and apply targeted optimizations.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.
