Comprehensive Guide to Diagnosing and Optimizing Redis Performance Issues
Redis latency spikes usually trace back to a short list of causes: complex commands, big keys, concentrated expirations, memory limits, fork overhead, AOF persistence, CPU binding, swap usage, memory fragmentation, and network saturation. This article gives a systematic troubleshooting method and practical optimizations for each scenario.
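Redis is a high‑performance in‑memory database, but latency can increase unexpectedly. Before troubleshooting, verify that Redis is truly slow. Measure the baseline of the host itself by running redis-cli --intrinsic-latency 60 on the Redis server (the argument is the test duration in seconds), sample round‑trip latency over time with redis-cli --latency-history -i 1, and compare the results against a known‑healthy instance on similar hardware.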
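The baseline idea can also be sketched from the application side. The following is a minimal sketch, not a replacement for the redis-cli tools: sample_latency is a hypothetical helper that times any zero‑argument callable, so it can wrap whatever "ping" function your client library provides.

```python
import time
from typing import Callable, Dict


def sample_latency(ping: Callable[[], object], samples: int = 100) -> Dict[str, float]:
    """Call `ping` repeatedly and report round-trip times in milliseconds.

    `ping` is any zero-argument callable (hypothetical here); with the
    redis-py client installed it could be `redis.Redis().ping`.
    """
    if samples <= 0:
        raise ValueError("samples must be positive")
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        ping()
        timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(timings),
        "avg": sum(timings) / len(timings),
        "max": max(timings),
    }
```

Running the same helper against a healthy instance and the suspect instance gives two sets of numbers to compare; a large gap in avg or max suggests the problem is on the Redis side rather than in the application.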
Common causes and solutions:
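High‑complexity commands (O(N) or worse): Use SLOWLOG GET to identify slow commands (the slowlog-log-slower-than setting controls the threshold, in microseconds). Avoid heavy aggregation commands such as SORT, SUNION, and ZUNIONSTORE on large collections, keep N small for O(N) commands (prefer N ≤ 300), and move aggregation to the client side where possible.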
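To see which commands dominate the slowlog, the entries can be grouped by command name. This is a minimal sketch that assumes each entry is a dict with a command field (bytes or str) and a duration field in microseconds, which is the shape redis-py's slowlog_get() is assumed to return; summarize_slowlog is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Iterable, List, Mapping, Tuple


def summarize_slowlog(entries: Iterable[Mapping], top: int = 5) -> List[Tuple[str, int, int]]:
    """Rank command names by total time spent in the slowlog.

    Returns (command_name, occurrences, total_duration_us) tuples,
    slowest first. The entry shape is an assumption, see the lead-in.
    """
    totals = defaultdict(lambda: [0, 0])  # name -> [count, total microseconds]
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "<empty>"
        totals[name][0] += 1
        totals[name][1] += int(entry["duration"])
    ranked = sorted(
        ((name, count, total) for name, (count, total) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )
    return ranked[:top]
```

With a live connection, something like summarize_slowlog(client.slowlog_get(128)) would show whether a handful of O(N) commands account for most of the slow time.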
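Big keys: Scan for large keys with redis-cli --bigkeys (it uses SCAN, so throttle it with the -i option on a busy instance). Reduce key size or split large data structures. When deleting, use UNLINK instead of DEL (Redis ≥4.0), or set lazyfree-lazy-user-del yes (Redis ≥6.0) so that DEL also frees memory asynchronously.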
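One common way to split a large hash is to spread its fields across a fixed number of smaller hashes. The sketch below assumes a hypothetical naming scheme (base key plus a numeric suffix) and uses CRC32 only as a stable, cheap hash; neither choice comes from the original article.

```python
import zlib


def bucket_key(base_key: str, field: str, buckets: int = 64) -> str:
    """Map a hash field to one of `buckets` smaller hash keys.

    The "<base_key>:<index>" naming is an illustrative assumption.
    CRC32 is deterministic, so the same field always lands in the same bucket.
    """
    if buckets <= 0:
        raise ValueError("buckets must be positive")
    index = zlib.crc32(field.encode("utf-8")) % buckets
    return f"{base_key}:{index}"
```

Reads and writes then target bucket_key("user:profile", field) instead of the single large key, so each underlying hash stays small and cheap to delete or expire. The bucket count must stay fixed once data has been written, otherwise fields will be looked up in the wrong bucket.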
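Concentrated expirations: When many keys expire at the same moment, the active expiration cycle runs in the main thread and can delay client requests. Randomize expiration times (e.g., redis.expireat(key, expire_time + random(300))) or enable asynchronous freeing of expired keys (lazyfree-lazy-expire yes, Redis ≥4.0) to avoid blocking the main thread.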
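The randomized expiration above can be sketched as a small helper. jittered_expire_at is a hypothetical name, and the 300‑second window simply mirrors the example in the text.

```python
import random
from typing import Optional


def jittered_expire_at(base_ts: int, max_jitter_s: int = 300,
                       rng: Optional[random.Random] = None) -> int:
    """Return `base_ts` plus a random offset in [0, max_jitter_s] seconds.

    Spreading expirations over a window keeps a batch of keys from
    all expiring in the same second.
    """
    if max_jitter_s < 0:
        raise ValueError("max_jitter_s must be non-negative")
    source = rng if rng is not None else random
    return base_ts + source.randint(0, max_jitter_s)
```

With redis-py, a call such as client.expireat(key, jittered_expire_at(expire_time)) would apply it; the trade‑off is that keys may live up to max_jitter_s seconds longer than the nominal deadline.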
Memory limit (maxmemory) and eviction policies: Choose an appropriate policy (e.g., allkeys-lru, volatile-lru, or a random policy such as allkeys-random) based on workload. Once memory usage reaches maxmemory, eviction runs before command execution, so each affected command pays for the eviction work and latency increases.
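Fork overhead (RDB/AOF rewrite, replication): The fork itself runs in the main thread, and its cost grows with instance size; check latest_fork_usec in INFO to see how long the last fork took. Keep instance size < 10 GB, schedule persistence during low‑traffic periods, and consider disabling AOF rewrite or using no-appendfsync-on-rewrite yes to reduce disk I/O contention.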
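Transparent huge pages: Disable transparent huge pages (echo never > /sys/kernel/mm/transparent_hugepage/enabled). With them enabled, each copy‑on‑write after a fork copies a 2 MB page instead of a 4 KB page, which increases memory allocation latency for writes during persistence.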
AOF write‑fsync policies: appendfsync always is safest but slow; appendfsync no is fastest but unsafe; appendfsync everysec balances safety and performance but can still block under heavy disk I/O. Use no-appendfsync-on-rewrite yes to mitigate blocking during AOF rewrite, accepting that writes made during the rewrite may be lost if the server crashes.
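CPU binding: Binding Redis to a single logical core forces the main thread, background threads, and forked child processes to compete for that core. If binding is required, bind Redis and its background threads to multiple logical cores on the same physical CPU (Redis ≥6.0, e.g., server_cpulist 0-7:2, bio_cpulist 1,3) to avoid CPU contention with forked processes.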
Swap usage: Monitor /proc/<pid>/smaps for non‑zero Swap values. Avoid swapping by adding RAM, freeing memory, or restarting the instance after a controlled failover.
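The smaps check can be automated by summing the Swap fields. This is a minimal sketch; total_swap_kb is a hypothetical helper that takes the file contents as text so it can be tried without a running Redis process.

```python
def total_swap_kb(smaps_text: str) -> int:
    """Sum the `Swap:` values (in kB) found in /proc/<pid>/smaps content.

    Only lines starting with "Swap:" are counted, so related fields
    such as "SwapPss:" are ignored.
    """
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Swap:"):
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                total += int(fields[1])
    return total
```

On a Linux host, passing in the result of open(f"/proc/{pid}/smaps").read() for the Redis process ID gives the total swapped‑out memory; any value well above zero is worth investigating.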
Memory fragmentation: Check mem_fragmentation_ratio via INFO. For Redis ≥4.0, enable automatic defragmentation (activedefrag yes) with tuned thresholds, but test its impact on latency.
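The ratio reported by INFO is used_memory_rss divided by used_memory, so it can be recomputed from the raw INFO text. The sketch below uses hypothetical helper names, and the 1.5 alert threshold in the usage note is an assumption to tune per workload, not a figure from this article.

```python
from typing import Dict


def parse_info(info_text: str) -> Dict[str, str]:
    """Parse the key:value lines of INFO output, skipping section headers."""
    fields = {}
    for raw in info_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def fragmentation_ratio(info_text: str) -> float:
    """Return used_memory_rss / used_memory from INFO memory output."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    rss = int(fields["used_memory_rss"])
    if used <= 0:
        raise ValueError("used_memory must be positive")
    return rss / used
```

A monitoring job could, for example, flag an instance when fragmentation_ratio(text) stays above 1.5 for several samples. A ratio below 1 usually points at swapping rather than fragmentation, which ties back to the swap check above.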
Network bandwidth saturation: Monitor network I/O; if a single instance consumes the full bandwidth, consider scaling out or migrating traffic.
Additional best practices include using persistent long‑lived connections for monitoring, limiting monitoring frequency, avoiding short‑lived connections, and dedicating servers to Redis to prevent resource contention from other processes.
By following this step‑by‑step methodology (establishing a baseline, then examining the slowlog, big keys, expiration patterns, memory limits, fork overhead, AOF configuration, CPU binding, swap, fragmentation, and network usage), operators can quickly pinpoint the root cause of Redis latency and apply targeted optimizations.
High Availability Architecture
Official account for High Availability Architecture.