
Comprehensive Guide to Diagnosing and Optimizing Redis Performance Issues

Redis latency spikes can stem from complex commands, big keys, concentrated expirations, memory limits, fork overhead, AOF persistence, CPU binding, swap usage, memory fragmentation, and network saturation. This article lays out a systematic troubleshooting methodology and practical optimizations for each scenario.

High Availability Architecture

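Redis is a high‑performance in‑memory database, but its latency can still rise unexpectedly. Before troubleshooting, verify that Redis is actually slow by establishing a baseline. Run redis-cli --intrinsic-latency <seconds> on the Redis host itself to measure the latency floor imposed by the machine, run redis-cli --latency-history against the instance to sample round‑trip latency over time, and compare the results with a known‑healthy instance on comparable hardware.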
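To complement the redis-cli checks above, a rough client-side baseline can be collected with a small timing loop. The following is a minimal sketch, not a replacement for redis-cli: measure_latency and its ping parameter are hypothetical names, and ping is assumed to be any zero-argument callable that performs one round trip to the server (for example, the ping method of a connected client).

```python
import time
from statistics import mean


def measure_latency(ping, samples=100):
    """Time `samples` calls to `ping` and return min/avg/max in milliseconds.

    `ping` is assumed to be a zero-argument callable that performs one
    round trip to the server.
    """
    if samples < 1:
        raise ValueError("samples must be >= 1")
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        ping()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(timings_ms),
        "avg_ms": mean(timings_ms),
        "max_ms": max(timings_ms),
    }
```

Running the same loop against a known-healthy instance gives the comparison point; a large gap between the two averages suggests the problem is on the Redis side rather than in the client or network.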

Common causes and solutions:

High‑complexity commands (O(N) or worse): Use SLOWLOG GET to identify slow commands. Avoid heavy aggregation commands (e.g., SORT, SUNION, ZUNIONSTORE) on large collections, or keep N small (prefer N ≤ 300), and move aggregation to the client side where possible.
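Once slow log entries have been fetched, grouping them by command makes the worst offenders easier to spot. The sketch below assumes each entry is a dict with a command (str or bytes) and a duration in microseconds; that shape and the name summarize_slowlog are assumptions for illustration, so adapt them to whatever your client actually returns.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow log entries by command name.

    Each entry is assumed to be a dict with a `command` (str or bytes,
    e.g. b"SORT mylist") and a `duration` in microseconds.
    Returns a list of (command_name, count, total_us), slowest first.
    """
    totals = defaultdict(lambda: [0, 0])  # name -> [count, total_us]
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "(empty)"
        totals[name][0] += 1
        totals[name][1] += int(entry["duration"])
    rows = [(name, count, total) for name, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

A command that dominates the total duration is the first candidate for a smaller N or for moving the aggregation to the client.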

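Big keys: Scan for large keys with redis-cli --bigkeys, throttling the scan with -i <interval> on a busy instance because the scan itself adds load. Reduce key size or split large data structures. When deleting, use UNLINK instead of DEL (Redis ≥4.0) so that memory is freed in a background thread, and on Redis ≥6.0 set lazyfree-lazy-user-del yes so that DEL behaves like UNLINK. Setting lazyfree-lazy-eviction yes (available since Redis 4.0) extends the same asynchronous freeing to keys removed by eviction.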
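As a programmatic alternative to redis-cli --bigkeys, the keyspace can be walked with SCAN and each key sized with MEMORY USAGE (Redis ≥4.0). The sketch below assumes a client object that exposes scan_iter(count=...) and memory_usage(key), as redis-py clients do; find_big_keys and its 1 MB default threshold are illustrative choices, not values from the article.

```python
def find_big_keys(client, threshold_bytes=1_000_000, scan_count=100):
    """Return (key, size_bytes) pairs whose reported size exceeds the threshold.

    `client` is assumed to expose `scan_iter(count=...)` and
    `memory_usage(key)`. SCAN walks the keyspace incrementally, so this
    avoids the single long blocking call that KEYS would make.
    """
    big = []
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)
        # The size is None when the key disappeared between SCAN and here.
        if size is not None and size > threshold_bytes:
            big.append((key, size))
    return sorted(big, key=lambda pair: pair[1], reverse=True)
```

Keys reported here are candidates for splitting, or for removal with UNLINK rather than DEL. The loop still issues one command per key, so it is best run during low traffic.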

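Concentrated expirations: When many keys expire at the same moment, the active expiration cycle runs on the main thread and can delay client requests. Randomize expiration times (e.g., redis.expireat(key, expire_time + random(300))), or on Redis ≥4.0 set lazyfree-lazy-expire yes so that the memory of expired keys is freed in a background thread.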
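The randomized expiration from the example above can be written as a small helper. This is a sketch under the assumption that expiration times are Unix timestamps in seconds; jittered_expire_at is a hypothetical name, and the 300-second window mirrors the value used in the text.

```python
import random
import time


def jittered_expire_at(base_timestamp, max_jitter_seconds=300, rng=random):
    """Return a Unix timestamp shifted by a random 0..max_jitter_seconds offset."""
    if max_jitter_seconds < 0:
        raise ValueError("max_jitter_seconds must be >= 0")
    return int(base_timestamp) + rng.randint(0, max_jitter_seconds)


# Keys that would all expire in exactly one hour are now spread over 5 minutes.
base = int(time.time()) + 3600
expire_times = [jittered_expire_at(base) for _ in range(5)]
```

Each resulting timestamp can then be passed to the client's EXPIREAT call for the corresponding key.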

Memory limit (maxmemory) and eviction policies: Once used memory reaches maxmemory, Redis evicts keys before executing a command, so eviction adds latency to the commands that trigger it. Choose a policy that fits the workload (e.g., allkeys-lru, volatile-lru, or a random policy such as allkeys-random).
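Whether an instance is actually evicting can be read from INFO. The sketch below assumes info is a dict of already parsed INFO fields (used_memory, maxmemory, evicted_keys, maxmemory_policy); memory_pressure is a hypothetical helper name.

```python
def memory_pressure(info):
    """Summarize how close an instance is to its maxmemory limit.

    `info` is assumed to be a dict of INFO fields with numeric values
    already parsed (used_memory, maxmemory, evicted_keys).
    """
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", 0))
    return {
        "policy": info.get("maxmemory_policy", "unknown"),
        "evicted_keys": int(info.get("evicted_keys", 0)),
        # maxmemory 0 means "no limit", so a usage ratio is undefined.
        "usage_ratio": (used / limit) if limit > 0 else None,
    }
```

A usage ratio near 1 together with a growing evicted_keys counter suggests that eviction is contributing to latency.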

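Fork overhead (RDB snapshots, AOF rewrite, full replication sync): The fork call runs on the main thread, and copying the page table takes longer the more memory the instance uses. Keep instance size under 10 GB, schedule persistence during low‑traffic periods, and, if the data can tolerate it, consider disabling AOF rewrite. Setting no-appendfsync-on-rewrite yes reduces disk I/O contention while a rewrite is running.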
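One way to check fork cost is the latest_fork_usec field of INFO, which reports how long the most recent fork took in microseconds. That field is not mentioned in the text above and is brought in here as a diagnostic aid. In the sketch below, fork_report is a hypothetical helper, the 1000 ms warning threshold is an arbitrary illustrative value, and the 10 GB limit is the guideline from the text.

```python
def fork_report(info, warn_ms=1000.0, max_instance_gb=10):
    """Report the last fork duration and whether the instance looks oversized.

    `info` is assumed to be a dict of parsed INFO fields containing
    latest_fork_usec (microseconds) and used_memory (bytes).
    """
    fork_ms = int(info["latest_fork_usec"]) / 1000.0
    used_gb = int(info["used_memory"]) / (1024 ** 3)
    return {
        "latest_fork_ms": fork_ms,
        "used_memory_gb": round(used_gb, 2),
        "fork_slow": fork_ms > warn_ms,        # illustrative threshold
        "oversized": used_gb > max_instance_gb,
    }
```

Because the main thread is blocked for the duration of the fork, a long latest_fork_ms translates directly into a latency spike for clients.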

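Transparent huge pages: Disable transparent huge pages (echo never > /sys/kernel/mm/transparent_hugepage/enabled). With them enabled, each copy‑on‑write during fork‑based persistence copies a huge page (typically 2 MB) instead of a 4 KB page, which increases the latency of write commands.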
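The current setting can be verified by reading the same sysfs file, in which the kernel marks the active mode with square brackets (for example, always madvise [never]). A minimal sketch follows; parse_thp_mode and thp_mode are hypothetical helper names.

```python
from pathlib import Path

THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Extract the active mode from text such as 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def thp_mode(path=THP_PATH):
    """Return the active THP mode, or None if the file cannot be read."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None
```

Any result other than never on a Redis host is worth correcting. Note that the echo command does not persist across reboots by itself.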

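AOF write‑fsync policies: appendfsync always is safest but slow; appendfsync no is fastest but leaves flushing to the operating system, so more data can be lost on a crash; appendfsync everysec balances safety and performance, but the main thread can still block when the disk is under heavy I/O load. Use no-appendfsync-on-rewrite yes to mitigate blocking during AOF rewrite, accepting that writes made during the rewrite may be lost if the instance crashes.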

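CPU binding: If binding is required, bind Redis to multiple logical cores on the same physical CPU rather than to a single logical core, so that the main thread does not compete for CPU with background threads and forked child processes. Redis ≥6.0 supports separate affinity lists for this purpose (e.g., server_cpulist 0-7:2, bio_cpulist 1,3).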
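The cpulist values in the example use a compact syntax in which 0-7:2 means every second CPU from 0 through 7. The sketch below expands such a list so that two settings can be checked for overlap; expand_cpulist is a hypothetical helper, and the grammar it accepts (comma-separated single CPUs, ranges, and ranges with a stride) is an assumption based on the examples above.

```python
def expand_cpulist(spec):
    """Expand a cpulist such as '0-7:2' or '1,10-11' into a sorted list of CPUs.

    Assumed syntax: comma-separated items, each either a single CPU ('3'),
    a range ('0-7'), or a range with a stride ('0-7:2').
    """
    cpus = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        cpu_range, _, stride = item.partition(":")
        step = int(stride) if stride else 1
        start, _, end = cpu_range.partition("-")
        first = int(start)
        last = int(end) if end else first
        cpus.update(range(first, last + 1, step))
    return sorted(cpus)


server_cpus = expand_cpulist("0-7:2")  # [0, 2, 4, 6]
bio_cpus = expand_cpulist("1,3")       # [1, 3]
overlap = set(server_cpus) & set(bio_cpus)
```

An empty overlap confirms that the server threads and the background I/O threads are pinned to different logical cores.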

Swap usage: Check /proc/<pid>/smaps for non‑zero Swap values. Resolve swapping by adding RAM or freeing memory on the host, and release memory that has already been swapped out by restarting the instance after a controlled failover.
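The smaps check can be automated by summing the Swap: lines of the file. A minimal sketch for Linux hosts follows; total_swap_kb and redis_swap_kb are hypothetical helper names, and the pid is assumed to be that of the redis-server process.

```python
def total_swap_kb(smaps_text):
    """Sum every 'Swap:' line (in kB) from the text of /proc/<pid>/smaps."""
    total = 0
    for line in smaps_text.splitlines():
        # Match 'Swap:' exactly; 'SwapPss:' lines are a different metric.
        if line.startswith("Swap:"):
            total += int(line.split()[1])
    return total


def redis_swap_kb(pid):
    """Read /proc/<pid>/smaps and return the total swapped memory in kB."""
    with open(f"/proc/{pid}/smaps") as handle:
        return total_swap_kb(handle.read())
```

A total of a few kB is usually harmless, while hundreds of MB swapped out means reads of that data go to disk and latency rises accordingly.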

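Memory fragmentation: Check mem_fragmentation_ratio via INFO; a value above roughly 1.5 is commonly treated as high. On Redis ≥4.0, enable automatic defragmentation (activedefrag yes) with tuned thresholds, but test its impact on latency first, because defragmentation work is done on the main thread.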
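The ratio reported by INFO is the resident set size divided by the memory Redis has allocated, so it can also be recomputed from the raw fields. In the sketch below, fragmentation_status is a hypothetical helper and the 1.5 cut-off is an illustrative threshold, not a hard rule.

```python
def fragmentation_status(info, high_ratio=1.5):
    """Return (ratio, status) computed from used_memory_rss / used_memory.

    `info` is assumed to be a dict of parsed INFO fields. The `high_ratio`
    default is an illustrative threshold.
    """
    used = int(info["used_memory"])
    rss = int(info["used_memory_rss"])
    if used <= 0:
        raise ValueError("used_memory must be positive")
    ratio = rss / used
    if ratio > high_ratio:
        status = "fragmented"
    elif ratio < 1.0:
        # RSS below allocated memory: part of the data may be swapped out.
        status = "rss-below-used"
    else:
        status = "ok"
    return ratio, status
```

On small instances the ratio can look high simply because fixed overhead dominates, so the absolute difference between the two fields is worth checking as well.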

Network bandwidth saturation: Monitor network I/O; if a single instance consumes the full bandwidth, consider scaling out or migrating traffic.
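Per-instance traffic can be estimated from the cumulative counters total_net_input_bytes and total_net_output_bytes in INFO by sampling them twice. The sketch below assumes two parsed INFO dicts taken interval_seconds apart; redis_bandwidth_mbps is a hypothetical helper name.

```python
def redis_bandwidth_mbps(before, after, interval_seconds):
    """Compute average inbound/outbound throughput in megabits per second.

    `before` and `after` are assumed to be INFO dicts sampled
    `interval_seconds` apart, containing the cumulative counters
    total_net_input_bytes and total_net_output_bytes.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    def rate(field):
        delta_bytes = int(after[field]) - int(before[field])
        return delta_bytes * 8 / 1_000_000 / interval_seconds

    return {
        "in_mbps": rate("total_net_input_bytes"),
        "out_mbps": rate("total_net_output_bytes"),
    }
```

Comparing these figures with the capacity of the network interface shows how much of the link a single instance is consuming.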

Additional best practices: have monitoring tools use long‑lived connections and a modest polling frequency, avoid short‑lived connections in applications, and dedicate servers to Redis so that other processes do not compete with it for resources.

By following this step‑by‑step methodology (establishing a baseline, then examining the slow log, big keys, expiration patterns, memory limits, fork overhead, AOF configuration, CPU binding, swap, fragmentation, and network usage), operators can quickly pinpoint the root cause of Redis latency and apply targeted optimizations.

Written by

High Availability Architecture
