
Comprehensive Guide to Diagnosing and Optimizing Redis Performance Issues

This article provides a step‑by‑step methodology for identifying why a Redis instance becomes slow, covering baseline latency testing, slow‑log analysis, big‑key detection, expiration patterns, memory limits, fork overhead, huge‑page effects, AOF configuration, CPU binding, swap usage, memory fragmentation, network saturation, and practical remediation techniques.

Refining Core Development Skills

Redis is a high‑performance in‑memory database, but its latency can unexpectedly increase, causing service degradation. This guide walks you through a systematic troubleshooting process to determine whether Redis is truly slow and how to resolve common causes.

1. Verify if Redis is actually slow

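First, instrument your service with tracing to measure how long the Redis calls themselves take, so you can rule out slowness elsewhere in the request path. If the Redis calls really are slow, the cause is either the network between the application and Redis or Redis itself. Check network quality if the link is suspect, but most of the time the problem lies within Redis.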
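If no tracing system is in place yet, a small timing wrapper can give a first approximation. The sketch below is illustrative only: the `timed` helper and the `samples` dictionary are hypothetical names, not part of any Redis client library.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, samples):
    """Record the wall-clock duration (in ms) of the wrapped block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        samples.setdefault(label, []).append(elapsed_ms)
```

Wrapping each Redis call, for example `with timed("redis.get", samples): client.get("key")` where `client` is whatever Redis client you already use, lets you compare the Redis share of a request against its total latency.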

2. Establish baseline performance

Run an intrinsic latency test directly on the Redis server (not from a client machine) to capture the maximum and average latency the host itself introduces over 60 seconds:

$ redis-cli -h 127.0.0.1 -p 6379 --intrinsic-latency 60

Additionally, use the --latency-history option to observe the minimum, maximum, and average round-trip latency for each 1-second window:

$ redis-cli -h 127.0.0.1 -p 6379 --latency-history -i 1

Compare the latency you observe in production against this baseline; if it exceeds the baseline by roughly two-fold, the instance can be considered slow.
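The comparison against the baseline can be sketched as a small check. The function name `is_slow` and the default factor of 2.0 (taken from the "roughly two-fold" rule of thumb above) are assumptions for illustration, not a Redis-defined threshold.

```python
def is_slow(observed_ms: float, baseline_ms: float, factor: float = 2.0) -> bool:
    """Return True when observed latency exceeds the baseline by more than `factor`."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return observed_ms > baseline_ms * factor
```

With a baseline of 0.5 ms, an observed latency of 1.2 ms would be flagged, while 0.9 ms would not.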

3. Analyze slow‑log entries

Configure the slow‑log threshold (e.g., 5 ms) and length, then retrieve recent entries:

# Record commands slower than 5 ms
CONFIG SET slowlog-log-slower-than 5000
# Keep the latest 500 entries
CONFIG SET slowlog-max-len 500

127.0.0.1:6379> SLOWLOG get 5

Look for two patterns: commands whose complexity is above O(N), such as the aggregation commands SORT, SUNION, and ZUNIONSTORE, and O(N) commands called with a very large N. Both consume excessive CPU on the main thread and block the requests queued behind them.
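To see which commands dominate the slow log, the entries can be grouped by command name. This is a minimal sketch that assumes each entry is a dict with a `command` field (str or bytes) and a `duration` field in microseconds, which is the shape I would expect from a client such as redis-py's `slowlog_get()`; adapt the field names if your client differs.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow-log entries by command name.

    Returns (name, count, total_duration_us) tuples, slowest total first.
    """
    totals = defaultdict(lambda: [0, 0])  # name -> [count, total microseconds]
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "(empty)"
        totals[name][0] += 1
        totals[name][1] += entry["duration"]
    rows = [(name, count, total) for name, (count, total) in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

A command that appears at the top with a high count points at an access pattern to redesign, whereas a single very slow entry more often points at one oversized key.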

4. Detect and handle big‑keys

Big-keys (keys holding very large values) make memory allocation and release slow. Scan for them using the built-in big-key scanner; the -i option throttles the scan so it does not add too much load:

$ redis-cli -h 127.0.0.1 -p 6379 --bigkeys -i 0.01

When big-keys are found, avoid storing oversized values where possible. To delete them, use UNLINK instead of DEL (Redis 4.0+), or set lazyfree-lazy-user-del yes (Redis 6.0+) so that DEL also releases memory in a background thread.
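The scanner reports element counts for collection types, so a byte-based check can be a useful complement. The sketch below assumes a client object exposing `scan_iter(count=...)` and `memory_usage(key)` (redis-py offers methods with these names; MEMORY USAGE needs Redis 4.0+). The function name and the 1 MB threshold are illustrative choices, not recommendations from the original article.

```python
def find_big_keys(client, threshold_bytes=1024 * 1024, scan_count=100):
    """Return (key, size_in_bytes) pairs at or above the threshold, largest first.

    `client` is assumed to provide scan_iter() and memory_usage().
    """
    big = []
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)
        # memory_usage may return None if the key expired mid-scan
        if size is not None and size >= threshold_bytes:
            big.append((key, size))
    big.sort(key=lambda item: item[1], reverse=True)
    return big
```

As with --bigkeys, run this against a replica or during a quiet period, since it issues one extra command per key.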

5. Examine concentrated expirations

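If many keys expire at the same timestamp (e.g., via EXPIREAT), Redis's active expiration task, which runs in the main thread, has to delete them all at once and can cause latency spikes, especially when a big-key is among them. Mitigate this by adding a random offset to expiration times, or by enabling lazy-free expiration (lazyfree-lazy-expire yes, Redis 4.0+) so that the memory of expired keys is released in a background thread.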
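Adding a random offset can be as simple as the following sketch. The helper name `ttl_with_jitter` and the example values are hypothetical; choose a jitter window that your cache-consistency requirements can tolerate.

```python
import random


def ttl_with_jitter(base_ttl_s: int, max_jitter_s: int, rng=random) -> int:
    """Return base TTL plus a random offset in [0, max_jitter_s] seconds."""
    if base_ttl_s <= 0:
        raise ValueError("base_ttl_s must be positive")
    if max_jitter_s < 0:
        raise ValueError("max_jitter_s must not be negative")
    return base_ttl_s + rng.randint(0, max_jitter_s)
```

For example, `ttl_with_jitter(3600, 300)` spreads keys that would all expire after one hour across a five-minute window; the result can be passed to EXPIRE or to the expiry argument of SET.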

6. Check memory‑limit eviction

When maxmemory is reached, Redis evicts keys according to the configured policy (allkeys‑lru, volatile‑lru, allkeys‑random, etc.). Eviction itself consumes CPU and can increase latency, particularly for big‑keys. Choose an eviction policy that matches your workload and consider splitting data across multiple instances.

7. Evaluate fork overhead

RDB snapshots, AOF rewrites, and initial full synchronizations with a replica all create a child process via fork(). The fork has to copy the parent's page tables, so on a large instance it can block the main thread for up to seconds. Check latest_fork_usec in the INFO output (the duration of the most recent fork, in microseconds) to detect long forks, keep instance size below 10 GB, and schedule heavy persistence tasks during low-traffic windows.
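A monitoring job can extract latest_fork_usec from the INFO text and flag long forks. This is a sketch under stated assumptions: the function name and the default 1-second threshold are illustrative, and the input is the raw text returned by INFO.

```python
def check_fork_time(info_text: str, threshold_usec: int = 1_000_000):
    """Return (latest_fork_usec, is_long) parsed from raw INFO output.

    Returns (None, False) when the field is absent.
    """
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("latest_fork_usec:"):
            usec = int(line.split(":", 1)[1])
            return usec, usec >= threshold_usec
    return None, False
```

Note that the field only reflects the most recent fork, so it is worth sampling it after each background save or rewrite.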

8. Disable transparent huge pages

With transparent huge pages enabled, each copy-on-write after a fork copies a 2 MB page instead of a 4 KB one, which inflates write latency while a child process is running. Verify the setting with:

$ cat /sys/kernel/mm/transparent_hugepage/enabled

If it shows [always], disable it (as root):

$ echo never > /sys/kernel/mm/transparent_hugepage/enabled

9. Tune AOF persistence

Three appendfsync policies exist:

always – safest but highest latency

no – best performance, risk of data loss

everysec – balanced, but still blocks when the background thread’s fsync stalls.

During AOF rewrite, enable no-appendfsync-on-rewrite yes to temporarily suspend fsync, accepting higher data‑loss risk for the rewrite duration.

10. Bind CPU cores (Redis 6.0+)

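Redis can pin its main and I/O threads, its background (bio) threads, the AOF rewrite child process, and the RDB bgsave child process to specific CPUs using server_cpulist, bio_cpulist, aof_rewrite_cpulist, and bgsave_cpulist respectively. Bind the server to multiple logical cores on the same physical core rather than to a single logical core, which reduces cache misses without making its threads compete for one CPU, but only do this if you understand the trade-offs.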
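The cpulist values accept single CPUs, ranges, and ranges with a stride (for instance `0-7:2` or `1,10-11`, as in the comments of the stock redis.conf). Before applying a binding, it can help to expand the value and confirm which CPUs it selects. The parser below is an illustrative sketch, not Redis's own implementation.

```python
def expand_cpulist(spec: str):
    """Expand a cpulist such as '0-7:2' or '1,10-11' into a sorted list of CPU ids."""
    cpus = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        cpu_range, _, step = item.partition(":")
        start, has_end, end = cpu_range.partition("-")
        low = int(start)
        high = int(end) if has_end else low
        stride = int(step) if step else 1
        if high < low or stride < 1:
            raise ValueError(f"invalid cpulist item: {item!r}")
        cpus.update(range(low, high + 1, stride))
    return sorted(cpus)
```

Cross-check the expanded ids against the host topology (for example with `lscpu`) to confirm that the chosen logical cores really share a physical core.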

11. Avoid swap usage

Swap dramatically slows Redis. Check swap consumption via:

# Find Redis PID
ps -aux | grep redis-server
# Inspect smaps for swap
cat /proc/<pid>/smaps | egrep '^(Swap|Size)'

If significant memory is swapped, add RAM or free memory, then restart the instance (preferably after a master‑slave switchover).
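Because smaps lists swap per memory mapping, the per-mapping values need to be added up to get a total. A minimal sketch, assuming the usual `Swap:   <n> kB` line format of /proc/<pid>/smaps (the function name is hypothetical):

```python
def total_swap_kb(smaps_text: str) -> int:
    """Sum the 'Swap:' entries (in kB) across all mappings in smaps output."""
    total = 0
    for line in smaps_text.splitlines():
        # 'SwapPss:' lines are deliberately excluded by matching the exact prefix
        if line.startswith("Swap:"):
            total += int(line.split()[1])
    return total
```

A total of a few hundred kB is usually harmless; totals in the hundreds of MB or more are the case that warrants the remediation described above.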

12. Manage memory fragmentation

Fragmentation ratio = used_memory_rss / used_memory. When > 1.5, enable automatic defragmentation (Redis 4.0+):

activedefrag yes
active-defrag-ignore-bytes 100mb
active-defrag-threshold-lower 10
active-defrag-threshold-upper 100
active-defrag-cycle-min 1
active-defrag-cycle-max 25
active-defrag-max-scan-fields 1000

Test its impact, as defragmentation runs in the main thread.
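The ratio can be computed from the raw INFO memory section as a quick check before deciding whether to enable defragmentation. This sketch assumes the INFO text contains the `used_memory` and `used_memory_rss` fields; the function name and sample values are illustrative.

```python
def fragmentation_ratio(info_text: str) -> float:
    """Compute used_memory_rss / used_memory from raw INFO output."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("used_memory", "used_memory_rss"):
            fields[key] = int(value)
    if "used_memory" not in fields or "used_memory_rss" not in fields:
        raise KeyError("INFO output lacks used_memory or used_memory_rss")
    if fields["used_memory"] == 0:
        raise ValueError("used_memory is zero")
    return fields["used_memory_rss"] / fields["used_memory"]
```

On small or nearly empty instances the ratio can look high simply because the fixed overhead dominates, so interpret it together with the absolute memory figures.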

13. Monitor network bandwidth

Excessive traffic can saturate the NIC, causing packet loss and latency. Track inbound/outbound bytes and set alerts when utilization approaches the interface limit.
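Utilization can be estimated from two readings of a byte counter taken a known interval apart, for example the NIC counters or the `total_net_input_bytes` and `total_net_output_bytes` fields in INFO. The function below is a sketch; its name and the alert threshold you pair it with are assumptions, not values from the original article.

```python
def link_utilization(bytes_before: int, bytes_after: int,
                     interval_s: float, link_bits_per_s: float) -> float:
    """Return the fraction of link capacity used between two counter samples."""
    if interval_s <= 0 or link_bits_per_s <= 0:
        raise ValueError("interval_s and link_bits_per_s must be positive")
    if bytes_after < bytes_before:
        raise ValueError("counter went backwards (restart or wrap-around)")
    bits_per_s = (bytes_after - bytes_before) * 8 / interval_s
    return bits_per_s / link_bits_per_s
```

For instance, 62,500,000 bytes transferred in one second on a 1 Gbit/s interface corresponds to 50% utilization. Measure inbound and outbound traffic separately, since a full-duplex link has independent capacity in each direction.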

14. Other considerations

Prefer persistent TCP connections over short‑lived ones to avoid handshake overhead.

Ensure monitoring tools use long connections and reasonable polling intervals to avoid adding load.

Dedicate servers to Redis only; other processes competing for CPU, memory, or disk will degrade performance.

Conclusion

Redis performance issues span CPU, memory, disk, network, and OS layers. Understanding command complexities, expiration/eviction policies, persistence mechanisms, and system‑level behaviors (fork, huge pages, swap) enables you to pinpoint bottlenecks and apply targeted optimizations, ensuring low latency and high throughput.

Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
