
Redis Performance Degradation: Common Latency Issues, Diagnosis, and Optimization

This article explains why Redis can become slow, covering typical latency causes such as high‑complexity commands, large keys, concentrated expirations, memory limits, fork overhead, CPU binding, AOF settings, swap usage, and network saturation, and provides practical troubleshooting steps and best‑practice recommendations.

Top Architect

Redis Performance Degradation: Common Latency Issues and Analysis

As an in-memory database, Redis can handle up to 100k QPS per instance, yet latency spikes are common when it is misused or operated carelessly. This guide walks through the typical sources of delay and how to locate each one.

High‑Complexity Commands

When latency suddenly increases, first check the slowlog. Set the slowlog threshold (e.g., 5 ms) and retain the latest 1000 entries:

# Record commands slower than 5 ms (the value is in microseconds)
CONFIG SET slowlog-log-slower-than 5000

# Keep only the latest 1000 entries
CONFIG SET slowlog-max-len 1000

After configuring it, run SLOWLOG GET 5 to view the five most recent slow commands. Because Redis executes commands on a single thread, one command of O(N) or higher complexity (e.g., SORT, SUNION, ZUNIONSTORE) run against a large data set delays every request queued behind it.
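To see which commands dominate the slowlog, the entries can be grouped by command name. The following is a minimal sketch, not part of the original article. It assumes each entry is a dict with a `command` field (str or bytes) and a `duration` field in microseconds, which is modeled on what the redis-py client's `slowlog_get()` returns. The function name `summarize_slowlog` and the sample data are hypothetical.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow-log entries by command name.

    Returns (name, count, total_usec) tuples, largest total duration first.
    """
    totals = defaultdict(lambda: [0, 0])  # name -> [count, total_usec]
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "(EMPTY)"
        totals[name][0] += 1
        totals[name][1] += entry["duration"]
    return sorted(
        ((name, count, usec) for name, (count, usec) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


# Hypothetical sample data, shaped like the assumed slowlog entries.
sample = [
    {"id": 3, "duration": 12000, "command": b"SORT big:list"},
    {"id": 2, "duration": 7000, "command": b"SUNION set:a set:b"},
    {"id": 1, "duration": 9000, "command": b"sort other:list"},
]
for name, count, usec in summarize_slowlog(sample):
    print(f"{name:<8} calls={count} total={usec / 1000:.1f} ms")
```

Against a live instance, the list would come from the client (for example `client.slowlog_get(128)` in redis-py) instead of the sample.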

Large Keys

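If the slowlog shows many SET or DEL entries, investigate whether large keys are being written. Both commands are normally fast, so seeing them there suggests large values: writing one means allocating more memory and deleting one means releasing it, and both take longer as the value grows.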

Scan for big keys:

redis-cli -h $host -p $port --bigkeys -i 0.01

The command iterates over all keys with SCAN and reports size statistics per type. Use the -i interval to limit the QPS impact on the instance.
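When a byte threshold is more useful than the per-type summary, the same SCAN-based walk can be done from a client. The sketch below is an illustration under assumptions, not the article's method. It assumes a client object exposing `scan_iter(count=...)` and `memory_usage(key)`, as redis-py clients do (MEMORY USAGE requires Redis 4.0 or later). The function name and the 10 KiB default threshold are hypothetical.

```python
def find_big_keys(client, threshold_bytes=10 * 1024, scan_count=100):
    """Return (key, size_in_bytes) pairs at or above the threshold, largest first.

    Issues one MEMORY USAGE call per key, so run it against a replica or
    during quiet hours on a large instance.
    """
    big = []
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)  # None if the key vanished meanwhile
        if size is not None and size >= threshold_bytes:
            big.append((key, size))
    return sorted(big, key=lambda pair: pair[1], reverse=True)
```

MEMORY USAGE samples nested values by default, so the sizes reported for large collections are estimates.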

Concentrated Expirations

Mass expirations at fixed times can increase latency because Redis performs active expiration in the main thread. Search the code for EXPIREAT or PEXPIREAT calls that give many keys the same expiration time, and randomize those times:

import random
# Randomize expiration within 5 minutes after the intended time
# (redis is assumed to be a connected client object, not the module)
redis.expireat(key, expire_time + random.randint(0, 300))

expired_keys in INFO is a cumulative counter, so monitor how fast it grows and alert when the increase over a short interval spikes.
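Because the counter only ever grows, an alert has to compare two samples. A minimal sketch of that comparison follows. It assumes two INFO snapshots taken `interval_seconds` apart and available as dicts containing `expired_keys`. The function names and the 1000 keys/second threshold are hypothetical and should be tuned per workload.

```python
def counter_rate(prev_value, curr_value, interval_seconds):
    """Per-second increase of a cumulative counter between two samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:  # counter was reset, e.g. by a restart or CONFIG RESETSTAT
        delta = curr_value
    return delta / interval_seconds


def expiry_spike(prev_info, curr_info, interval_seconds, max_per_second=1000):
    """Return (is_spike, rate) for expired_keys between two INFO samples."""
    rate = counter_rate(
        prev_info["expired_keys"], curr_info["expired_keys"], interval_seconds
    )
    return rate > max_per_second, rate
```

The same `counter_rate` helper would apply to other cumulative INFO counters, such as evicted_keys.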

Memory Limit Reached

When used memory reaches maxmemory, Redis must evict keys according to the configured policy (e.g., allkeys-lru, volatile-lru, allkeys-random) before it can accept new writes, which adds latency to those writes. Evicting large keys costs even more; consider splitting the data across multiple instances so that each one has less to evict.
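Watching how close an instance is to its limit gives warning before eviction starts. The sketch below assumes the INFO output is available as a dict with integer `used_memory` and `maxmemory` fields (redis-py's `info()` is expected to return this shape). The function name is hypothetical.

```python
def memory_headroom(info):
    """Return the fraction of maxmemory still free, or None if no limit is set."""
    limit = info.get("maxmemory", 0)
    if not limit:
        return None  # maxmemory 0 means Redis enforces no limit
    return max(0.0, 1.0 - info["used_memory"] / limit)
```

A headroom near zero combined with a growing evicted_keys counter indicates that writes are already paying the eviction cost.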

Fork Overhead

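Background RDB snapshots and AOF rewrites fork a child process. The fork itself runs in the main thread and must copy the parent's page tables, so the larger the instance's memory footprint, the longer the main thread is blocked. Check latest_fork_usec in INFO to see how long the most recent fork took, in microseconds. Schedule backups on replicas, and leave AOF disabled if the workload can tolerate the data loss.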
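To compare fork cost across instances of different sizes, the duration can be normalized by memory. This is a rough sketch only: it assumes INFO is available as a dict and uses `used_memory` as a stand-in for the process footprint, which is an approximation. The function name is hypothetical.

```python
def fork_cost(info):
    """Return (fork_ms, ms_per_gib) from latest_fork_usec and used_memory."""
    fork_ms = info["latest_fork_usec"] / 1000
    gib = info["used_memory"] / 2**30
    per_gib = fork_ms / gib if gib > 0 else None
    return fork_ms, per_gib
```

A per-GiB figure that is much higher on one host than on its peers points at the environment (for example virtualization) rather than at data size.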

CPU Binding

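Binding the Redis process to a specific CPU can backfire during persistence: the forked child inherits the same affinity and competes with the main thread for that CPU, which worsens latency. Avoid CPU affinity on instances that use RDB snapshots or AOF rewriting.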

AOF Configuration

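AOF provides three fsync policies, set with appendfsync: always, everysec, and no. always syncs to disk on every write and incurs high disk I/O, so it should be avoided for high-throughput workloads; no leaves flushing to the operating system; everysec syncs once per second and offers a good balance between durability and performance.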

Swap Usage

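Once the system starts swapping Redis memory to disk, performance collapses, because accesses that should hit RAM now wait on disk I/O. Monitor memory and swap usage on the host. To recover, free enough memory for the swap to be released and restart the instance, preferably after a master-slave switchover so that the restart does not interrupt service.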
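On Linux, the per-process swap figure can be read from /proc/<pid>/smaps, where each memory mapping carries a "Swap:" line in kB. The sketch below sums those lines from the file's text. The function name and the sample text are hypothetical, and reading another process's smaps file usually requires matching user or root privileges.

```python
import re

# Matches lines such as "Swap:                128 kB" but not "SwapPss: ...".
_SWAP_LINE = re.compile(r"^Swap:\s+(\d+)\s+kB", re.MULTILINE)


def swapped_kb(smaps_text):
    """Sum every 'Swap:' entry (in kB) found in the text of /proc/<pid>/smaps."""
    return sum(int(kb) for kb in _SWAP_LINE.findall(smaps_text))


# Hypothetical excerpt covering two mappings.
sample_smaps = """\
Size:               1024 kB
Rss:                 512 kB
Swap:                128 kB
SwapPss:              64 kB
Size:               4096 kB
Swap:                  0 kB
"""
print(swapped_kb(sample_smaps), "kB swapped")
```

For a real instance, the text would come from something like `open(f"/proc/{pid}/smaps").read()`, with `pid` being the Redis server's process ID.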

Network Saturation

High network load leads to increased RTT and packet loss, directly affecting Redis latency. Identify the instance consuming excessive bandwidth and scale out or increase network capacity.
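When several instances share a host or link, ranking them by current throughput helps find the one to move or scale. The sketch below assumes each instance's INFO output is available as a dict containing `instantaneous_input_kbps` and `instantaneous_output_kbps`, which Redis reports in its stats section. The function name and instance labels are hypothetical.

```python
def rank_by_bandwidth(infos):
    """Sort instances by combined input and output throughput, busiest first.

    `infos` maps an instance label to its INFO dict. Missing fields count as 0.
    """
    def total_kbps(info):
        return (
            info.get("instantaneous_input_kbps", 0.0)
            + info.get("instantaneous_output_kbps", 0.0)
        )

    return sorted(
        ((name, total_kbps(info)) for name, info in infos.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

These fields are point-in-time readings, so sampling them several times before acting avoids chasing a momentary burst.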

Best Practices (Application and Operations)

Application side: use short keys, avoid large values, enable lazy-free for big deletions, set appropriate expirations and randomize them, avoid O(n) commands, batch reads and writes with MGET/MSET or pipelines, replace KEYS with SCAN, choose a suitable eviction policy, use connection pools, use only DB 0, and consider read-write splitting and clustering under high load.
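Batching cuts round trips, but one MGET over a very large key list is itself an O(N) command. One way to balance the two is to cap the batch size, as in the sketch below. It assumes a client whose `mget(keys)` accepts a list and returns values in the same order, as redis-py does. The function name and the default batch size of 100 are hypothetical.

```python
def mget_in_batches(client, keys, batch_size=100):
    """Fetch many keys using several bounded MGET calls.

    Returns a dict mapping each key to its value (None for missing keys).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    keys = list(keys)
    result = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        result.update(zip(batch, client.mget(batch)))
    return result
```

A smaller batch size keeps each command short on the server at the cost of more round trips; sending the batches through a pipeline is another option the article mentions.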

Operations side: give each business line its own instance, provision sufficient CPU, memory, bandwidth, and disk, deploy master-slave clusters with read-only slaves, use Sentinel for HA, keep instance memory at or below half of host memory, monitor expired_keys, evicted_keys, and latest_fork_usec, set a sensible slowlog threshold (≈10 ms), tune replication buffers, perform backups on slaves, avoid AOF or use everysec, and have monitoring tools use long-lived connections rather than reconnecting for every sample.

By addressing both usage patterns and operational settings, Redis can maintain its high‑performance characteristics.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
