Why Redis Gets Slow and How to Diagnose & Fix Common Latency Issues
This article explains why a Redis instance may experience sudden latency spikes, walks through diagnosing the problem with slow‑log and big‑key scans, and provides concrete configuration tweaks, command‑level best practices, and operational guidelines to keep Redis performing at peak speed.
Using Complex Commands
High‑complexity commands such as SORT, SUNION, or ZUNIONSTORE can cause noticeable latency, especially when operating on large data sets. If your slow‑log shows commands with execution times above the threshold, consider simplifying or breaking the workload into smaller pieces.
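One way to break a large read into smaller pieces is to page through a big sorted set with bounded ZRANGE calls instead of fetching everything at once. The sketch below assumes a client object shaped like redis-py's `Redis` (only `zrange(name, start, end)` is used); the helper name `iter_zset_pages` and the page size are illustrative, not part of the original article.

```python
def iter_zset_pages(client, key, page_size=500):
    """Yield the members of a sorted set in pages of at most page_size items.

    Each ZRANGE call touches only one window of the set, so no single command
    has to walk the whole collection. If the set is modified while paging,
    members can be skipped or repeated.
    """
    start = 0
    while True:
        # ZRANGE indexes are inclusive, hence the -1 on the end index.
        page = client.zrange(key, start, start + page_size - 1)
        if not page:
            return
        yield page
        start += page_size
```

The same windowing idea applies to other large collections through SSCAN, HSCAN, and ZSCAN.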
Enable and Use Slow Log
Set a slow‑log threshold (e.g., 5 ms) and limit the log length to keep recent entries:
CONFIG SET slowlog-log-slower-than 5000
CONFIG SET slowlog-max-len 1000
Query the latest entries with:
SLOWLOG GET 5
Each entry lists a unique ID, the Unix timestamp at which the command ran, its execution time in microseconds, and the command with its arguments, helping you pinpoint expensive operations.
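To see which command types dominate the slow log, the entries can be grouped by command name. This is a minimal sketch that assumes each entry is a dict with `duration` (microseconds) and `command` keys, which is the shape redis-py's `slowlog_get()` returns; the function name `summarize_slowlog` is hypothetical.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow-log entries by command name and total their durations.

    Returns a list of (command_name, stats) pairs, most expensive first.
    """
    totals = defaultdict(lambda: {"count": 0, "total_us": 0, "max_us": 0})
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):  # bytes unless decode_responses=True
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "(empty)"
        stats = totals[name]
        stats["count"] += 1
        stats["total_us"] += entry["duration"]
        stats["max_us"] = max(stats["max_us"], entry["duration"])
    # Sort by total time spent so the worst offender comes first.
    return sorted(totals.items(), key=lambda item: item[1]["total_us"], reverse=True)
```

With a live connection this could be called as `summarize_slowlog(client.slowlog_get(128))`, where `client` is a redis-py connection.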
Large Keys
If slow‑log shows simple SET or DEL commands, investigate large keys. Writing or deleting a huge value requires significant memory allocation and release time.
Detect large keys using the built‑in scanner:
redis-cli -h $host -p $port --bigkeys -i 0.01
The command runs SCAN internally and reports, for each data type, the largest key it found along with average sizes. Use -i to make the scanner sleep periodically (here for 0.01 s) so the scan itself does not cause a QPS spike.
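When sizes in bytes are needed rather than the per-type summary, a throttled SCAN combined with MEMORY USAGE (Redis 4.0+) is one alternative. This sketch is not how `--bigkeys` works internally; it assumes a client shaped like redis-py's `Redis` exposing `scan_iter()` and `memory_usage()`, and the name `largest_keys` and its defaults are illustrative.

```python
import heapq
import time


def largest_keys(client, top_n=10, scan_count=100, pause_s=0.01):
    """Return up to top_n (size_in_bytes, key) pairs, largest first."""
    heap = []  # min-heap holding the largest keys seen so far
    for seen, key in enumerate(client.scan_iter(count=scan_count), start=1):
        size = client.memory_usage(key)
        if size is not None:  # the key may have expired after SCAN returned it
            if len(heap) < top_n:
                heapq.heappush(heap, (size, key))
            else:
                heapq.heappushpop(heap, (size, key))
        if pause_s and seen % scan_count == 0:
            time.sleep(pause_s)  # throttle, similar in spirit to -i
    return sorted(heap, reverse=True)
```

Running this against a replica keeps the extra MEMORY USAGE calls off the master.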
Concentrated Expiration
Mass expiration at a fixed timestamp can overload the active expiration cycle, causing up to 25 ms latency spikes that are not recorded in the slow log.
Search your code for EXPIREAT or PEXPIREAT and randomize expiration times:
redis.expireat(key, expire_time + random(300))
Instance Memory Limits
When maxmemory is reached, Redis evicts keys according to the configured policy (e.g., allkeys‑lru, volatile‑lru, allkeys‑random). Eviction adds latency, especially for large keys.
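To notice an instance approaching its limit before eviction starts, the ratio of used memory to maxmemory can be checked. This is a small sketch assuming `info` is the dict redis-py returns from `info("memory")`, with the `used_memory` and `maxmemory` fields; the function name `memory_headroom` is hypothetical.

```python
def memory_headroom(info):
    """Return the fraction of maxmemory currently in use.

    Returns None when maxmemory is 0, which Redis treats as "no limit".
    """
    limit = info.get("maxmemory", 0)
    if not limit:
        return None
    return info["used_memory"] / limit
```

A value close to 1.0 means the next writes are likely to trigger eviction under the configured policy.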
Fork Overhead (RDB/AOF)
Generating RDB snapshots or rewriting AOF forks a child process that copies the parent’s page tables. For large memory footprints this can block the main thread for seconds, dramatically increasing latency.
Monitor the fork duration with:
INFO
# look for latest_fork_usec (µs)
Schedule backups on replicas during off‑peak hours and consider disabling AOF if data loss of a few seconds is acceptable.
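The fork check can be scripted so that the duration is also related to the instance size. The sketch below assumes `info` is the dict returned by redis-py's `info()` and contains `latest_fork_usec` and `used_memory`; the 100 ms warning threshold and the name `fork_report` are illustrative assumptions, not recommendations from the original article.

```python
def fork_report(info, warn_ms=100.0):
    """Summarize the last fork: duration in ms and ms per GiB of used memory."""
    fork_ms = info["latest_fork_usec"] / 1000.0
    used_gib = info["used_memory"] / (1024 ** 3)
    # Fork cost grows with the page tables, so normalize by memory size.
    ms_per_gib = fork_ms / used_gib if used_gib else None
    return {
        "fork_ms": fork_ms,
        "ms_per_gib": ms_per_gib,
        "slow": fork_ms > warn_ms,  # hypothetical alert threshold
    }
```

Comparing `ms_per_gib` across hosts helps separate "the dataset is large" from "forking is slow in this environment".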
CPU Binding
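Pinning the Redis process to a single CPU core hurts when persistence is enabled: the forked child inherits the parent's CPU affinity and competes with the main thread for that core, which raises latency. Avoid pinning Redis to one core when using RDB snapshots or AOF rewrites.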
Enabling AOF
AOF offers three fsync policies:
appendfsync always – safest but highest I/O cost.
appendfsync everysec – balances safety (at most 1 s of data loss) and performance.
appendfsync no – leaves flushing to the OS; lowest overhead but riskier.
For most workloads everysec is recommended.
Swap Usage
If the host starts swapping, Redis latency can jump to hundreds of milliseconds because memory accesses fall back to disk. Detect swap usage, free memory, and restart the instance (preferably after a master‑slave switchover) to clear swap.
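On Linux, swap usage of the Redis process can be read from `/proc/<pid>/smaps`, where the pid is the `process_id` field of INFO. This is a minimal sketch that sums the per-mapping `Swap:` lines; the function name `swapped_kb` is hypothetical, and the smaps text is passed in as a string so the parsing can be tried without a running instance.

```python
def swapped_kb(smaps_text):
    """Sum the 'Swap:' entries (in kB) from the contents of /proc/<pid>/smaps."""
    total = 0
    for line in smaps_text.splitlines():
        # Lines look like "Swap:                 12 kB"; "SwapPss:" is skipped.
        if line.startswith("Swap:"):
            total += int(line.split()[1])
    return total
```

On the Redis host this could be used as `swapped_kb(open(f"/proc/{pid}/smaps").read())`; a result that stays at 0 suggests swap is not the cause of the latency.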
Network Overload
Excessive network traffic or saturated NICs cause packet loss and increased RTT, further degrading Redis performance. Monitor network I/O and scale bandwidth or distribute instances when needed.
Best Practices – Business Layer
Keep key names short and avoid storing huge values.
Prefer lazy‑free (Redis 4.0+) for large deletions.
Set appropriate TTLs to prevent unbounded memory growth.
Avoid O(N) commands on large collections; batch reads/writes.
Use MGET/MSET or pipelines instead of many single commands.
Never run KEYS in production; use SCAN with a low rate.
Randomize expiration times to spread load.
Choose an eviction policy that matches your workload (random often faster than LRU).
Use a connection pool and limit connections.
Prefer a single logical DB (db0) per instance; separate business lines into distinct instances.
Consider read‑write splitting and clustering for high read/write volumes.
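Several of the points above (SCAN instead of KEYS, lazy‑free deletion, batching) can be combined when cleaning up many keys. This sketch assumes a standalone instance and a client shaped like redis-py's `Redis` exposing `scan_iter()` and `unlink()`; the name `unlink_matching`, the batch size, and the pause are illustrative assumptions.

```python
import time


def unlink_matching(client, pattern, batch_size=500, pause_s=0.01):
    """Delete keys matching pattern in batches using UNLINK (Redis 4.0+).

    UNLINK frees memory in a background thread, and batching plus a short
    pause keeps the cleanup from monopolizing the server.
    """
    deleted = 0
    batch = []
    for key in client.scan_iter(match=pattern, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            deleted += client.unlink(*batch)
            batch.clear()
            time.sleep(pause_s)  # give other clients room between batches
    if batch:
        deleted += client.unlink(*batch)
    return deleted
```

In a Redis Cluster the keys of one batch may hash to different slots, so this helper would need to be adapted there.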
Best Practices – Operations Layer
Isolate business lines on separate instances and machines.
Provision sufficient CPU, memory, bandwidth, and disk.
Deploy master‑slave clusters with read‑only slaves; keep master and slave on different hosts.
Run sentinel nodes (≥3) for automatic failover.
Plan capacity: limit instance memory to ~50 % of host RAM to accommodate replication buffers.
Monitor expired_keys, evicted_keys, latest_fork_usec and alert on sudden spikes.
Set a sensible slow‑log threshold (e.g., 10 ms) and monitor its length.
Adjust repl-backlog-size and the slave client-output-buffer-limit for heavy write workloads.
Perform backups on slaves, not masters.
Prefer appendfsync everysec or disable AOF to reduce disk I/O.
When increasing maxmemory, update slaves before masters to avoid data loss.
Use persistent long‑lived connections for INFO collection to reduce connection overhead.
Throttle large‑scale scans with sleep intervals to avoid QPS spikes.
Continuously monitor and alert on resource usage to pre‑empt performance degradation.
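Alerting on sudden spikes in cumulative counters such as expired_keys and evicted_keys comes down to comparing two INFO samples. The sketch below assumes `previous` and `current` are dicts as returned by redis-py's `info()`; the per-interval limits and the name `counter_spikes` are hypothetical and would need tuning per workload.

```python
def counter_spikes(previous, current, limits):
    """Return counters whose growth between two INFO samples exceeds a limit.

    limits maps a counter name (e.g. "evicted_keys") to the largest increase
    tolerated per sampling interval.
    """
    spikes = {}
    for name, limit in limits.items():
        delta = current[name] - previous[name]
        if delta < 0:
            # The counter went backwards, e.g. after a restart or
            # CONFIG RESETSTAT; treat the current value as the increase.
            delta = current[name]
        if delta > limit:
            spikes[name] = delta
    return spikes
```

Gauges such as latest_fork_usec are not cumulative, so they are better compared against a fixed threshold than fed through this helper.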
By addressing these application‑level and operational factors, you can keep Redis responsive and avoid the latency pitfalls that commonly affect production deployments.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.