
Why Redis Gets Slow and How to Diagnose & Fix Common Latency Issues

This article explains why a Redis instance may experience sudden latency spikes, walks through diagnosing the problem with slow‑log and big‑key scans, and provides concrete configuration tweaks, command‑level best practices, and operational guidelines to keep Redis performing at peak speed.

dbaplus Community

Using Complex Commands

High-complexity commands such as SORT, SUNION, or ZUNIONSTORE run in O(N) time or worse and can cause noticeable latency on large data sets. Because Redis executes commands on a single thread, one slow command delays every request queued behind it. If your slow log shows such commands with execution times above the threshold, simplify them or break the workload into smaller pieces.

Enable and Use Slow Log

Set a slow‑log threshold (e.g., 5 ms) and limit the log length to keep recent entries:

CONFIG SET slowlog-log-slower-than 5000
CONFIG SET slowlog-max-len 1000

Query the latest entries with:

SLOWLOG GET 5

Each entry lists a unique ID, the Unix timestamp at which the command ran, its execution duration (µs), and the command with its arguments, helping you pinpoint expensive operations.
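When the slow log holds many entries, grouping them by command name makes the expensive operations easier to spot. The sketch below is one possible way to do that; it assumes each entry is a dict with 'duration' and 'command' keys, which is the shape assumed here for the result of redis-py's slowlog_get(). The function name summarize_slowlog is hypothetical.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow-log entries by command name.

    `entries` is a list of dicts with 'duration' (microseconds) and
    'command' (bytes or str) keys; this shape is an assumption about
    what the client library returns.
    Returns (name, count, total_us, max_us) tuples, largest total first.
    """
    stats = defaultdict(lambda: [0, 0, 0])  # count, total_us, max_us
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "(empty)"
        bucket = stats[name]
        bucket[0] += 1
        bucket[1] += entry["duration"]
        bucket[2] = max(bucket[2], entry["duration"])
    rows = [(name, c, total, peak) for name, (c, total, peak) in stats.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

With a redis-py client this could be called as summarize_slowlog(client.slowlog_get(128)); the first row is then the command that consumed the most time overall.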

Large Keys

If slow‑log shows simple SET or DEL commands, investigate large keys. Writing or deleting a huge value requires significant memory allocation and release time.

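Detect large keys using the built-in scanner:

redis-cli -h $host -p $port --bigkeys -i 0.01

The command iterates the keyspace with SCAN and reports, for each data type, the biggest key it found along with per-type key counts and average sizes. For collection types the reported size is the number of elements, not bytes. The -i option sets a sleep interval in seconds that throttles the scan so it does not cause a QPS spike on a busy instance.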
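If you need byte sizes rather than element counts, a throttled scan can be combined with MEMORY USAGE (Redis 4.0+). The following is a minimal sketch, assuming a client object that behaves like redis-py's redis.Redis (scan_iter and memory_usage); the function name, threshold, and pause values are illustrative, not recommendations.

```python
import time


def find_big_keys(client, threshold_bytes=1_000_000, batch=100, pause=0.01):
    """Return [(key, bytes)] for keys at or above `threshold_bytes`.

    `client` is assumed to expose `scan_iter(count=...)` and
    `memory_usage(key)` like redis-py. MEMORY USAGE needs Redis 4.0+.
    """
    big = []
    for seen, key in enumerate(client.scan_iter(count=batch), start=1):
        size = client.memory_usage(key)  # None if the key vanished meanwhile
        if size is not None and size >= threshold_bytes:
            big.append((key, size))
        if seen % batch == 0:
            time.sleep(pause)  # throttle, in the spirit of redis-cli -i
    return sorted(big, key=lambda item: item[1], reverse=True)
```

Running this against a replica rather than the master keeps the extra load away from production traffic.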

Concentrated Expiration

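Mass expiration at a fixed timestamp can overload the active expiration cycle, which runs on the main thread. Each cycle may take up to about 25 ms, and because the slow log only records command execution time, the resulting latency spikes do not appear there.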

Search your code for EXPIREAT or PEXPIREAT and randomize expiration times:

import random
redis.expireat(key, expire_time + random.randint(0, 300))  # add 0-300 s of jitter

Instance Memory Limits

When maxmemory is reached, Redis evicts keys according to the configured policy (e.g., allkeys‑lru, volatile‑lru, allkeys‑random). Eviction adds latency, especially for large keys.
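To see how close an instance is to its limit before eviction starts hurting latency, the relevant INFO fields can be compared directly. This is a rough sketch that assumes a dict containing used_memory, maxmemory, and maxmemory_policy (for example from redis-py's info("memory")); the 0.9 cutoff is an arbitrary illustration.

```python
def memory_pressure(info):
    """Summarize how close an instance is to maxmemory.

    `info` is a dict with the INFO fields used_memory, maxmemory and
    (optionally) maxmemory_policy. `ratio` is None when maxmemory is 0,
    meaning no limit is configured.
    """
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", 0))
    ratio = used / limit if limit > 0 else None
    return {
        "used_bytes": used,
        "limit_bytes": limit,
        "ratio": ratio,
        "policy": info.get("maxmemory_policy", "unknown"),
        # 0.9 is an arbitrary cutoff chosen for this example
        "near_limit": ratio is not None and ratio >= 0.9,
    }
```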

Fork Overhead (RDB/AOF)

Generating RDB snapshots or rewriting AOF forks a child process that copies the parent’s page tables. For large memory footprints this can block the main thread for seconds, dramatically increasing latency.

Monitor the fork duration with:

INFO stats
# look for latest_fork_usec: duration of the most recent fork, in µs

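Schedule backups on replicas during off-peak hours. Consider disabling AOF only if losing every write since the last RDB snapshot is acceptable; with AOF off, that window is typically much longer than the roughly one second at risk under appendfsync everysec.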
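A fork duration is easier to judge when related to the amount of memory in use. The sketch below assumes a dict holding latest_fork_usec and used_memory, both of which appear in a full INFO reply; warn_ms is a hypothetical alert threshold, not a Redis default.

```python
def fork_report(info, warn_ms=100):
    """Convert latest_fork_usec to milliseconds and relate it to memory size.

    `info` is a dict with the INFO fields latest_fork_usec and used_memory.
    `warn_ms` is an illustrative threshold chosen by the caller.
    """
    fork_ms = int(info["latest_fork_usec"]) / 1000
    used_gb = int(info["used_memory"]) / (1024 ** 3)
    return {
        "fork_ms": fork_ms,
        "ms_per_gb": fork_ms / used_gb if used_gb > 0 else None,
        "slow": fork_ms >= warn_ms,
    }
```

A consistently high ms_per_gb value may indicate that the host (often a virtual machine) forks slowly, in which case a smaller instance or a different host type is worth testing.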

CPU Binding

Binding Redis to a specific CPU core interferes with the forked persistence process: the child inherits the parent's CPU affinity, so both compete for the same core, which causes CPU contention and higher latency. Avoid CPU pinning when using RDB/AOF.

Enabling AOF

AOF offers three fsync policies:

appendfsync always – safest but highest I/O cost.

appendfsync everysec – balances safety (max 1 s data loss) and performance.

appendfsync no – relies on the OS to flush; lowest overhead but riskier.

For most workloads everysec is recommended.

Swap Usage

If the host starts swapping, Redis latency can jump to hundreds of milliseconds because memory accesses fall back to disk. Detect swap usage, free memory, and restart the instance (preferably after a master‑slave switchover) to clear swap.
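On Linux, one way to detect swap usage for the Redis process is to sum the Swap: lines in /proc/<pid>/smaps, where the pid can be taken from the process_id field of INFO. The sketch below assumes Linux and read access to /proc; the function names are hypothetical.

```python
def swapped_kb(smaps_text):
    """Sum the 'Swap:' lines of a Linux /proc/<pid>/smaps dump, in kB."""
    total = 0
    for line in smaps_text.splitlines():
        # "SwapPss:" lines are deliberately not matched by this prefix
        if line.startswith("Swap:"):
            total += int(line.split()[1])  # e.g. "Swap:      128 kB"
    return total


def redis_swap_kb(pid):
    """Read swap usage for a process; needs Linux and read access to /proc."""
    with open(f"/proc/{pid}/smaps") as handle:
        return swapped_kb(handle.read())
```

A total of a few kB is usually harmless; hundreds of MB suggest that memory accesses are regularly hitting disk.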

Network Overload

Excessive network traffic or saturated NICs cause packet loss and increased RTT, further degrading Redis performance. Monitor network I/O and scale bandwidth or distribute instances when needed.

Best Practices – Business Layer

Keep key names short and avoid storing huge values.

Prefer lazy-free deletion (UNLINK instead of DEL, Redis 4.0+) for large keys.

Set appropriate TTLs to prevent unbounded memory growth.

Avoid O(N) commands on large collections; batch reads/writes.

Use MGET/MSET or pipelines instead of many single commands.

Never run KEYS in production; use SCAN with a low rate.

Randomize expiration times to spread load.

Choose an eviction policy that matches your workload (random eviction is often cheaper than LRU).

Use a connection pool and limit connections.

Prefer a single logical DB (db0) per instance; separate business lines into distinct instances.

Consider read‑write splitting and clustering for high read/write volumes.
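The batching advice above can be sketched as a helper that reads many keys with a bounded number of keys per MGET, so that neither thousands of round trips nor one oversized command hits the server. It assumes a client exposing redis-py's mget(keys); the function name and chunk size are illustrative.

```python
def mget_in_chunks(client, keys, chunk_size=100):
    """Fetch many keys with a bounded number of keys per MGET call.

    `client` is assumed to expose `mget(keys)` like redis-py. Returns a
    dict mapping each key to its value (None for missing keys).
    """
    keys = list(keys)
    result = {}
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        result.update(zip(chunk, client.mget(chunk)))
    return result
```

The same chunking idea applies to writes with MSET or a pipeline; the right chunk size depends on value sizes and should be measured rather than assumed.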

Best Practices – Operations Layer

Isolate business lines on separate instances and machines.

Provision sufficient CPU, memory, bandwidth, and disk.

Deploy master‑slave clusters with read‑only slaves; keep master and slave on different hosts.

Run sentinel nodes (≥3) for automatic failover.

Plan capacity: limit instance memory to ~50 % of host RAM to accommodate replication buffers and copy-on-write memory during fork.

Monitor expired_keys, evicted_keys, latest_fork_usec and alert on sudden spikes.

Set a sensible slow‑log threshold (e.g., 10 ms) and monitor its length.

Adjust repl-backlog-size and the slave class of client-output-buffer-limit for heavy write workloads.

Perform backups on slaves, not masters.

Prefer appendfsync everysec or disable AOF to reduce disk I/O.

When increasing maxmemory, update slaves before masters to avoid data loss.

Use persistent long‑lived connections for INFO collection to reduce connection overhead.

Throttle large‑scale scans with sleep intervals to avoid QPS spikes.

Continuously monitor and alert on resource usage to pre‑empt performance degradation.
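Counters such as expired_keys and evicted_keys are cumulative, so alerting on sudden spikes means comparing two INFO samples rather than looking at one value. The sketch below shows one way to do that; the function names and the limits passed in are assumptions for illustration.

```python
COUNTERS = ("expired_keys", "evicted_keys")


def counter_rates(previous, current, interval_s):
    """Per-second growth of cumulative INFO counters between two samples.

    `previous` and `current` are dicts of INFO fields taken `interval_s`
    seconds apart. A counter that went down (for example after a restart)
    is reported as None rather than as a negative rate.
    """
    rates = {}
    for name in COUNTERS:
        delta = int(current[name]) - int(previous[name])
        rates[name] = delta / interval_s if delta >= 0 else None
    return rates


def spikes(rates, limits):
    """Names of counters whose rate exceeds the caller-supplied limit."""
    return [
        name
        for name, rate in rates.items()
        if rate is not None and rate > limits.get(name, float("inf"))
    ]
```

Sensible limits depend on the workload, so a baseline collected during normal operation is a better starting point than any fixed number.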

By addressing these application‑level and operational factors, you can keep Redis responsive and avoid the latency pitfalls that commonly affect production deployments.
