
Why Is My Redis Slowing Down in Production and How to Diagnose It

The article explains how to detect and troubleshoot Redis latency spikes by measuring baseline performance, using slowlog and latency‑monitor, checking network RTT, fork‑induced pauses, huge pages, swap usage, AOF settings, expiration handling, and big‑key problems, providing concrete commands and mitigation steps.

Programmer XiaoFu

Baseline latency measurement

Run redis-cli --intrinsic-latency 100 directly on the Redis server host so that network latency does not affect the result. The argument 100 is the test duration in seconds, and the command reports the maximum latency observed during the run, in microseconds. In the example output the peak is 3079 µs (≈3 ms), which serves as the baseline. If latency observed at runtime exceeds twice this baseline, treat the instance as slow.

Slow command monitoring

Two built‑in tools locate slow commands:

Slowlog records commands whose execution time exceeds a configurable threshold, slowlog-log-slower-than, expressed in microseconds (default 10000, i.e. 10 ms). Set the threshold to about twice the baseline, e.g. redis-cli CONFIG SET slowlog-log-slower-than 6000 (6 ms). SLOWLOG GET returns entries containing a unique ID, the Unix timestamp, the execution time (µs), and the command with its arguments; Redis 4.0 and later also include the client address and client name.

Latency monitoring (available since Redis 2.8.13) records events whose latency exceeds a threshold given in milliseconds. Enable it with CONFIG SET latency-monitor-threshold 9 for a 9 ms threshold (three times a 3 ms baseline). LATENCY LATEST then shows, for each event, the event name, the Unix timestamp of the latest spike, the latest latency, and the all‑time maximum latency (both in ms).
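The two thresholds above follow directly from the baseline, so the arithmetic can be sketched in a few lines. The function name and the rounding choice are assumptions made for illustration; the article itself rounds 2 × 3079 µs down to 6000.

```python
def thresholds_from_baseline(baseline_us: int) -> dict:
    """Derive monitoring thresholds from an intrinsic-latency baseline (µs)."""
    # slowlog-log-slower-than is expressed in microseconds: 2x the baseline.
    slowlog_us = 2 * baseline_us
    # latency-monitor-threshold is expressed in milliseconds: 3x the baseline.
    monitor_ms = max(1, round(3 * baseline_us / 1000))
    return {
        "slowlog-log-slower-than": slowlog_us,
        "latency-monitor-threshold": monitor_ms,
    }


settings = thresholds_from_baseline(3079)
for name, value in settings.items():
    print(f"CONFIG SET {name} {value}")
```

Each printed line can be passed to redis-cli as is, or rounded to a friendlier value first.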

Network‑induced latency

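Each command goes through four steps: send → queue → execute → reply. Issued one at a time, every command pays a full network round trip (RTT). Commands with no multi‑key batch variant (e.g., HGETALL, which has no counterpart to MGET) therefore cost N RTTs for N calls. Pipelining sends the commands together and reads the replies together, which cuts the number of round trips.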
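A minimal sketch of pipelining, assuming a redis-py style client object (the pipeline(), hgetall() and execute() calls are redis-py's; the function name fetch_hashes and the key names are hypothetical). The client is passed in, so nothing here opens a connection.

```python
def fetch_hashes(client, keys):
    """Fetch several hashes with one pipelined batch instead of one RTT each."""
    # transaction=False asks for plain batching without MULTI/EXEC.
    pipe = client.pipeline(transaction=False)
    for key in keys:
        pipe.hgetall(key)  # queued on the client side, not sent yet
    # execute() sends the queued commands together and returns replies in order.
    replies = pipe.execute()
    return dict(zip(keys, replies))
```

With redis-py installed this could be called as fetch_hashes(redis.Redis(), ["user:1", "user:2"]). Very large batches are usually split into chunks so that neither side has to buffer everything at once.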

Figure: Redis pipeline illustration

Slow commands

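Commands with O(N) complexity (e.g., HGETALL, SMEMBERS, SORT, LREM, SUNION) should be avoided on large collections or moved to replicas. Prefer O(1) or O(log N) commands, and use the incremental iteration commands SCAN, SSCAN, HSCAN, and ZSCAN to walk large keyspaces or collections in small batches. Disable the KEYS command in production, for example with the rename-command directive in redis.conf.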
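As an illustration of incremental iteration, the sketch below walks a large hash with HSCAN instead of HGETALL. It assumes a redis-py style client (hscan_iter is redis-py's helper that follows the cursor for you); the function name and the batch size of 100 are arbitrary choices.

```python
def iter_hash_fields(client, key, batch=100):
    """Yield (field, value) pairs from a hash in small HSCAN batches."""
    # count is only a hint for how many entries each HSCAN call examines,
    # so the server may return more or fewer per step.
    for field, value in client.hscan_iter(key, count=batch):
        yield field, value
```

Each underlying HSCAN call does a bounded amount of work, so other clients get served between batches. The trade-off is that the iteration is not a point-in-time snapshot.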

Fork‑generated RDB snapshots

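Creating an RDB snapshot forks a child process. The fork call itself runs in the single‑threaded main loop and blocks it while the page table is copied, so fork time grows with memory size. For example, with 4 KB pages and 8‑byte page‑table entries, a 24 GB instance needs about 48 MB of page tables, and each BGSAVE has to allocate and copy those 48 MB. After the fork, copy‑on‑write (COW) duplicates any memory page that the parent modifies while the child is writing the snapshot. A replica also cannot serve reads while it loads a large snapshot, so keep instance size between 2 GB and 4 GB.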
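The 48 MB figure can be reproduced with simple arithmetic, assuming 4 KB pages and 8 bytes per page‑table entry (typical for x86‑64 Linux; both constants are assumptions, not values read from a running system).

```python
PAGE_SIZE = 4 * 1024  # bytes per memory page (assumed 4 KB)
PTE_SIZE = 8          # bytes per page-table entry (assumed)


def page_table_bytes(dataset_bytes: int) -> int:
    """Approximate size of the page table that fork() has to copy."""
    pages = dataset_bytes // PAGE_SIZE
    return pages * PTE_SIZE


size = page_table_bytes(24 * 1024**3)      # a 24 GB instance
print(f"{size / 1024**2:.0f} MB of page tables")
```

By the same arithmetic, a 4 GB instance needs about 8 MB of page tables, which is one reason smaller instances fork faster.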

Transparent Huge Pages

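With transparent huge pages (THP) enabled, Linux backs memory with 2 MB pages instead of the usual 4 KB. While an RDB snapshot is being generated, copy‑on‑write then has to copy a whole 2 MB page even for a 50 B write, which causes noticeable latency under heavy write load. Disable THP (as root) with echo never > /sys/kernel/mm/transparent_hugepage/enabled.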
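To check the current setting before or after changing it, the same sysfs file can be read. The kernel marks the active mode with square brackets, e.g. always madvise [never]. The helper names below are hypothetical; this is a sketch, not a complete health check.

```python
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def thp_mode(text: str) -> str:
    """Return the active mode from a line such as 'always madvise [never]'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active mode found")


def read_thp_mode(path: str = THP_PATH) -> str:
    """Read and parse the THP setting on a Linux host."""
    with open(path) as f:
        return thp_mode(f.read())
```

On a host where THP has been disabled, read_thp_mode() should return "never".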

Swap‑induced latency

When physical memory is insufficient, Redis pages may be swapped out. Inspect /proc/<pid>/smaps (where <pid> is the Redis process ID) for the Size and Swap lines; a non‑zero swap size (especially hundreds of MB) indicates memory pressure. Mitigations: add RAM, isolate Redis on a dedicated machine, or increase the cluster node count.
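Reading smaps by eye is tedious because it has one Swap line per memory mapping. A small sketch that sums them is shown below; the function name and the sample text are made up for illustration, and real input would come from the Redis process's smaps file.

```python
def total_swap_kb(smaps_text: str) -> int:
    """Sum the 'Swap:' lines (in kB) of a /proc/.../smaps dump."""
    total = 0
    for line in smaps_text.splitlines():
        # Match 'Swap:' exactly so that 'SwapPss:' lines are not counted.
        if line.startswith("Swap:"):
            total += int(line.split()[1])
    return total


# Hypothetical excerpt with two mappings.
sample = """\
Size:                584 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                 12 kB
SwapPss:              12 kB
"""
print(total_swap_kb(sample), "kB swapped")
```

A total in the hundreds of MB is the signal the section above describes as memory pressure.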

AOF and disk I/O

Redis offers three appendfsync policies:

no: Redis never calls fsync itself; it only issues the write system call and lets the operating system decide when to flush.

everysec: fsync runs once per second in a background thread.

always: fsync runs on every write (safest, but highest latency and highest disk I/O).

For cache workloads, no or everysec is recommended. Keep the AOF file small, and set no-appendfsync-on-rewrite yes so that the main thread does not call fsync while an AOF rewrite is competing for disk I/O.
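The recommended settings can also be applied at runtime through CONFIG SET. The sketch below assumes a redis-py style client with a config_set(name, value) method; the constant and function names are hypothetical, and everysec is picked here as one of the two recommended options.

```python
CACHE_AOF_SETTINGS = {
    "appendfsync": "everysec",           # fsync once per second in the background
    "no-appendfsync-on-rewrite": "yes",  # skip fsync while a rewrite is running
}


def apply_aof_settings(client, settings=CACHE_AOF_SETTINGS):
    """Apply each setting with CONFIG SET on the given client."""
    for name, value in settings.items():
        client.config_set(name, value)
```

Values set this way last until the next restart unless they are also written to redis.conf.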

Expiration eviction

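Redis deletes expired keys lazily when they are accessed and also runs a periodic active‑expiration cycle (every 100 ms with the default hz of 10). Each cycle samples ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP (default 20) keys that have a TTL and deletes the expired ones; if more than 25 % of the sample was expired, it samples again instead of waiting for the next cycle. When a large number of keys expire at the same moment, this loop keeps repeating and can block the server. Adding a random offset to each key's expiration time spreads out the burst.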
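Randomizing expiry usually means adding a small random offset to a base TTL when the key is written. A minimal sketch, with a hypothetical function name and example values (one hour plus up to five minutes):

```python
import random


def jittered_ttl(base_seconds: int, spread_seconds: int, rng=random) -> int:
    """Return the base TTL plus a random offset in [0, spread_seconds]."""
    return base_seconds + rng.randint(0, spread_seconds)


# Keys written in the same batch get expiry times spread over five minutes.
ttls = [jittered_ttl(3600, 300) for _ in range(5)]
print(ttls)
```

The resulting value would be passed as the expiry when setting the key, for example as the EX argument of SET.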

Big‑key problems

Keys with large values or many members (e.g., 5 MB strings, lists of 10 000 items, hashes with 10 MB total value) cause OOM, memory imbalance in clusters, high bandwidth usage, and long‑running deletions. Identify them with redis‑rdb‑tools. Solutions: split the key into smaller keys, or delete asynchronously with UNLINK (available since Redis 4.0).
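One common way to split a big hash is to spread its fields over a fixed number of smaller hashes, choosing the bucket from a checksum of the field name. The sketch below is an illustration under assumptions: the naming scheme, the bucket count of 64, and the use of CRC32 are arbitrary choices, not something the article prescribes.

```python
import zlib


def bucket_key(base_key: str, field: str, buckets: int = 64) -> str:
    """Map a field of one big hash onto one of `buckets` smaller hashes."""
    # CRC32 is deterministic, so reads and writes for the same field
    # always land in the same bucket.
    index = zlib.crc32(field.encode("utf-8")) % buckets
    return f"{base_key}:{index}"


print(bucket_key("user:profiles", "alice"))
```

Reads and writes then use HGET and HSET on the bucket key instead of the original key, so no single key grows large enough to block the server when it is read or deleted.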

Checklist for resolving Redis slowdowns

Measure the current baseline latency with redis-cli --intrinsic-latency.

Enable slow‑command monitoring (SLOWLOG and latency‑monitor).

Identify slow commands and replace them with SCAN‑style iteration.

Keep instance data size between 2 GB and 4 GB to avoid long RDB loads.

Disable transparent huge pages.

Check for excessive swap usage via /proc/<pid>/smaps and address memory pressure.

Adjust AOF settings (use no or everysec, enable no-appendfsync-on-rewrite).

Detect and split big keys; use UNLINK for non‑blocking deletion.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
