Detecting and Resolving Redis Performance Bottlenecks
This guide explains how to tell when Redis is slow: measure baseline latency, monitor slow commands and latency events, and troubleshoot network delay, fork, transparent huge pages, swap, AOF, key expiration, and big keys. It closes with a practical checklist of solutions.
Introduction
When Redis latency spikes, client requests may time out (e.g., "Could not get a resource from the pool"), causing order failures, MySQL overload, and database crashes. Detecting and fixing Redis performance issues promptly is essential.
1. Baseline Latency Measurement
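Baseline latency is the latency Redis shows under low load on a given system. Two redis-cli modes help measure it. To sample the round-trip time from a client, including network delay, run:
redis-cli --latency -h HOST -p PORT
To measure the intrinsic latency of the host itself, run this on the Redis server:
redis-cli --intrinsic-latency 100
The argument is the test duration in seconds; run the test for at least 100 seconds to capture spikes. The maximum observed latency (e.g., 3079 µs ≈ 3 ms) becomes the baseline. Consider Redis slow when its current latency exceeds twice this value.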
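As a minimal sketch of the rule above, the following Python snippet pulls the maximum out of captured redis-cli --intrinsic-latency output and applies the twice-the-baseline test. The sample text is illustrative, and the helper names baseline_us and is_slow are hypothetical, not part of any Redis tooling.

```python
import re

# Matches progress lines such as "Max latency so far: 3079 microseconds."
_MAX_LINE = re.compile(r"Max latency so far: (\d+) microseconds")


def baseline_us(cli_output):
    """Return the largest latency (in microseconds) reported in the output."""
    values = [int(m.group(1)) for m in _MAX_LINE.finditer(cli_output)]
    if not values:
        raise ValueError("no 'Max latency so far' lines found")
    return max(values)


def is_slow(current_us, baseline, factor=2.0):
    """Apply the rule of thumb: slow when latency exceeds factor x baseline."""
    return current_us > factor * baseline


# Illustrative capture, not real measurements.
sample = """Max latency so far: 1 microseconds.
Max latency so far: 16 microseconds.
Max latency so far: 3079 microseconds.
"""

base = baseline_us(sample)
print(base, is_slow(7000, base), is_slow(5000, base))  # 3079 True False
```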
2. Slow Command Monitoring
The slowlog records commands whose execution time exceeds a configurable threshold (default 10 ms). The threshold is given in microseconds; set it based on the baseline, e.g.:
redis-cli CONFIG SET slowlog-log-slower-than 6000
This logs commands slower than 6 ms. Retrieve the two most recent entries with:
redis-cli slowlog get 2
Each entry contains an ID, a Unix timestamp, the execution time (µs), and the command with its arguments.
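To make the entry layout concrete, here is a small Python sketch that turns raw slowlog entries into named records and sorts them by duration. It assumes each entry arrives as a list in the order described above (ID, timestamp, microseconds, argument list); newer Redis versions append client fields, which the sketch ignores. The names SlowEntry and parse_slowlog and the sample data are hypothetical.

```python
from collections import namedtuple
from datetime import datetime, timezone

SlowEntry = namedtuple("SlowEntry", "id when duration_ms command")


def parse_slowlog(raw_entries):
    """Convert raw slowlog entries to records, slowest first."""
    entries = []
    for raw in raw_entries:
        # Only the first four fields are used; extra fields are ignored.
        entry_id, timestamp, duration_us, args = raw[:4]
        entries.append(SlowEntry(
            id=entry_id,
            when=datetime.fromtimestamp(timestamp, tz=timezone.utc),
            duration_ms=duration_us / 1000,
            command=" ".join(str(a) for a in args),
        ))
    return sorted(entries, key=lambda e: e.duration_ms, reverse=True)


# Illustrative entries, not output from a real server.
raw = [
    [14, 1309448221, 15, ["ping"]],
    [13, 1309448128, 30000, ["slowlog", "get", "100"]],
]

for entry in parse_slowlog(raw):
    print(entry.id, entry.duration_ms, entry.command)
```

Sorting by duration puts the most expensive command first, which is usually the one worth investigating.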
3. Latency Monitoring
Redis 2.8.13 introduced the latency monitor. Configure its threshold in milliseconds (e.g., three times the baseline, 9 ms):
redis-cli CONFIG SET latency-monitor-threshold 9
View recent events with:
redis-cli latency latest
Each event shows its name, timestamp, latest latency, and maximum latency.
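The two thresholds follow from the baseline by simple arithmetic. The sketch below derives both and prints the matching redis-cli commands; the multipliers (2 for the slowlog, 3 for the latency monitor) are the ones used in this guide, and the function name monitoring_commands is hypothetical.

```python
def monitoring_commands(baseline_ms, slowlog_factor=2, monitor_factor=3):
    """Build CONFIG SET commands for both thresholds from a baseline in ms."""
    # slowlog-log-slower-than is expressed in microseconds.
    slowlog_us = int(baseline_ms * slowlog_factor * 1000)
    # latency-monitor-threshold is expressed in milliseconds.
    monitor_ms = int(round(baseline_ms * monitor_factor))
    return [
        f"redis-cli CONFIG SET slowlog-log-slower-than {slowlog_us}",
        f"redis-cli CONFIG SET latency-monitor-threshold {monitor_ms}",
    ]


for line in monitoring_commands(3):
    print(line)
```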
4. Network Communication Delay
Network round-trip time (RTT) adds latency to every command. A 1 Gbit/s network typically has ~200 µs RTT. A client that issues many commands one at a time (e.g., many HGETALL calls) pays one RTT per command; pipelining sends them in batches and reduces the number of round trips.
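A rough sketch of both the cost and the fix: network_time_us estimates the network time spent on round trips alone, and hgetall_many batches HGETALL calls through a pipeline. The client argument is assumed to be a redis-py style object exposing pipeline(transaction=False), with hgetall and execute on the pipeline; both function names and the batch size are hypothetical choices.

```python
def network_time_us(commands, rtt_us=200, batch_size=1):
    """Estimate time spent on round trips only (ignores server-side work)."""
    round_trips = -(-commands // batch_size)  # ceiling division
    return round_trips * rtt_us


def hgetall_many(client, keys, batch_size=100):
    """Fetch many hashes using one round trip per batch instead of per key."""
    results = {}
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        # transaction=False: plain pipelining, no MULTI/EXEC wrapper.
        pipe = client.pipeline(transaction=False)
        for key in batch:
            pipe.hgetall(key)
        # execute() returns replies in the order the commands were queued.
        for key, value in zip(batch, pipe.execute()):
            results[key] = value
    return results


# 1000 commands at ~200 us RTT: unpipelined vs. batches of 100.
print(network_time_us(1000), network_time_us(1000, batch_size=100))  # 200000 2000
```

Batching in bounded chunks rather than one giant pipeline keeps client and server buffers small.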
5. Fork‑Generated RDB Snapshots
Creating an RDB snapshot requires forking the process. The fork call itself blocks the main thread while the page table is copied, and large instances have large page tables (e.g., a 24 GB instance needs ~48 MB). After the fork, copy-on-write (COW) copies every memory page that a write touches during bgsave, which can cause noticeable latency. In addition, a replica cannot serve requests while it loads the RDB file.
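The 48 MB figure can be reproduced with a short calculation. This sketch assumes 4 KB memory pages and 8 bytes per page table entry, which is typical for x86-64 Linux but is an assumption here; page_table_bytes is a hypothetical helper.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def page_table_bytes(instance_bytes, page_size=4096, entry_size=8):
    """Approximate page table size: one entry per page of instance memory."""
    pages = -(-instance_bytes // page_size)  # ceiling division
    return pages * entry_size


# 24 GiB instance -> 6,291,456 pages x 8 bytes = 48 MiB of page table.
print(page_table_bytes(24 * GIB) / MIB)  # 48.0
```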
6. Transparent Huge Pages (THP)
Linux THP allocates 2 MB pages. When Redis modifies a small amount of data during RDB generation, the entire 2 MB page is copied, increasing latency. Disable THP with:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
7. Swap (OS Paging)
If physical memory is insufficient, the kernel swaps pages out to disk. Identify the Redis process ID:
redis-cli info | grep process_id
Then inspect /proc/PROCESS_ID/smaps and compare the Size and Swap fields of each mapping. Non-zero swap indicates memory pressure that can degrade performance; large swap usage (hundreds of MB or GB) is a red flag.
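Reading smaps by eye is tedious for a large process, so here is a minimal sketch that sums the Swap fields. It assumes the usual smaps line layout ("Swap:   N kB"); the function names and the sample text are hypothetical.

```python
import re

# Matches "Swap:  123 kB" at line start; the colon keeps "SwapPss:" out.
_SWAP_LINE = re.compile(r"^Swap:\s+(\d+)\s+kB", re.MULTILINE)


def total_swap_kb(smaps_text):
    """Sum the Swap field over all mappings in smaps-formatted text."""
    return sum(int(kb) for kb in _SWAP_LINE.findall(smaps_text))


def read_swap_kb(pid):
    """Read /proc/<pid>/smaps (Linux only) and return swapped-out kB."""
    with open(f"/proc/{pid}/smaps") as fh:
        return total_swap_kb(fh.read())


# Illustrative excerpt with two mappings, not taken from a real process.
sample = """\
Size:                584 kB
Rss:                 200 kB
Swap:                  0 kB
SwapPss:               0 kB
Size:             462044 kB
Rss:              100000 kB
Swap:             462008 kB
SwapPss:          462008 kB
"""

print(total_swap_kb(sample))  # 462008
```

On a live host, read_swap_kb would be called with the process_id value reported by redis-cli info.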
8. AOF and Disk I/O
Redis persistence can be tuned via the appendfsync setting:
no – Redis never calls fsync and the OS decides when to flush (fastest, highest risk of data loss)
everysec – fsync every second (default, acceptable for cache workloads)
always – fsync on every write (slow, high durability)
For cache use cases, set appendfsync to no or everysec. Reduce disk contention during AOF rewrite with:
redis-cli CONFIG SET no-appendfsync-on-rewrite yes
9. Expire Deletion
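Redis deletes expired keys lazily (when a key is accessed) and actively (in a background cycle that runs every 100 ms by default). Each active cycle samples a fixed number of keys that have a TTL and deletes the expired ones; if more than 25 % of the sample had expired, it samples again. When many keys expire at the same moment, this loop keeps repeating on the main thread and can delay client commands.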
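The sampling loop is easier to reason about as a toy simulation. The sketch below is a simplified model, not Redis's actual implementation: it assumes a sample size of 20 and the 25 % repeat rule from the text, and it omits the time limit a real server applies to each cycle. All names are hypothetical.

```python
import random

SAMPLE_SIZE = 20      # keys tested per round (assumed value)
REPEAT_RATIO = 0.25   # repeat when more than 25 % of the sample had expired


def active_expire_cycle(expiry, now, rng, max_rounds=1000):
    """Run one simulated cycle over {key: expire_time}; return rounds used.

    Expired keys are removed from `expiry` in place.
    """
    rounds = 0
    while expiry and rounds < max_rounds:
        rounds += 1
        sample = rng.sample(list(expiry), min(SAMPLE_SIZE, len(expiry)))
        expired = [k for k in sample if expiry[k] <= now]
        for key in expired:
            del expiry[key]
        # Stop once the sample looks mostly fresh.
        if len(expired) <= REPEAT_RATIO * len(sample):
            break
    return rounds


rng = random.Random(0)
fresh = {f"k{i}": 100 for i in range(200)}   # nothing expired at now=50
stale = {f"k{i}": 10 for i in range(200)}    # everything expired at now=50
print(active_expire_cycle(fresh, 50, rng))   # 1
print(active_expire_cycle(stale, 50, rng))   # 10
```

The second case shows why a mass expiration is costly: every round finds a fully expired sample, so the loop keeps going until the keys are gone.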
10. Big Key Issues
Big keys (large strings, long lists, massive hashes, or ZSETs) can cause OOM, replication imbalance, bandwidth saturation, and blocking deletions. Detect them with tools like redis-rdb-tools or redis-cli --bigkeys. Mitigate by:
Splitting large hashes or lists into multiple smaller keys.
Using UNLINK for non‑blocking deletion.
Adding random jitter to expiration times to avoid mass expirations.
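Two of these mitigations can be sketched in a few lines of Python. bucket_key spreads the fields of one large hash over several smaller keys with a stable hash, and ttl_with_jitter adds a random offset to a base TTL. Both helpers, the key naming scheme, and the bucket count are hypothetical illustrations rather than an established convention.

```python
import random
import zlib


def bucket_key(base_key, field, buckets=16):
    """Map a hash field to one of `buckets` smaller keys, deterministically."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    index = zlib.crc32(field.encode("utf-8")) % buckets
    return f"{base_key}:{index}"


def ttl_with_jitter(base_ttl, max_jitter, rng=random):
    """Return base_ttl plus a random 0..max_jitter seconds (inclusive)."""
    return base_ttl + rng.randint(0, max_jitter)


# The same field always lands in the same bucket.
print(bucket_key("user:profile", "42") == bucket_key("user:profile", "42"))
# A one-hour TTL spread over an extra five minutes.
print(3600 <= ttl_with_jitter(3600, 300) <= 3900)
```

A reader would then write each field to the key returned by bucket_key and look it up the same way, so no single key grows without bound.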
Checklist
Measure current Redis baseline latency.
Enable slowlog and latency‑monitor to locate slow commands.
Use SCAN (or SSCAN, HSCAN, ZSCAN) instead of blocking commands such as KEYS, SMEMBERS, or HGETALL on large collections.
Keep each instance at roughly 2 to 4 GB so that forks and RDB loads stay short.
Disable transparent huge pages.
Monitor swap usage and increase physical memory if needed.
Adjust AOF settings (no-appendfsync-on-rewrite) to reduce disk I/O.
Handle big keys by splitting or using UNLINK.
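For the SCAN item in the checklist, the cursor loop looks roughly like this. The sketch assumes a redis-py style client whose scan(cursor, match, count) method returns a (next_cursor, keys) pair; scan_keys is a hypothetical helper, and redis-py also ships a built-in scan_iter that does the same job.

```python
def scan_keys(client, match="*", count=100):
    """Yield matching keys incrementally instead of blocking like KEYS."""
    cursor = 0
    while True:
        # Each call does a bounded amount of work on the server.
        cursor, keys = client.scan(cursor=cursor, match=match, count=count)
        for key in keys:
            yield key
        # The server returns cursor 0 when the iteration is complete.
        if cursor == 0:
            break
```

Because SCAN may return a key more than once, callers that need uniqueness should collect the results in a set.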
dbaplus Community
