Redis Latency Mystery: How to Diagnose and Fix Slow Redis Calls

When a Redis-backed service suddenly shows high response times, work through the troubleshooting steps in this guide: check server resources, network latency, Redis metrics, the slow log, memory fragmentation, big keys, and hot keys. The guide closes with practical fixes such as pipelining, read/write splitting, and cache layering.

ITPUB

Sep 28, 2023

During a routine shift, an alert indicated that a critical module’s Redis calls were taking far longer than usual. The author shares a complete, step‑by‑step debugging workflow that can be applied to any Redis‑based service.

1. General Service Troubleshooting

Start by inspecting the basic health of the host running the service:

Check memory usage and CPU load.

Verify node‑level load (CPU, memory, network I/O, disk I/O).

Confirm that throughput metrics (e.g., Redis's instantaneous_ops_per_sec) are within expected ranges.

If these indicators look normal, move on to request and data volume. A sudden jump in traffic (e.g., 5×) can overload a downstream dependency such as Redis even when the host itself looks healthy.

2. Redis‑Specific Diagnosis

Redis can be slow for several reasons. The following checklist helps isolate the root cause.

2.1 Network Latency

Measure round‑trip latency between the client and the Redis node. Typical values are ~200 µs for 1 Gbps TCP and ~30 µs for Unix‑domain sockets. High latency may stem from poor network quality, packet loss, or virtual‑machine overhead.
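
A rough way to take this measurement from application code is to time a batch of PING round trips. The sketch below is illustrative only: `redis_ping` and `measure_rtt` are hypothetical helper names, the host and port are placeholders, and the raw-socket PING assumes an unauthenticated server that accepts inline commands.

```python
import socket
import time
from typing import Callable, Dict


def redis_ping(host: str = "127.0.0.1", port: int = 6379) -> Callable[[], None]:
    """Return a function that sends one inline PING over a reused TCP connection."""
    sock = socket.create_connection((host, port), timeout=1.0)
    # Disable Nagle's algorithm so small requests are sent immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    def ping() -> None:
        sock.sendall(b"PING\r\n")
        reply = sock.recv(64)
        if not reply.startswith(b"+PONG"):
            raise RuntimeError(f"unexpected reply: {reply!r}")

    return ping


def measure_rtt(ping: Callable[[], None], samples: int = 100) -> Dict[str, float]:
    """Time `samples` calls to `ping` and report min/avg/max in microseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        ping()
        timings.append((time.perf_counter() - start) * 1_000_000)
    return {
        "min_us": min(timings),
        "avg_us": sum(timings) / len(timings),
        "max_us": max(timings),
    }
```

With a reachable server, `measure_rtt(redis_ping("127.0.0.1", 6379))` returns the three figures. Running it once from the application host and once on the Redis host itself helps separate network time from server time.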

2.2 Redis Server Health

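Use the INFO command (for example INFO stats, INFO cpu, and INFO memory) to inspect throughput, CPU, and memory. The fields below come from INFO stats: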

# Total commands processed since start
total_commands_processed:2255
# Real‑time OPS
instantaneous_ops_per_sec:12
# Network I/O
total_net_input_bytes:34312
total_net_output_bytes:78215
instantaneous_input_kbps:1.20
instantaneous_output_kbps:2.62

CPU usage near 90 % often signals a hot‑key or a blocking operation, while memory and disk I/O should remain stable.
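
For dashboards or alerting, the INFO text can be parsed into a dictionary. This is a minimal sketch: `parse_info` and `throughput_summary` are hypothetical helpers, and the sample text reuses the field values shown above.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO output into a dict, skipping blank lines and '#' lines."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that are not key:value pairs
            fields[key] = value
    return fields


def throughput_summary(fields: Dict[str, str]) -> Dict[str, float]:
    """Extract the instantaneous throughput figures as numbers."""
    return {
        "ops_per_sec": float(fields.get("instantaneous_ops_per_sec", 0)),
        "input_kbps": float(fields.get("instantaneous_input_kbps", 0)),
        "output_kbps": float(fields.get("instantaneous_output_kbps", 0)),
    }


SAMPLE = """\
# Stats
total_commands_processed:2255
instantaneous_ops_per_sec:12
instantaneous_input_kbps:1.20
instantaneous_output_kbps:2.62
"""
```

Calling `throughput_summary(parse_info(SAMPLE))` yields the three rates as floats, ready to compare against a baseline.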

2.3 Slow‑Log and Latency‑Histogram

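Check the slow log to see which commands exceeded the execution-time threshold. The threshold is set by slowlog-log-slower-than and defaults to 10,000 µs (10 ms); it covers command execution time only, not network or queueing time:

redis-cli -h 127.0.0.1 -p 6379 SLOWLOG GET

Also sample latency over time to observe min/avg/max values for each interval: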

redis-cli -h 127.0.0.1 -p 6379 --latency-history -i 1
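
Slow-log entries are easier to read once grouped by command. The sketch below assumes each entry is a dict with a `command` (bytes or str) and a `duration` in microseconds, which is the shape the redis-py client is assumed to return from `slowlog_get()`; `summarize_slowlog` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def summarize_slowlog(
    entries: Iterable[Dict[str, Any]], top: int = 5
) -> List[Tuple[str, int, int, int]]:
    """Group entries by command name.

    Returns (name, count, total_us, max_us) tuples, largest total time first.
    """
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0])
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "UNKNOWN"
        bucket = totals[name]
        bucket[0] += 1                                    # occurrences
        bucket[1] += entry["duration"]                    # total microseconds
        bucket[2] = max(bucket[2], entry["duration"])     # worst case
    ranked = sorted(
        ((name, c, t, m) for name, (c, t, m) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )
    return ranked[:top]
```

A command that dominates the total time (for example a full-keyspace KEYS call) is usually the first thing to remove or rewrite.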

2.4 Key‑Space and Big‑Key Analysis

Inspect the total number of keys and look for oversized keys. Key count by itself is rarely the limit (a single instance can hold millions of keys), but it gives context for the scan that follows:

redis-cli -h 127.0.0.1 -p 6379 INFO keyspace

Detect big keys with the built-in scanner:

redis-cli -h 127.0.0.1 -p 6379 --bigkeys -i 0.01

If no big keys are found, the issue likely lies elsewhere.
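
For collection types, --bigkeys ranks keys by element count rather than bytes. A complementary check is to walk the keyspace with SCAN and rank keys by MEMORY USAGE (Redis 4.0 or later). The sketch below assumes a client object exposing `scan_iter(count=...)` and `memory_usage(key)`, as the redis-py client is assumed to; `largest_keys` is a hypothetical helper.

```python
import heapq
from typing import Any, List, Tuple


def largest_keys(client: Any, top: int = 10, scan_count: int = 100) -> List[Tuple[int, Any]]:
    """Return the `top` keys by memory usage as (bytes, key), largest first."""
    heap: List[Tuple[int, int, Any]] = []
    for index, key in enumerate(client.scan_iter(count=scan_count)):
        size = client.memory_usage(key)
        if size is None:
            # The key expired or was deleted between SCAN and MEMORY USAGE.
            continue
        item = (size, index, key)  # index breaks ties without comparing keys
        if len(heap) < top:
            heapq.heappush(heap, item)
        else:
            heapq.heappushpop(heap, item)  # keep only the `top` largest
    return [(size, key) for size, _, key in sorted(heap, reverse=True)]
```

Each key costs one extra round trip, so this is better run against a replica or during a quiet period.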

2.5 Hot‑Key Detection (Redis ≥ 5.0)

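redis-cli can report hot keys through its --hotkeys option, which requires an LFU maxmemory-policy. Changing the policy also changes how keys are evicted, so note the previous value and restore it after the check if needed:

redis-cli -h 127.0.0.1 -p 6379 CONFIG SET maxmemory-policy allkeys-lfu
redis-cli -h 127.0.0.1 -p 6379 --hotkeys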

In the case study, a hot key caused CPU to spike while OPS remained modest, confirming that the hot‑key was the bottleneck.
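
Where changing the eviction policy is not acceptable, a different technique is to sample key accesses on the client side and look at how the traffic is distributed. This is not the method used in the case study; `hot_keys` is a hypothetical helper and the 20 % threshold is an arbitrary assumption.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def hot_keys(
    accessed_keys: Iterable[Hashable], threshold: float = 0.2
) -> List[Tuple[Hashable, float]]:
    """Return (key, share) for keys receiving at least `threshold` of sampled accesses."""
    counts = Counter(accessed_keys)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (key, count / total)
        for key, count in counts.most_common()  # hottest first
        if count / total >= threshold
    ]
```

Feeding it a sample of recently requested keys (for instance from application logs) shows whether a single key carries a disproportionate share of the load.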

2.6 Memory Fragmentation & Eviction

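Check the fragmentation ratio (used_memory_rss / used_memory). Values above 1.5 indicate significant fragmentation. Values below 1 mean the operating system has swapped part of Redis's memory to disk, which degrades latency badly.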

# Example output
used_memory_rss_human:1.2G
used_memory_peak_human:1.5G
mem_fragmentation_ratio:1.23
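
The ratio and its interpretation can be written down as a small check. This sketch uses hypothetical helper names; the 1.5 cutoff is the one quoted above, and both inputs are byte counts taken from INFO memory.

```python
def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """Compute used_memory_rss / used_memory from raw byte counts."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def classify_fragmentation(ratio: float, high: float = 1.5) -> str:
    """Map a fragmentation ratio to a coarse diagnosis."""
    if ratio < 1.0:
        return "possible swapping"    # RSS is smaller than what Redis allocated
    if ratio > high:
        return "high fragmentation"   # RSS is well above what Redis allocated
    return "normal"
```

The example output above (ratio 1.23) falls into the "normal" band under these thresholds.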

2.7 Replication & Persistence (Multi‑Instance)

For replicated setups, monitor master_link_status and master_last_io_seconds_ago (from INFO replication) along with the eviction counter evicted_keys (from INFO stats).
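
These fields lend themselves to a simple health check. The sketch below takes a dict of INFO fields with string values; `replication_problems` is a hypothetical helper and the 10-second I/O gap is an assumed threshold, not a Redis default.

```python
from typing import Dict, List


def replication_problems(info: Dict[str, str], max_io_gap_s: int = 10) -> List[str]:
    """Return human-readable warnings derived from INFO fields."""
    problems: List[str] = []
    if info.get("role") == "slave":  # INFO reports replicas with role:slave
        if info.get("master_link_status") != "up":
            problems.append("replication link is down")
        gap = int(info.get("master_last_io_seconds_ago", -1))
        if gap < 0 or gap > max_io_gap_s:
            problems.append("no recent I/O from master")
    if int(info.get("evicted_keys", 0)) > 0:
        problems.append("keys are being evicted")
    return problems
```

An empty list means none of the three signals fired. Because evicted_keys is a cumulative counter, comparing it between two samples is more informative than its absolute value.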

3. Reproducing the Issue and Testing Fixes

After pinpointing the hot-key, the author reproduced the load locally using a Kafka-based producer (kaf) to generate thousands of identical requests.

# Produce 10 000 messages to a single partition
cat payload.txt | kaf produce kv__0.111 -n 10000 -p 0 -b kafka-broker:9092

Initial attempts with a single producer did not generate enough pressure; scaling to multiple partitions increased load but still left CPU low. Adding concurrency (goroutine workers) raised the service’s CPU, which in turn stressed Redis.

When Redis CPU spiked, the hot‑key remained the dominant factor. The final mitigation combined three classic techniques:

Read/Write Splitting – Deploy multiple Redis instances (master‑slave) to separate read traffic.

Pipelining – Batch writes to reduce round‑trip latency.

Application‑Level Cache – Add a thin cache layer in front of Redis for the hot key.
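
As an illustration of the pipelining item, the sketch below batches writes through a non-transactional pipeline. It assumes a redis-py style client whose `pipeline(transaction=False)` returns an object with `set()` and `execute()`; `write_batched` and the batch size are hypothetical choices.

```python
from typing import Any, Iterable, Tuple


def write_batched(client: Any, items: Iterable[Tuple[str, Any]], batch_size: int = 100) -> int:
    """Write key/value pairs in pipelined batches; return the number of round trips."""
    pipe = client.pipeline(transaction=False)
    pending = 0
    round_trips = 0
    for key, value in items:
        pipe.set(key, value)      # queued locally, nothing is sent yet
        pending += 1
        if pending >= batch_size:
            pipe.execute()        # one round trip for the whole batch
            round_trips += 1
            pending = 0
    if pending:
        pipe.execute()            # flush the final partial batch
        round_trips += 1
    return round_trips
```

Writing 250 keys with a batch size of 100 takes three round trips instead of 250. Pipelining reduces network overhead but does not relieve a hot key, since the server still executes every command.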

In the author’s environment, adding a local cache layer eliminated the hot‑key hotspot, bringing both latency and CPU usage back to normal.
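
The article does not show the cache layer itself. One possible shape, offered purely as a sketch, is a small in-process cache with a short TTL sitting in front of the Redis read. `LocalTTLCache` is a hypothetical class, `loader` stands in for whatever function fetches the value from Redis, and the one-second TTL is an assumption to be tuned against how stale the hot key may be.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class LocalTTLCache:
    """In-process read-through cache with a fixed TTL (not thread-safe)."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        ttl_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # fresh local hit, Redis is not touched
        value = self._loader(key)            # miss or expired: read through to Redis
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a one-second TTL, each process sends at most about one read per second for the hot key to Redis, regardless of how many requests it serves. The trade-off is that readers may see a value up to one TTL old.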

"A computer‑science problem can often be solved by adding an abstraction layer." – Jay Black

Key take‑aways:

Always start with basic host metrics before diving into Redis internals.

Use INFO, SLOWLOG, latency histograms, and hot‑key commands to locate the culprit.

When CPU is high but OPS is low, suspect hot‑key or big‑key processing.

Mitigate with pipelining, read/write splitting, or an additional cache layer.
