
Redis Latency Mystery: How to Diagnose and Fix Slow Redis Calls

This guide walks through a systematic way to troubleshoot a Redis‑backed service whose response times suddenly climb: checking server resources, network latency, Redis metrics, the slow‑log, memory fragmentation, big keys, and hot keys, and then applying practical fixes such as pipelining, read/write splitting, and cache layering.

ITPUB

During a routine shift, an alert indicated that a critical module’s Redis calls were taking far longer than usual. The author shares a complete, step‑by‑step debugging workflow that can be applied to any Redis‑based service.

1. General Service Troubleshooting

Start by inspecting the basic health of the host running the service:

Check memory usage and CPU load.

Verify node‑level load (CPU, memory, network I/O, disk I/O).

Confirm that the service’s own metrics (e.g., instantaneous_ops_per_sec) are within expected ranges.

If these indicators look normal, check whether the incoming data volume has changed. A sudden increase (e.g., 5×) shifts the pressure onto downstream dependencies such as Redis and often explains a bottleneck that appears there.

2. Redis‑Specific Diagnosis

Redis can be slow for several reasons. The following checklist helps isolate the root cause.

2.1 Network Latency

Measure round‑trip latency between the client and the Redis node. Typical values are ~200 µs for 1 Gbps TCP and ~30 µs for Unix‑domain sockets. High latency may stem from poor network quality, packet loss, or virtual‑machine overhead.
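One way to take this measurement from the client host is sketched below, using only the Python standard library. It times the inline PING command over a plain TCP socket; the host, port, sample count, and both helper names are illustrative assumptions, and an instance that requires AUTH or TLS would need extra steps.

```python
import socket
import statistics
import time


def ping_rtts_us(host="127.0.0.1", port=6379, samples=100):
    """Time `samples` PING round trips and return them in microseconds."""
    rtts = []
    with socket.create_connection((host, port), timeout=2) as sock:
        # Disable Nagle's algorithm so each small request is sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(samples):
            start = time.perf_counter()
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
            elapsed = time.perf_counter() - start
            if not reply.startswith(b"+PONG"):
                raise RuntimeError(f"unexpected reply: {reply!r}")
            rtts.append(elapsed * 1_000_000)
    return rtts


def summarize(rtts_us):
    """Reduce raw samples to min/avg/max/p99, all in microseconds."""
    ordered = sorted(rtts_us)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "min": ordered[0],
        "avg": statistics.fmean(ordered),
        "max": ordered[-1],
        "p99": ordered[p99_index],
    }
```

Calling summarize(ping_rtts_us()) from the application host, and again on the Redis host itself, gives a rough split between network time and server time.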

Redis latency diagram

2.2 Redis Server Health

Use INFO commands to inspect CPU, memory, and I/O:

# Total commands processed since start
total_commands_processed:2255
# Real‑time OPS
instantaneous_ops_per_sec:12
# Network I/O
total_net_input_bytes:34312
total_net_output_bytes:78215
instantaneous_input_kbps:1.20
instantaneous_output_kbps:2.62

CPU usage near 90 % often signals a hot‑key or a blocking operation, while memory and disk I/O should remain stable.
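For dashboards or scripted checks, the raw INFO text is easy to turn into a dictionary. The sketch below is a minimal parser plus one derived figure; the function names are hypothetical, and the derived metric is only an illustration of what can be computed from the counters shown above.

```python
def parse_info(text):
    """Turn raw INFO output into a dict, skipping blank and '#' lines."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def avg_output_bytes_per_command(fields):
    """Average outbound bytes per command; large values hint at big replies."""
    commands = int(fields["total_commands_processed"])
    if commands == 0:
        return 0.0
    return int(fields["total_net_output_bytes"]) / commands
```

With the sample output above, the average comes to roughly 35 bytes per command, which is consistent with small replies.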

CPU and memory metrics

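2.3 Slow‑Log and Latency History

Check the slow‑log to see which commands exceed the slowlog-log-slower-than threshold. The default is 10 000 µs (10 ms), and it measures command execution time only, not network or queueing time:

redis-cli -h 127.0.0.1 -p 6379 SLOWLOG GET

Also sample round‑trip latency over time to observe min/avg/max values per interval (here, every 1 s):

redis-cli -h 127.0.0.1 -p 6379 --latency-history -i 1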
Slow‑log output
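When the slow‑log holds many entries, grouping them by command makes the worst offenders easier to spot. The sketch below assumes each entry is a dictionary with "command" and "duration" (microseconds) keys, which is the shape redis-py's slowlog_get() returns; the function name is hypothetical.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slow-log entries by command name; durations are in microseconds."""
    totals = defaultdict(lambda: {"count": 0, "total_us": 0, "max_us": 0})
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        parts = command.split()
        name = parts[0].upper() if parts else "(empty)"
        bucket = totals[name]
        bucket["count"] += 1
        bucket["total_us"] += entry["duration"]
        bucket["max_us"] = max(bucket["max_us"], entry["duration"])
    # Rank by cumulative time so frequent moderately slow commands surface too.
    return sorted(totals.items(), key=lambda kv: kv[1]["total_us"], reverse=True)
```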

2.4 Key‑Space and Big‑Key Analysis

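Inspect the total number of keys and look for oversized keys. The original write‑up suggests keeping a single instance below roughly 10 k keys; treat that as a rule of thumb for the author's environment rather than a Redis limit, since one instance can hold far more.

redis-cli -h 127.0.0.1 -p 6379 INFO keyspace

Detect big keys with the built‑in scanner. The -i 0.01 option pauses 10 ms for every 100 SCAN calls so that the scan does not monopolize the server:

redis-cli -h 127.0.0.1 -p 6379 --bigkeys -i 0.01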

If no big keys are found, the issue likely lies elsewhere.

Big‑key scan result
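The same idea can be scripted when a byte‑size threshold is more useful than the scanner's per‑type summary. The sketch below assumes a client object exposing redis-py style scan_iter() and memory_usage() methods; the function name and the 10 KiB default threshold are illustrative assumptions, not values from the case study.

```python
def find_big_keys(client, threshold_bytes=10 * 1024, scan_count=100):
    """Return (key, size) pairs at or above the threshold, largest first.

    `client` is assumed to offer redis-py style `scan_iter` and
    `memory_usage`. SCAN walks the keyspace incrementally instead of
    blocking the server the way KEYS would.
    """
    big = []
    for key in client.scan_iter(count=scan_count):
        size = client.memory_usage(key)
        # The size is None if the key expired between SCAN and this call.
        if size is not None and size >= threshold_bytes:
            big.append((key, size))
    return sorted(big, key=lambda pair: pair[1], reverse=True)
```

MEMORY USAGE adds one round trip per key, so on a large keyspace this is better run against a replica or during a quiet period.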

2.5 Hot‑Key Detection (Redis ≥ 5.0)

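redis-cli can list hot keys, but only while maxmemory-policy is set to an LFU policy, because the scan relies on per‑key access‑frequency counters. Changing the policy also changes eviction behavior, so apply it deliberately on production instances:

redis-cli -h 127.0.0.1 -p 6379 CONFIG SET maxmemory-policy allkeys-lfu
redis-cli -h 127.0.0.1 -p 6379 --hotkeys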
Hot‑key list

In the case study, a hot key caused CPU to spike while OPS remained modest, confirming that the hot‑key was the bottleneck.
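If switching the eviction policy on a live instance is not an option, a client‑side sample of accessed keys can point at the same culprit. This is a swapped‑in technique rather than the author's method; the sketch below simply counts keys from any iterable of key names, and the function name and the 20 % share threshold are assumptions.

```python
from collections import Counter


def hot_keys(accessed_keys, top_n=3, min_share=0.2):
    """Return (key, share) for keys receiving at least `min_share` of accesses."""
    counts = Counter(accessed_keys)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (key, hits / total)
        for key, hits in counts.most_common(top_n)
        if hits / total >= min_share
    ]
```

A single key holding most of the sampled traffic, combined with high CPU and modest OPS, matches the pattern described in the case study.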

2.6 Memory Fragmentation & Eviction

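Check the fragmentation ratio (used_memory_rss / used_memory), reported as mem_fragmentation_ratio in INFO memory. Values above 1.5 indicate significant fragmentation; values below 1 suggest that part of Redis's memory has been swapped out by the operating system.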

# Example output
used_memory_rss_human:1.2G
used_memory_peak_human:1.5G
mem_fragmentation_ratio:1.23
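The ratio check can be expressed as a small helper. In the sketch below the 1.5 threshold comes from the text above, while reading a ratio below 1 as a sign of swapping is a common rule of thumb stated here as an assumption; the function name is hypothetical.

```python
def fragmentation_status(used_memory_rss, used_memory):
    """Classify mem_fragmentation_ratio from the two byte counters in INFO memory."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    ratio = used_memory_rss / used_memory
    if ratio < 1.0:
        status = "possible swapping"
    elif ratio > 1.5:
        status = "fragmented"
    else:
        status = "ok"
    return round(ratio, 2), status
```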

2.7 Replication & Persistence (Multi‑Instance)

For replicated setups, check master_link_status and master_last_io_seconds_ago on each replica (INFO replication), and the evicted_keys counter (INFO stats).

Replication status
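These fields lend themselves to a simple automated check. The sketch below takes INFO fields as a dictionary of strings and returns warnings; the function name and the 10‑second gap threshold are assumptions chosen for illustration.

```python
def replication_warnings(info, max_io_gap_s=10):
    """Return human-readable warnings from parsed INFO fields (string values)."""
    warnings = []
    if info.get("master_link_status", "up") != "up":
        warnings.append("replication link to master is down")
    io_gap = int(info.get("master_last_io_seconds_ago", 0))
    if io_gap > max_io_gap_s:
        warnings.append(f"no data from master for {io_gap}s")
    evicted = int(info.get("evicted_keys", 0))
    if evicted > 0:
        warnings.append(f"{evicted} keys evicted; maxmemory may be too low")
    return warnings
```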

3. Reproducing the Issue and Testing Fixes

After pinpointing the hot key, the author reproduced the load locally using kaf, a command‑line Kafka client, to produce thousands of identical requests.

# Produce 10 000 messages to a single partition
cat payload.txt | kaf produce kv__0.111 -n 10000 -p 0 -b kafka-broker:9092

Initial attempts with a single producer did not generate enough pressure; scaling to multiple partitions increased load but still left CPU low. Adding concurrency (goroutine workers) raised the service’s CPU, which in turn stressed Redis.

Load test with goroutine workers
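The author's workers were goroutines; the sketch below shows the same concurrency idea with Python threads instead, purely as an illustration. The request_fn argument is a placeholder for whatever sends one request to the service under test, and the worker and request counts are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(request_fn, total_requests=1000, workers=16):
    """Call `request_fn` `total_requests` times from a pool of worker threads.

    Returns (completed, elapsed_seconds), where `completed` counts the calls
    that finished without raising.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(request_fn) for _ in range(total_requests)]
        # exception() blocks until each call finishes, so this also waits.
        completed = sum(1 for future in futures if future.exception() is None)
    return completed, time.perf_counter() - start
```

Raising workers until the service's CPU climbs mirrors the step that finally put pressure on Redis in the case study.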

When Redis CPU spiked, the hot‑key remained the dominant factor. The final mitigation combined three classic techniques:

Read/Write Splitting – Deploy multiple Redis instances (master‑slave) to separate read traffic.

Pipelining – Batch writes to reduce round‑trip latency.

Application‑Level Cache – Add a thin cache layer in front of Redis for the hot key.
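To make the pipelining item concrete, the sketch below encodes several commands in the Redis wire format (RESP) and concatenates them so they can be sent in a single network write. The helper names are hypothetical, and a real service would normally rely on its client library's pipeline support, such as pipeline() in redis-py, rather than hand‑built bytes.

```python
def encode_command(*args):
    """Encode one command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def build_pipeline(commands):
    """Concatenate many commands so they travel in one network write."""
    return b"".join(encode_command(*command) for command in commands)
```

Sending build_pipeline(...) with one sendall() and then reading one reply per command replaces N round trips with roughly one, which is where the latency saving comes from.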

In the author’s environment, adding a local cache layer eliminated the hot‑key hotspot, bringing both latency and CPU usage back to normal.
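The article does not show how the local cache was built, so the following is only a minimal sketch of the idea: an in‑process cache with a short TTL that answers repeated reads of the hot key without touching Redis. The class name, the loader callback, and the 1‑second default TTL are all assumptions.

```python
import time


class LocalTTLCache:
    """Tiny in-process cache that shields Redis from repeated hot-key reads."""

    def __init__(self, loader, ttl_seconds=1.0, clock=time.monotonic):
        self._loader = loader      # function key -> value, e.g. a Redis GET
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # fresh hit: no call to Redis
        value = self._loader(key)  # miss or expired: reload and re-cache
        self._entries[key] = (now + self._ttl, value)
        return value
```

The trade‑off is staleness of up to one TTL per process, and this version is not synchronized, so concurrent threads may occasionally reload the same key at once.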

Performance after cache layer
"A computer‑science problem can often be solved by adding an abstraction layer." – Jay Black

Key take‑aways:

Always start with basic host metrics before diving into Redis internals.

Use INFO, SLOWLOG, redis-cli --latency-history, and redis-cli --hotkeys to locate the culprit.

When CPU is high but OPS is low, suspect hot‑key or big‑key processing.

Mitigate with pipelining, read/write splitting, or an additional cache layer.
