How to Diagnose and Fix Slow Redis Responses: A Step-by-Step Guide
This article walks through practical methods for troubleshooting slow-service alerts and diagnosing Redis performance bottlenecks, then shows how to reproduce issues with local demos and load simulations. Along the way it offers concrete metrics, command-line checks, and mitigation strategies such as scaling, rate-limiting, and pipeline optimization.
01 First Key Point: Basic Service Troubleshooting Methods
When an alert appears at the end of the workday, the first step is to identify which link in the processing chain has become slow. In the example, module A’s B stage showed increased latency, prompting a reverse lookup to rule out inter‑module network or bandwidth issues.
Two hypotheses were checked in turn: a problem inside module A itself, and a problem with data volume.
1.1 Check basic resource data of module A
Both memory and CPU usage were normal.
1.2 Check node load
Node load was normal; when only a single service misbehaves while the node itself is healthy, node-level causes can usually be ruled out.
1.3 Check disk usage
Storage nodes are healthy.
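The resource checks above (memory, CPU/load, disk) map directly to a handful of standard commands; a minimal sketch — acceptable thresholds depend on your environment:

```shell
# Memory: confirm available memory is not exhausted
free -m

# Load averages: compare against the number of cores
uptime
nproc

# Disk usage on the storage nodes
df -h
```

A load average persistently above the core count reported by `nproc` is the usual signal that the node, not the service, is the bottleneck.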
1.4 Recent release?
No recent deployment, so the focus shifts to data volume.
2.1 Verify data volume increase
Report volume grew five‑fold; applying scaling, rate‑limiting, and service degradation resolved the issue.
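Of the three mitigations, rate-limiting is the easiest to sketch. Below is a minimal token-bucket limiter in Python; the rate and burst capacity are illustrative values, not figures from the incident:

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter for smoothing report bursts."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should drop, queue, or degrade the request

bucket = TokenBucket(rate=100, capacity=10)
accepted = sum(bucket.allow() for _ in range(50))
```

Under a burst of 50 back-to-back requests, only roughly the bucket capacity is accepted; the rest are shed, which is exactly the degradation behavior described above.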
02 Second Key Point: Redis Service Troubleshooting Basics
When Redis response times are slow, the investigation follows three layers: service‑level problems, data‑storage issues, and request‑side factors.
1. Confirm whether the issue is network latency on the Redis node
Network latency, packet loss, and OS scheduling can add hundreds of microseconds even on a 1 Gbit/s link.
In this case, data‑volume reduction also reduced latency, so network problems were ruled out.
2. Test intrinsic latency and latency history
```
redis-cli -h 127.0.0.1 -p 6379 --intrinsic-latency 60
redis-cli -h 127.0.0.1 -p 6379 --latency-history -i 1
```

Both commands showed no abnormal delays.
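`--intrinsic-latency` works by spinning in a tight loop and recording the largest pause the OS imposes on the process. The same idea can be sketched in a few lines of Python (the duration here is illustrative):

```python
import time

def intrinsic_latency(duration_s: float = 1.0) -> float:
    """Return the worst scheduling stall (in microseconds) observed while
    busy-looping, mirroring the idea behind redis-cli --intrinsic-latency."""
    end = time.monotonic() + duration_s
    worst = 0.0
    prev = time.monotonic()
    while prev < end:
        now = time.monotonic()
        worst = max(worst, now - prev)  # gap between consecutive iterations
        prev = now
    return worst * 1_000_000

worst_us = intrinsic_latency(0.2)
```

If this number is already large on the Redis host, the environment (hypervisor, CPU contention, kernel) is adding latency before Redis ever sees a command.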
3. Examine throughput and replication metrics
```
# Total commands processed since last restart
total_commands_processed:2255
# Instantaneous OPS
instantaneous_ops_per_sec:12
# Network I/O
total_net_input_bytes:34312
total_net_output_bytes:78215
# KB/s input/output
instantaneous_input_kbps:1.20
instantaneous_output_kbps:2.62
```

Metrics indicated normal throughput; replication was not in use.
4. Check memory‑related indicators
Key metrics such as used_memory_rss_human, used_memory_peak_human, and mem_fragmentation_ratio were inspected. No excessive fragmentation or memory pressure was found.
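The fragmentation check boils down to comparing the RSS the OS reports against Redis's logical memory usage. A sketch that parses a fabricated fragment of INFO memory output and derives the ratio:

```python
# Fabricated INFO memory fragment; real values come from `redis-cli info memory`.
sample = """\
used_memory:1048576
used_memory_rss:1258291
used_memory_peak:2097152
"""

info = {}
for line in sample.splitlines():
    key, _, value = line.partition(":")
    info[key] = int(value)

# mem_fragmentation_ratio is RSS divided by logical used memory.
# Values well above ~1.5 suggest fragmentation; well below 1 suggests swapping.
ratio = info["used_memory_rss"] / info["used_memory"]
```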
5. Investigate key‑space health
Key count (info keyspace) was within limits, and no big keys were present.
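The keyspace section is one line per database, and big keys can be scanned with `redis-cli --bigkeys`. A sketch parsing a sample keyspace line (the numbers are fabricated):

```python
# One line of `redis-cli info keyspace` output (fabricated values).
line = "db0:keys=1200,expires=300,avg_ttl=3600000"

db, _, fields = line.partition(":")
stats = dict(f.split("=") for f in fields.split(","))

keys = int(stats["keys"])        # total keys in db0
expires = int(stats["expires"])  # keys with a TTL set
```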
6. Look for hot‑key behavior
Hot‑key monitoring (available from Redis 5.0) revealed a few hot keys that caused CPU spikes without a corresponding increase in OPS.
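Independent of server-side monitoring, hot keys can also be spotted client-side by sampling recent accesses and flagging keys that take a disproportionate share of traffic. A minimal sketch with a fabricated access sample:

```python
from collections import Counter

# Fabricated sample of recently accessed keys (e.g. from a client-side log).
accesses = ["kv__0.111"] * 90 + ["user:1", "user:2", "user:3"] * 3 + ["misc"]

counts = Counter(accesses)
total = len(accesses)

# Flag keys that account for more than 10% of sampled traffic.
hot = [key for key, n in counts.items() if n / total > 0.1]
```

A pattern like this — one key dominating access counts while overall OPS stays flat — matches the CPU-spike-without-OPS-increase symptom described above.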
7. Analyze CPU usage
Although Redis is single‑threaded, high CPU usage can stem from network I/O and hot‑key contention.
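A quick way to confirm this is to look at per-process CPU on the node; a sketch (the redis-server line in the second command is commented out because it requires a running instance):

```shell
# Top CPU consumers on the node, sorted by CPU share
ps -eo pid,comm,%cpu --sort=-%cpu | head -n 10

# If redis-server is near the top, break it down by thread:
# ps -L -p "$(pidof redis-server)" -o tid,comm,%cpu
```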
03 Third Key Point: Reproducing the Issue and Testing Basics
Two approaches are used: a local demo and an online simulation.
3.1 Local demo
Pipeline and Lua scripting were tested; perf showed fewer context switches when using pipeline.
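Most of the saving from pipelining comes from round trips rather than server-side work. A back-of-envelope sketch, with illustrative RTT and processing-time assumptions:

```python
rtt_ms = 0.5       # assumed network round-trip time to Redis
server_us = 20     # assumed per-command processing time on the server
n_commands = 1000

# Without pipelining: every command pays a full round trip.
naive_ms = n_commands * (rtt_ms + server_us / 1000)

# With one pipeline: a single round trip carries all commands.
pipelined_ms = rtt_ms + n_commands * server_us / 1000

speedup = naive_ms / pipelined_ms
```

With these numbers the pipelined batch is more than 20x faster, and the client makes one syscall round trip instead of a thousand — consistent with the drop in context switches observed under perf.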
3.2 Online simulation
Kafka data were replayed to generate load on service Y, which forwards traffic to Redis. Various scaling attempts (single producer, multiple partitions, multiple producers) were made until CPU and memory pressure on Redis matched the observed hot‑key pattern.
```
cat xxx-test | kaf produce kv__0.111 -n 10000 -b qapm-tencent-cp-kafka:9092

for i in {0..8}; do
  cat xxx-test | kaf produce kv__0.111 -n 10000 -p ${i} -b qapm-tencent-cp-kafka:9092
done
```

After adding coroutine workers to service Y, Redis CPU rose, confirming the hot-key impact.
Mitigation
1. For multi-instance deployments, use read/write separation.
2. For single-instance setups, enable pipeline batch writes.
3. If pipeline is insufficient, add an application-level cache.
Applying the third solution reduced latency and CPU usage dramatically.
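The third mitigation — an application-level cache in front of Redis — can be sketched as a small in-process TTL cache; the fetch function and TTL below are illustrative stand-ins, not the article's actual code:

```python
import time

class TTLCache:
    """Tiny in-process cache that absorbs repeated reads of hot keys
    so they never reach Redis within the TTL window."""

    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key, fetch):
        entry = self.store.get(key)
        now = time.monotonic()
        if entry and entry[1] > now:
            return entry[0]                  # served locally, no Redis hit
        value = fetch(key)                   # stands in for a real Redis GET
        self.store[key] = (value, now + self.ttl_s)
        return value

calls = []
def fetch(key):  # hypothetical backend read, records each Redis round trip
    calls.append(key)
    return f"value-of-{key}"

cache = TTLCache(ttl_s=60)
for _ in range(100):
    cache.get("kv__0.111", fetch)
```

In this sketch, 100 reads of the hot key trigger only a single backend fetch; the other 99 are absorbed locally, which is why CPU on Redis drops so sharply.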
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and will accompany you throughout your operations career, growing together.