Why Redis Became a Bottleneck: Diagnosing High CPU with Slowlog and Command Stats
A Monday-morning surge in user traffic exposed a Redis performance crisis: CPU spiked to 100% under a flood of keys * commands. An investigation using Grafana, Redis info, commandstats, and slowlog revealed the root cause and led to a temporary mitigation.
Web Monitoring
In Alibaba Grafana, CPU, memory, and network usage on the web side all looked normal, which pointed to Redis as the problem.
Our single‑node 32M 16GB Alibaba Cloud Redis showed CPU spiking to 100%.
QPS rose from ~1k to 6k and connections from 0 to 3k, both still far below the instance limits; the latency came from a massive backlog of queued commands rather than from hitting those limits.
Temporary solution: provision a new Redis instance and switch the application configuration.
Server Command Monitoring
Running info and checking slowlog revealed that the ten slowest commands were all keys *, a command that blocks the server while it runs and stalls the service at current traffic levels.
Further inspection of command statistics showed elevated average latencies (usec_per_call, reported in microseconds) for commands such as setnx (6 µs), setex (7.33 µs), del (69 µs), hmset (64 µs), hmget (9 µs), hgetall (205 µs), and especially keys (3740 µs).
These latencies correlate with the size of the values, so recent data growth or code changes that issue these commands should be investigated.
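Figures like these come from the usec_per_call field of the cmdstat_ lines in info commandstats. As a minimal sketch, assuming that text has already been captured into a string, the following Python ranks commands by average latency. The function names and the SAMPLE numbers are hypothetical and chosen purely for illustration; they are not the values from this incident.

```python
from typing import Dict, List, Tuple

# Illustrative input only: made-up numbers in the cmdstat_ line format.
SAMPLE = """# Commandstats
cmdstat_get:calls=1000,usec=2000,usec_per_call=2.00
cmdstat_hgetall:calls=100,usec=5000,usec_per_call=50.00
cmdstat_keys:calls=10,usec=5000,usec_per_call=500.00
"""


def parse_commandstats(info_text: str) -> Dict[str, Dict[str, float]]:
    """Parse `info commandstats` text into {command: {field: value}}."""
    stats: Dict[str, Dict[str, float]] = {}
    for raw in info_text.splitlines():
        line = raw.strip()
        if not line.startswith("cmdstat_"):
            continue  # skip the section header and blank lines
        name, _, fields = line.partition(":")
        parsed: Dict[str, float] = {}
        for pair in fields.split(","):
            key, _, value = pair.partition("=")
            try:
                parsed[key.strip()] = float(value)
            except ValueError:
                continue  # ignore any field that is not numeric
        stats[name[len("cmdstat_"):]] = parsed
    return stats


def slowest_commands(
    stats: Dict[str, Dict[str, float]], top: int = 5
) -> List[Tuple[str, float]]:
    """Return the `top` commands ordered by average microseconds per call."""
    ranked = sorted(
        stats.items(),
        key=lambda item: item[1].get("usec_per_call", 0.0),
        reverse=True,
    )
    return [(cmd, fields.get("usec_per_call", 0.0)) for cmd, fields in ranked[:top]]


if __name__ == "__main__":
    for command, usec in slowest_commands(parse_commandstats(SAMPLE), top=3):
        print(f"{command}: {usec:.2f} usec per call")
```

Sorting by usec_per_call surfaces commands that are slow per invocation; sorting by usec instead would show which commands consume the most total server time.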
Command statistics can be viewed via info commandstats, which reports calls, usec, and usec_per_call for each command:
cmdstat_XXX:calls=XXX,usec=XXX,usec_per_call=XXX
The slowlog records commands that take longer than 10 ms to execute (excluding network I/O). Example output:
xxxxx> slowlog get 10
3) 1) (integer) 411
   2) (integer) 1545386469
   3) (integer) 232663
   4) 1) "keys"
      2) "mecury:*"
The fields of each entry are, in order: the log ID, the Unix timestamp, the execution time (µs), and the command array.
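Slowlog entries are easier to compare once the timestamp and duration are made readable and the entries are grouped by command. A small sketch, assuming each entry has been collected as an (id, timestamp, duration in µs, argument list) tuple in the field order shown above. The helper names are hypothetical, and the only data used is the entry from the example output.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Sequence, Tuple

# (id, unix timestamp, execution time in microseconds, command arguments)
SlowlogEntry = Tuple[int, int, int, Sequence[str]]

# The entry shown in the slowlog output above.
ENTRY: SlowlogEntry = (411, 1545386469, 232663, ["keys", "mecury:*"])


def describe_entry(entry: SlowlogEntry) -> str:
    """Render one slowlog entry as a single human-readable line (UTC time)."""
    log_id, timestamp, duration_us, args = entry
    when = datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()
    return f"#{log_id} {when} {duration_us / 1000:.1f} ms {' '.join(args)}"


def total_time_by_command(entries: List[SlowlogEntry]) -> Dict[str, Tuple[int, int]]:
    """Map each command name to (number of entries, total microseconds)."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for _log_id, _timestamp, duration_us, args in entries:
        name = args[0].lower() if args else "<empty>"
        totals[name][0] += 1
        totals[name][1] += duration_us
    return {name: (count, total) for name, (count, total) in totals.items()}


if __name__ == "__main__":
    print(describe_entry(ENTRY))
    print(total_time_by_command([ENTRY]))
```

A single command name dominating the totals, as keys did here, is a strong hint about where to look next.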
Thus a sudden surge of keys * commands caused the CPU spike and the latency. Our application was never meant to issue this command.
After sharing the stats with the development team, we discovered that another application had been mistakenly pointed at our Redis and was crawling its data with massive numbers of keys * calls. Its configuration was corrected.
Summary
Check web monitoring dashboards first.
Inspect Redis command stats and slowlog to identify heavy commands.
Optimize Redis usage in code.
Consider scaling Redis if traffic continues to grow.
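For the step of optimizing Redis usage in code, one common option is to replace keys with the cursor-based SCAN command, which walks the keyspace in small batches instead of in one blocking call. This is a general suggestion rather than something taken from the incident. A minimal sketch, assuming a client whose scan(cursor=..., match=..., count=...) method returns (next_cursor, keys) the way redis-py's does; InMemoryClient is a hypothetical stand-in so the example runs without a server.

```python
from fnmatch import fnmatchcase
from typing import Iterator, List, Tuple


class InMemoryClient:
    """Hypothetical stand-in for a Redis client; only scan() is modelled."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = sorted(keys)

    def scan(
        self, cursor: int = 0, match: str = "*", count: int = 10
    ) -> Tuple[int, List[str]]:
        # Return one batch of keys plus the cursor for the next batch.
        page = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0  # a cursor of 0 signals the end of the iteration
        return next_cursor, [key for key in page if fnmatchcase(key, match)]


def iter_keys(client, pattern: str, count: int = 100) -> Iterator[str]:
    """Yield keys matching `pattern` batch by batch instead of all at once.

    A real server may return the same key more than once during a scan,
    so callers that need uniqueness should deduplicate.
    """
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if int(cursor) == 0:
            break


if __name__ == "__main__":
    demo = InMemoryClient(["mecury:1", "mecury:2", "other:1"])
    for key in iter_keys(demo, "mecury:*", count=2):
        print(key)
```

With redis-py, the built-in scan_iter(match=...) wraps the same loop. Each batch is a short command, so other clients' requests can be served between batches.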
Source: https://www.sevenyuan.cn
