Diagnosing and Resolving 900% CPU Spikes in MySQL and Java Processes
This article explains common scenarios that cause CPU usage to soar above 200% in production, outlines step‑by‑step diagnosis and remediation for MySQL and Java processes, and shares real‑world case studies with command‑line tools, indexing, caching, and code adjustments to bring CPU load back to normal levels.
CPU readings above 200% are common in production: top on Linux reports usage per core by default, so 900% means roughly nine cores are fully busy. Two typical cases follow: a MySQL process reaching 900% CPU and a Java process with a similar spike.
Scenario 1: MySQL CPU Spike
Investigation steps:
Use top to confirm the culprit is mysqld.
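Run SHOW PROCESSLIST (or SHOW FULL PROCESSLIST to see untruncated statements) to identify resource‑heavy SQL statements.
Check the execution plans of those statements with EXPLAIN, looking for missing indexes or scans over large data volumes.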
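The triage in the steps above can be sketched in Java. This is a minimal illustration, not a tool from the original article: the ProcesslistTriage class, its Row type, and the sample table names are hypothetical, and the rows are assumed to have already been read from SHOW PROCESSLIST (only the Id, Command, Time and Info columns are used).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: ranks SHOW PROCESSLIST rows by how long each statement has run. */
final class ProcesslistTriage {

    /** One processlist row, reduced to the columns used here. */
    static final class Row {
        final long id;          // Id column, the value KILL expects
        final String command;   // Command column, e.g. "Query" or "Sleep"
        final int timeSeconds;  // Time column, seconds in the current state
        final String info;      // Info column: statement text, null when idle

        Row(long id, String command, int timeSeconds, String info) {
            this.id = id;
            this.command = command;
            this.timeSeconds = timeSeconds;
            this.info = info;
        }
    }

    /** Returns running queries that have been active for at least minSeconds, longest first. */
    static List<Row> slowest(List<Row> rows, int minSeconds) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if ("Query".equals(r.command) && r.info != null && r.timeSeconds >= minSeconds) {
                out.add(r);
            }
        }
        out.sort(Comparator.comparingInt((Row r) -> r.timeSeconds).reversed());
        return out;
    }

    /** Builds the EXPLAIN statement to run next for a suspicious query. */
    static String explainFor(Row r) {
        return "EXPLAIN " + r.info;
    }

    public static void main(String[] args) {
        // Made-up sample rows; in practice these come from SHOW FULL PROCESSLIST.
        List<Row> rows = Arrays.asList(
            new Row(11, "Sleep", 300, null),
            new Row(12, "Query", 45, "SELECT * FROM t_user WHERE user_code = 'A1'"),
            new Row(14, "Query", 120, "SELECT COUNT(*) FROM t_order"));
        for (Row r : slowest(rows, 10)) {
            System.out.println(r.id + " (" + r.timeSeconds + "s): " + explainFor(r));
        }
    }
}
```

Idle connections (Command = Sleep) are skipped on purpose: they can show a large Time value without consuming CPU.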
Remediation steps:
Kill the offending threads (KILL followed by the processlist Id) and check whether CPU usage drops.
Add missing indexes, rewrite inefficient SQL, or adjust memory parameters.
Limit connection counts and enable caching (e.g., Redis) to reduce query frequency.
Iterate optimizations and monitor the effect.
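The caching step can be illustrated with a small cache-aside sketch. It assumes an in-memory map standing in for Redis and a loader function standing in for the MySQL query; the ReadThroughCache class and its method names are hypothetical. A real Redis setup would also need expiry and an invalidation strategy, which this sketch only hints at.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache-aside wrapper: repeated reads hit the cache, not the database. */
final class ReadThroughCache {
    // Stand-in for Redis; a real deployment would use a Redis client with a TTL.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the MySQL lookup, e.g. a query by user_code.
    private final Function<String, String> loader;
    private final AtomicInteger loads = new AtomicInteger();

    ReadThroughCache(Function<String, String> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only on a cache miss. */
    String get(String userCode) {
        return cache.computeIfAbsent(userCode, key -> {
            loads.incrementAndGet();
            return loader.apply(key);
        });
    }

    /** Drops a cached entry, for example after the row is updated. */
    void invalidate(String userCode) {
        cache.remove(userCode);
    }

    /** Number of times the loader (the database) was actually consulted. */
    int databaseLoads() {
        return loads.get();
    }

    public static void main(String[] args) {
        ReadThroughCache users = new ReadThroughCache(code -> "row-for-" + code);
        for (int i = 0; i < 1000; i++) {
            users.get("A1");
        }
        System.out.println("database loads: " + users.databaseLoads());
    }
}
```

The point of the design is that a thousand identical reads cost one query, which is the kind of reduction in query frequency the remediation step is after.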
Real MySQL case: A production MySQL instance hit 900% CPU due to unindexed user_code queries and excessive slow‑log logging. By adding the missing index, disabling the slow log, and moving frequent reads to Redis cache, CPU dropped to 70‑80% and later to 30‑40% after further tuning.
Scenario 2: Java CPU Spike
Investigation steps:
Identify the high‑CPU Java PID with top.
List threads using top -Hp PID.
Convert the busiest thread's ID to hex with printf "%x\n" TID, then search for that value in the nid=0x... field of a jstack PID dump.
Analyze the stack trace to find the problematic code.
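The lookup described in these steps can be sketched as a small Java helper. It assumes the dump prints the native thread ID as a hex nid=0x... field, which is what the printf conversion above relies on; if a given JDK prints the field differently, the marker would need adjusting. The JstackLocator class and the sample dump are made up for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: finds the jstack entry for a thread ID reported by top -Hp. */
final class JstackLocator {

    /** Converts a decimal thread ID into the marker searched for in the dump. */
    static String toNid(long tid) {
        return "nid=0x" + Long.toHexString(tid);
    }

    /**
     * Returns the thread's entry (header line up to the next blank line),
     * or null when no thread in the dump has that native ID.
     */
    static String findThread(String dump, long tid) {
        // Word boundaries keep nid=0x1f3 from matching nid=0x1f3b.
        Pattern marker = Pattern.compile("\\b" + Pattern.quote(toNid(tid)) + "\\b");
        StringBuilder block = null;
        for (String line : dump.split("\\R")) {
            if (block == null) {
                if (marker.matcher(line).find()) {
                    block = new StringBuilder(line);
                }
            } else if (line.trim().isEmpty()) {
                break;
            } else {
                block.append('\n').append(line);
            }
        }
        return block == null ? null : block.toString();
    }

    public static void main(String[] args) {
        // Made-up dump fragment in the usual jstack layout.
        String dump = String.join("\n",
            "\"main\" #1 prio=5 os_prio=0 tid=0x00007f0a1c00a000 nid=0x1f2a waiting on condition",
            "   java.lang.Thread.State: TIMED_WAITING (sleeping)",
            "",
            "\"converter-1\" #12 prio=5 os_prio=0 tid=0x00007f0a1c1f4000 nid=0x1f3b runnable",
            "   java.lang.Thread.State: RUNNABLE",
            "\tat ImageConverter.run(ImageConverter.java)",
            "");
        System.out.println(findThread(dump, 7995)); // 7995 decimal is 1f3b in hex
    }
}
```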
Remediation steps:
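If the cause is a busy loop or an empty spin, add Thread.sleep or use proper locking so the thread waits instead of spinning.
If massive object creation triggers frequent GC, reduce allocations or use object pools.
For selector spin loops (select() returning immediately with no ready keys), rebuild the selector, as Netty does in its NioEventLoop.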
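The selector rebuild mentioned in the last step can be sketched with plain java.nio. This is a simplified illustration of the idea, not Netty's actual code: the SelectorRebuild class is hypothetical, and deciding when to rebuild (for example after a run of select() calls that return no keys well before their timeout) is left to the caller.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;

/** Hypothetical sketch: moves every registration from a spinning selector to a fresh one. */
final class SelectorRebuild {

    /** Re-registers all valid keys on a new selector, closes the old one, returns the new one. */
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        // Copy the key set first so the loop never iterates a live view.
        for (SelectionKey key : new ArrayList<>(old.keys())) {
            if (!key.isValid()) {
                continue;
            }
            int ops = key.interestOps();
            Object attachment = key.attachment();
            SelectableChannel channel = key.channel();
            key.cancel();                              // detach from the old selector
            channel.register(fresh, ops, attachment);  // same interest set and attachment
        }
        old.close(); // channels stay open; only the selector goes away
        return fresh;
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        Selector selector = Selector.open();
        pipe.source().register(selector, SelectionKey.OP_READ, "reader");
        selector = rebuild(selector);
        System.out.println("keys on new selector: " + selector.keys().size());
        selector.close();
        pipe.source().close();
        pipe.sink().close();
    }
}
```

The event loop must swap its reference to the returned selector before the next select() call, since the old instance is closed.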
Real Java case: A Java service consumed 700% CPU. Using top, top -Hp, and jstack, the offending thread was traced to ImageConverter.run(), which called poll() on an empty queue in a tight loop. Because poll() returns immediately when the queue is empty, the loop never yielded the CPU. Replacing poll() with the blocking take() method eliminated the spin, reducing CPU usage to under 10%.
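A minimal sketch of that fix follows. The original ImageConverter code is not shown in the source, so the QueueConsumer class, the String job type, and the STOP sentinel used for shutdown are assumptions made for illustration; only the poll() versus take() contrast comes from the case itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical consumer: waits for jobs with take() instead of spinning on poll(). */
final class QueueConsumer implements Runnable {
    static final String STOP = "__stop__"; // assumed shutdown sentinel, not from the source

    private final BlockingQueue<String> queue;
    private final List<String> processed = Collections.synchronizedList(new ArrayList<String>());

    QueueConsumer(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Spinning variant that burns a core while the queue is empty:
                //   String job = queue.poll(); if (job == null) continue;
                String job = queue.take(); // parks the thread until a job arrives
                if (STOP.equals(job)) {
                    return;
                }
                processed.add(job); // placeholder for the real conversion work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and exit
        }
    }

    List<String> processed() {
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        QueueConsumer consumer = new QueueConsumer(queue);
        Thread worker = new Thread(consumer, "converter-1");
        worker.start();
        queue.put("a.png");
        queue.put("b.png");
        queue.put(STOP);
        worker.join();
        System.out.println(consumer.processed());
    }
}
```

If a periodic wake-up is needed, poll(timeout, unit) on the same BlockingQueue interface also blocks rather than spins.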
Both cases emphasize that proper indexing, caching, and careful thread handling are essential to prevent extreme CPU usage in backend services.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn