How to Diagnose and Resolve Online Slow SQL That Causes CPU Spikes
When a MySQL‑driven application shows sudden CPU spikes, this guide walks through a step‑by‑step process—using top/htop, SHOW PROCESSLIST, enabling slow‑query logs, analyzing EXPLAIN output, killing offending queries, adding appropriate indexes, rewriting joins, and establishing preventive monitoring—to quickly identify and fix the root cause.
Introduction
A recent interview at a major tech company asked how to handle an online slow‑SQL problem that caused CPU usage to soar. This article walks through the complete troubleshooting workflow from detection to optimization.
1. Quickly Locate the Problem
1.1 Verify CPU Consumption at the Database Layer
Log into the database server and run top or htop to check the MySQL process CPU usage. If mysqld exceeds 100% on a multi‑core machine, the database is likely the culprit.
Enter the MySQL client and execute: SHOW PROCESSLIST; Focus on the Time (execution time) and State columns. Sessions showing states such as Sending data, Copying to tmp table, or Sorting result with long durations are strong indicators of slow SQL.
Enable the slow‑query log if it is not already on:
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1; -- record queries > 1 sAnalyze the log with mysqldumpslow or pt‑query‑digest:
pt-query-digest /var/log/mysql/slow.log # or mysqldumpslow1.2 Extract the Problematic SQL
Copy the offending statement from SHOW PROCESSLIST or from the slow‑query log. Example:
SELECT o.id, o.amount, u.name, p.title
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
LEFT JOIN products p ON o.product_id = p.id
WHERE o.status = 'PAID'
AND o.create_time > '2026-01-01'
ORDER BY o.amount DESC
LIMIT 1000;2. Why Is It Slow?
The root cause is that MySQL must spend a lot of CPU on full table scans, sorting, and temporary table creation.
2.1 Execution‑Plan Analysis
Run EXPLAIN SELECT … and examine the key columns: type: ALL means a full table scan (worst); range or ref indicate index usage. rows: estimated number of rows scanned; larger values mean slower queries. Extra: Using filesort or Using temporary are CPU‑intensive.
2.2 Why Full Table Scans Consume CPU
MySQL reads each data page into memory and evaluates the filter row by row.
For very large tables (millions of rows), even with enough memory the CPU is busy parsing and comparing.
Sorting and grouping add extra CPU load.
2.3 Common Index‑Failure Scenarios
Applying functions to indexed columns, e.g., WHERE DATE(create_time) = '2026-01-01'.
Implicit type conversion, e.g., WHERE user_id = '123' when user_id is numeric.
Using != or <> operators.
Leading‑wildcard LIKE '%abc' patterns.
OR conditions that involve non‑indexed columns.
3. Emergency Measures (Stop‑Gap)
3.1 Kill the Slow Query
SHOW PROCESSLIST;
-- Find the Id of the long‑running session and execute:
KILL <Id>;Scripts can be written to automatically kill sessions that exceed a threshold.
3.2 Temporary Rate Limiting
Apply rate limiting on suspicious API endpoints at the application layer, e.g., using Sentinel or Hystrix.
3.3 Restarting the Database (Not Recommended)
Restarting clears the buffer pool and may make the problem worse unless the server is completely unresponsive.
4. Permanent Fix: SQL and Index Optimization
4.1 Add Appropriate Indexes
For the example query, a composite index on (status, create_time) and possibly the amount column can help:
ALTER TABLE orders ADD INDEX idx_status_time_amount (status, create_time, amount);After the change, EXPLAIN shows type=range, a dramatically reduced rows estimate, and no Using filesort in Extra.
4.2 Rewrite the SQL
Avoid SELECT *; select only needed columns.
Replace LEFT JOIN with INNER JOIN when the joined tables are guaranteed to have matching rows.
Use subqueries or temporary tables to reduce the amount of data joined.
4.3 Split Complex Queries
Break a multi‑table join into several simpler queries and assemble the results in the application when the total data volume is moderate.
5. Real‑World Case Study
Original SQL (orders 5 M, users 2 M, products 1 M):
SELECT o.order_no, u.phone, p.name, o.amount
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
LEFT JOIN products p ON o.product_id = p.id
WHERE o.status = 1
AND o.create_time BETWEEN '2026-04-01' AND '2026-04-30'
ORDER BY o.amount DESC
LIMIT 100;Problems
The orders table only has a single‑column index on status, so create_time is not indexed and the query scans all rows with status 1. ORDER BY amount triggers a filesort.
Optimization Steps
Create a composite index:
ALTER TABLE orders ADD INDEX idx_status_time_amount(status, create_time, amount);Rewrite LEFT JOIN to INNER JOIN because the related user and product rows always exist.
Pagination is already limited to LIMIT 100, which is acceptable.
Result : After the changes, type=range, rows≈2000, and Extra no longer contains Using filesort. Query time dropped from 30 s to 0.08 s, and database CPU fell from 85 % to 15 %.
6. Pros, Cons, and Suitable Scenarios of Each Technique
Add Index – Immediate effect; downside is extra write overhead and disk usage; best for high‑selectivity, high‑frequency queries.
Rewrite SQL – No schema change; requires deep business understanding; suited for complex joins or subqueries.
Split Queries + Application Assembly – Reduces DB load; adds network overhead and code complexity; works when many tables are involved but each table is relatively small.
Use Cache – Greatly lowers DB load; consistency can be hard; ideal for read‑heavy, write‑light hot data.
Read‑Write Splitting – Distributes read pressure; introduces replication lag; fits read‑dominant workloads.
7. Preventing Slow‑SQL‑Induced CPU Spikes
Establish an SQL review process: every statement must pass an EXPLAIN check before deployment; forbid full‑table scans.
Enable slow‑query monitoring with a threshold (e.g., 1 s) and integrate alerts.
Regularly audit index usage: drop unused indexes and consolidate duplicates.
Conduct load testing on critical queries before major traffic events.
Apply rate limiting and degradation strategies at API gateways or business layers.
Deploy a full‑stack performance testing platform to catch potential slow queries early.
8. Conclusion
Online slow SQL leads to CPU spikes because inefficient queries exhaust database resources. The core remediation workflow is:
Locate slow SQL → Analyze with EXPLAIN → Optimize via indexing or SQL rewrite → Verify improvement → Build preventive mechanisms.
In practice, about 80 % of CPU‑spike incidents can be resolved by adding the right index or simplifying the query, but disciplined monitoring and review are essential for long‑term stability.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
