Why MySQL CPU Spiked to 400%: A Real‑World SQL Optimization Case Study
A startup’s production MySQL server hit 400% CPU usage due to poorly written queries with temporary tables, filesorts, and join buffers, prompting a deep dive into execution plans, concrete optimization recommendations, and a rollback strategy to restore service.
Background
A social‑media application experienced a production incident where the MySQL database server CPU spiked to 400%, making the service unresponsive during a weekend promotion.
Incident Details
The monitoring system showed sustained high CPU usage. A problematic SQL statement from a C‑end API was collected together with the full table schema, and the execution plan was examined with EXPLAIN.
Execution‑Plan Analysis
The plan contained the following costly operations:
Using temporary – MySQL created an on‑disk temporary table, adding I/O and CPU overhead.
Using filesort – Sorting was performed without a suitable index, requiring an extra pass over the data.
Using join buffer – The join could not use an indexed lookup, so a buffer‑based nested‑loop join was used.
Block Nested Loop – Large portions of the tables were scanned repeatedly, indicating missing indexes.
These patterns indicate that the query lacks appropriate indexes and forces MySQL to process rows in memory, leading to the CPU saturation.
Optimization Recommendations
To reduce CPU consumption the following concrete actions were recommended:
Eliminate Using filesort and Block Nested Loop by adding covering indexes that match the WHERE and ORDER BY columns.
Remove functions (e.g., DATE(), LIKE '%…') from the WHERE clause or replace them with indexed columns/pre‑computed values.
Limit result sets with LIMIT or pagination to avoid large temporary tables.
If a join cannot be satisfied with indexes, consider denormalising the data or using a materialised view.
Aim for an execution plan that shows Using index (covering index) so MySQL reads directly from the index without touching the base table.
Resolution Process
A read‑only Git account was created, the repository was cloned, and a code audit revealed the same anti‑pattern across many modules. Because the issue was widespread, the team rolled back the latest release and then:
Refactored the identified queries according to the recommendations above.
Created or adjusted indexes on the affected tables.
Implemented caching (e.g., Redis) for frequently accessed data.
Executed load‑testing scripts (e.g., sysbench or JMeter) to verify that CPU usage remained within safe limits.
Redeployed the corrected version only after performance criteria were met.
Post‑mortem Lessons
1. Restoring service is the top priority; performance investigation follows.
2. MySQL performance can degrade on very small tables if queries are poorly written—inefficient queries on a few thousand rows can still saturate CPU.
3. Benchmark claims must be backed by clear specifications of hardware, dataset size, and workload; vague statements lead to mis‑diagnosis.
4. Regularly review execution plans in production and enforce index‑first query design to prevent Using temporary , Using filesort , and related patterns.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
