
How a MySQL CPU Spike Exposed Critical Query Mis‑optimizations

An urgent overnight incident revealed a MySQL server’s CPU soaring to 400% due to poorly written queries, prompting a detailed analysis of execution plans, identification of costly operations like filesort and temporary tables, and concrete recommendations for query and team improvements.

dbaplus Community

Nov 13, 2023

Background

A social-app startup built its backend with Java and MySQL. In early 2023, during a high-traffic weekend, the production database server reached 400% CPU utilisation and the service became unresponsive.

Incident Timeline

At 02:00 the founder reported that the CPU was saturated. Monitoring screenshots showed the MySQL server at full CPU capacity. The backend team then supplied the SQL statement responsible, which was executed by a frequently called consumer-facing (C-end) API endpoint.

Root‑Cause Analysis

Running the SQL on a local replica and examining the EXPLAIN output revealed the following entries in the Extra column:

Using temporary

Using filesort

Using join buffer

Block Nested Loop

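These entries indicate that MySQL had to create an internal temporary table, sort rows in an extra pass (a filesort, which may run in memory or spill to disk) instead of reading them in index order, and join tables by buffering rows because no usable index existed on the join column. The last two entries normally appear together as Using join buffer (Block Nested Loop). The execution plan also showed full table scans for the WHERE clause with no index used, which explains the extreme CPU consumption as the dataset approached 100 million rows. The team had assumed that the query was acceptable below 5 million rows and would need optimising only beyond 100 million rows. That assumption does not hold: the cost of unindexed scans, sorts, and joins grows with every row added, long before any such threshold is reached.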
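As a rough illustration of how this check could be automated, the sketch below scans an EXPLAIN Extra value for the costly entries listed above. The helper name costly_flags, the COSTLY_FLAGS table, and the sample string are hypothetical; the incident's actual EXPLAIN output is not reproduced in this article.

```python
# Hypothetical helper: maps substrings of MySQL's EXPLAIN "Extra" column
# to the costs discussed above. It is not part of any MySQL tooling.
COSTLY_FLAGS = {
    "Using temporary": "builds an internal temporary table",
    "Using filesort": "sorts rows in an extra pass instead of reading them in index order",
    "Block Nested Loop": "joins by buffering rows because no usable index exists",
}


def costly_flags(extra):
    """Return the costly entries found in one EXPLAIN Extra value."""
    if not extra:
        return []
    return [flag for flag in COSTLY_FLAGS if flag in extra]


# Illustrative Extra value, not the incident's real output.
sample = "Using where; Using temporary; Using filesort"
for flag in costly_flags(sample):
    print(f"{flag}: {COSTLY_FLAGS[flag]}")
```

A check like this could run in code review or CI against the plans of new queries, assuming the team already captures EXPLAIN output for them.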

Recommended Optimisations

Avoid Using filesort and Block Nested Loop by creating appropriate indexes on the columns used for filtering, sorting, and joining.

Eliminate Using join buffer and Using temporary by rewriting joins to use indexed columns and by limiting result sets.

Aim for Using index in the execution plan, which means a covering index answers the query without reading the table rows at all.
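The effect of these recommendations can be tried out without a MySQL server. The sketch below uses SQLite through Python's standard sqlite3 module purely as a self-contained stand-in: its EXPLAIN QUERY PLAN wording differs from MySQL's EXPLAIN, but the temporary B-tree it reports for ORDER BY plays roughly the role of Using filesort, and its covering index corresponds to Using index. The posts table, its columns, and the index name are invented for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption: the
# article's real schema is unknown, so this table is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, created_at INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO posts (user_id, created_at, body) VALUES (?, ?, ?)",
    [(i % 50, i, "x") for i in range(1000)],
)

QUERY = "SELECT user_id, created_at FROM posts WHERE user_id = 7 ORDER BY created_at"


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Without an index: full scan plus a separate sorting step.
before = plan(QUERY)

# Composite index on the filter column followed by the sort column.
conn.execute("CREATE INDEX idx_posts_user_created ON posts (user_id, created_at)")

# With the index: lookup in sorted order, answered from the index alone.
after = plan(QUERY)

print("before:", before)
print("after: ", after)
```

The first plan should report a scan and a temporary B-tree for the ORDER BY; the second should report a search on the covering index with no separate sort. The column order in the index matters: the equality filter column comes first, then the sort column.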

Mitigation Steps Taken

Rolled back the newly deployed feature that introduced the problematic query.

Isolated the offending SQL, rewrote it to use indexed columns, and removed unnecessary functions from the SELECT and WHERE clauses.

Adjusted related business logic to match the revised query semantics.

Performed functional testing to verify correctness.

Conducted load testing with realistic data volumes (up to 100 million rows) to confirm that CPU usage remained within acceptable limits before redeployment.
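The original query is not shown here, so the following is only a sketch of why removing functions from the WHERE clause matters. It again uses SQLite via Python's sqlite3 module as a stand-in for MySQL, with a hypothetical users table: wrapping an indexed column in a function hides it from the index, while comparing the bare column allows an index lookup. MySQL behaves similarly unless a matching functional index exists.

```python
import sqlite3

# Hypothetical table and index; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1000)],
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Function applied to the indexed column: every row must be examined.
wrapped = plan("SELECT id FROM users WHERE lower(email) = 'user7@example.com'")

# Bare column comparison: the index can be used for a direct lookup.
bare = plan("SELECT id FROM users WHERE email = 'user7@example.com'")

print("function on column:", wrapped)
print("bare column:       ", bare)
```

In SQLite's plan output, SCAN means all rows are visited and SEARCH means an index lookup. If the function is genuinely needed, one option is to normalise the value at write time (for example, storing e-mail addresses in lower case) so the query can compare the bare column.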

Post‑mortem Insights

The incident demonstrates that immediate service restoration must be followed by systematic diagnosis: capture monitoring data, reproduce the query on a test replica, and analyse the execution plan. Misconceptions about MySQL scaling thresholds (e.g., “optimisation is only needed after 100 million rows”) can lead to severe performance degradation. Proper query design, comprehensive indexing, and performance testing at production‑scale data sizes are essential for reliable backend services.
