How to Diagnose and Fix Slow SQL That Spikes CPU in Production: An Interview Walkthrough
The article walks through a systematic approach to identify why a slow MySQL query can cause CPU usage to soar, covering observability, analysis with SHOW PROCESSLIST and EXPLAIN, and practical mitigation steps suitable for interview discussions.
Why a Single SQL Can Drive CPU Up
A long-running query burns CPU on parsing, planning, sorting, and moving large result sets through memory. Under concurrency the cost multiplies, because every session running that query competes for CPU time slices. Lock waits make it worse: callers that time out retry, connection pools fill up, and application threads block. Together these raise CPU usage on both the database and the application side.
First Step: Pinpoint the Symptom
Check whether the CPU rise tracks QPS growth (which points to traffic) or occurs while QPS stays flat (which points to slow SQL, hot rows, or bad plans). Also glance at disk I/O, network, and JVM GC so that the database is not blamed by default.
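One rough way to separate the two cases from inside MySQL is to sample the server's own counters. This is a minimal sketch that assumes you can run statements on the affected instance; the 10-second interval is an arbitrary choice.

```sql
-- Sample the cumulative statement counter twice. The difference divided
-- by the interval (10 seconds here) approximates QPS.
SHOW GLOBAL STATUS LIKE 'Questions';
DO SLEEP(10);
SHOW GLOBAL STATUS LIKE 'Questions';

-- Number of threads actively executing a statement right now.
SHOW GLOBAL STATUS LIKE 'Threads_running';
```

If QPS is roughly flat while Threads_running keeps climbing, statements are probably getting slower rather than more numerous.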
Gather Evidence with Observability Tools
Use SHOW PROCESSLIST (or an equivalent monitoring panel) to find sessions with a large Time value. Examine their State (for example, Sending data versus Creating sort index) and read the Info field to identify the SQL fingerprint.
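The statements below sketch how that evidence might be collected. The 5-second threshold and the 200-character truncation are arbitrary assumptions, and the digest query only returns data if performance_schema is enabled.

```sql
-- Show the full statement text instead of the truncated Info column.
SHOW FULL PROCESSLIST;

-- Same data in filterable form: active sessions running longer than 5 seconds.
-- information_schema.PROCESSLIST is deprecated in MySQL 8.0 but still available.
SELECT id, user, db, command, time, state, LEFT(info, 200) AS sql_text
FROM information_schema.PROCESSLIST
WHERE command <> 'Sleep' AND time > 5
ORDER BY time DESC;

-- Statements grouped by fingerprint (digest), heaviest total time first.
-- Timer columns are reported in picoseconds.
SELECT digest_text,
       count_star AS executions,
       ROUND(sum_timer_wait / 1e12, 2) AS total_seconds,
       sum_rows_examined,
       sum_rows_sent
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 10;
```

A digest with a high ratio of sum_rows_examined to sum_rows_sent is a common sign that a statement reads far more rows than it returns.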
Analyze the Slow Query
Run EXPLAIN, or EXPLAIN ANALYZE on MySQL 8.0.18 and later, to see whether the plan does a full table scan (type=ALL), uses the wrong index, has an inflated estimated row count, triggers a filesort, creates temporary tables, or fails to use a composite index because the query does not match its leftmost prefix.
-- Focus on plan columns: type, key, rows, Extra
EXPLAIN SELECT id, name
FROM orders
WHERE user_id = 12345 AND status = 'PAID'
ORDER BY created_at DESC
LIMIT 20;
The example shows a query that filters by user_id and status, then orders by created_at. If the plan does not match business intuition, possible causes are missing indexes, stale statistics, or query patterns that force the optimizer to choose a bad path.
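To dig further into the same query, the following variants may help. This is a sketch against the article's orders table; which indexes actually exist on it is unknown here, so the first statement simply lists them.

```sql
-- List the indexes the optimizer can choose from.
SHOW INDEX FROM orders;

-- Estimated plan with cost details in JSON form.
EXPLAIN FORMAT=JSON
SELECT id, name
FROM orders
WHERE user_id = 12345 AND status = 'PAID'
ORDER BY created_at DESC
LIMIT 20;

-- MySQL 8.0.18+: runs the statement and reports actual rows and timings
-- per plan step. Because it really executes the query, an expensive
-- statement is better analyzed on a replica.
EXPLAIN ANALYZE
SELECT id, name
FROM orders
WHERE user_id = 12345 AND status = 'PAID'
ORDER BY created_at DESC
LIMIT 20;
```

A large gap between estimated and actual row counts in the EXPLAIN ANALYZE output usually suggests stale or misleading statistics.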
Immediate Mitigation (Bleeding‑Control)
Kill long‑running sessions after confirming they can be safely terminated.
Reduce concurrency: circuit-break, rate-limit, degrade non-essential reads, or route traffic to read-only replicas.
Add caching where appropriate, remembering that cache solves read amplification but does not legitimize a bad query.
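On the database side, the first of these steps can be sketched as follows. The session Id 4321 and the timeout values are placeholders, not recommendations, and max_execution_time is a MySQL feature (5.7.8 and later) that applies only to read-only SELECT statements.

```sql
-- Stop only the running statement; the connection stays open.
KILL QUERY 4321;        -- 4321 is a placeholder Id taken from SHOW PROCESSLIST

-- Terminate the whole connection instead.
KILL CONNECTION 4321;

-- Cap read-only SELECT statements in this session (value in milliseconds).
SET SESSION max_execution_time = 2000;

-- Or cap a single statement with an optimizer hint.
SELECT /*+ MAX_EXECUTION_TIME(1000) */ id, name
FROM orders
WHERE user_id = 12345 AND status = 'PAID'
ORDER BY created_at DESC
LIMIT 20;
```

KILL QUERY is usually the gentler choice, since the application keeps its connection and only the offending statement is interrupted.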
Root‑Cause Fixes (Long‑Term Cure)
Create suitable composite indexes and avoid SELECT * patterns.
Replace deep pagination with cursor‑based paging or delayed joins.
Batch inserts with controlled batch size; split large queries into smaller ones.
Run ANALYZE TABLE after major version upgrades or when optimizer switches behave unexpectedly.
Tune parameters such as join_buffer_size, which can alter execution plans under load.
Never change execution plans in production without observability; unexpected plan changes are often the real culprit behind sudden CPU spikes.
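Several of the fixes above can be illustrated on the article's orders query. This is a sketch under assumptions: the index name, the offset, and the cursor values are made up, and id is assumed to be the InnoDB primary key.

```sql
-- Composite index: equality columns first, then the sort column, so the
-- index can both filter and return rows already ordered by created_at.
ALTER TABLE orders
  ADD INDEX idx_user_status_created (user_id, status, created_at);

-- Cursor-based (keyset) paging: continue from the last row of the previous
-- page instead of scanning and discarding OFFSET rows.
SELECT id, name, created_at
FROM orders
WHERE user_id = 12345 AND status = 'PAID'
  AND (created_at < '2024-01-01 00:00:00'
       OR (created_at = '2024-01-01 00:00:00' AND id < 987654))
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Delayed join: page through the narrow index first, then fetch the
-- full rows for only the 20 ids that survive.
SELECT o.id, o.name
FROM orders AS o
JOIN (
  SELECT id
  FROM orders
  WHERE user_id = 12345 AND status = 'PAID'
  ORDER BY created_at DESC
  LIMIT 20 OFFSET 100000
) AS page ON page.id = o.id
ORDER BY o.created_at DESC;

-- Refresh index statistics used by the optimizer.
ANALYZE TABLE orders;
```

Adding an index on a large table is itself a heavy operation, so it is worth checking the new plan with EXPLAIN on a replica or staging copy before and after the change.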
Interview Insight
What the interviewer is really testing is whether you can close the loop: observe the symptom, collect evidence, and propose concrete changes. Mention that the hardest cases are not missing indexes but plans that were fine yesterday and broke today because of stale statistics, histogram shifts, hot data, or implicit type conversions.
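Implicit type conversion is easy to demonstrate in an interview. The sketch below assumes a hypothetical indexed VARCHAR column named order_no, which is not part of the original example.

```sql
-- Hypothetical: order_no is an indexed VARCHAR column.
-- Numeric literal: MySQL compares the values as numbers, so the index
-- on order_no cannot be used for a direct lookup.
EXPLAIN SELECT id FROM orders WHERE order_no = 20240001;

-- String literal: the types match and the index is usable.
EXPLAIN SELECT id FROM orders WHERE order_no = '20240001';
```

Comparing the type and key columns of the two plans shows the difference without running either query.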
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.