How to Tame MySQL CPU Spikes: A Complete 4‑Step Emergency Guide
When MySQL CPU usage spikes to 500%, this guide walks you through a four-step emergency process: stop the overload quickly, diagnose the root cause, apply targeted SQL and configuration optimizations, and set up monitoring to prevent future spikes, so that the service stays stable and performant.
MySQL CPU Spike Scenario
When MySQL CPU usage jumps to 500% (top reports usage per core, so this means roughly five cores fully busy), the database is under extreme pressure and applications may experience timeouts or blocking. The guiding principle is "stop the loss first, then cure," which can be summarized as a four-step strategy: emergency stop-bleeding, root-cause investigation, targeted optimization, and preventive measures.
Step 1 – Emergency Stop‑Bleeding
Goal: Reduce CPU usage and restore service availability.
Locate high‑CPU processes
Use top -Hp $(pidof mysqld) or htop to confirm that mysqld is consuming a lot of CPU.
Identify and kill problematic sessions
SHOW FULL PROCESSLIST;
KILL <process_id>;
Prioritize sessions with a long Time value, an abnormal State (e.g., Sending data, Copying to tmp table, Sorting result, Locked), or a complex query shown in the Info column.
Killing the top few resource‑hungry sessions usually brings CPU down dramatically.
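The triage rule above can be sketched in Python. This is a minimal illustration, not a production tool: it assumes the rows of SHOW FULL PROCESSLIST have already been fetched as dictionaries keyed by column name, and the function name `pick_kill_candidates`, the 30-second cutoff, and the limit of five sessions are all hypothetical choices.

```python
from typing import Dict, List

# States the text flags as suspicious, compared case-insensitively.
SUSPECT_STATES = {"sending data", "copying to tmp table", "sorting result", "locked"}


def pick_kill_candidates(rows: List[Dict], min_seconds: int = 30, limit: int = 5) -> List[str]:
    """Return KILL statements for the longest-running suspicious queries.

    Assumption: a session qualifies only if it is both long-running and in a
    suspicious state. Adjust the rule to match your own triage policy.
    """
    suspects = [
        r for r in rows
        if r.get("Command") == "Query"              # skip Sleep, Binlog Dump, etc.
        and r.get("User") != "system user"          # leave replication threads alone
        and (r.get("Time") or 0) >= min_seconds
        and (r.get("State") or "").lower() in SUSPECT_STATES
    ]
    suspects.sort(key=lambda r: r["Time"], reverse=True)
    return [f"KILL {r['Id']};" for r in suspects[:limit]]
```

The function only produces statements for a human to review. Killing a long-running write forces a rollback, which can itself take time, so it is worth checking the Info column before executing anything.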
Emergency scaling (cloud environments)
Temporarily increase instance CPU/IOPS to relieve pressure.
After the issue is resolved, downscale to control costs.
Step 2 – Deep Investigation and Root‑Cause Identification
Goal: Find the underlying cause of the CPU spike to prevent recurrence.
Check slow‑query log settings
SHOW VARIABLES LIKE 'slow_query_log%';
SHOW VARIABLES LIKE 'long_query_time';
Analyze slow queries
mysqldumpslow -s c -t 10 /path/to/slow.log   # most frequent queries (sort by count)
mysqldumpslow -s r -t 10 /path/to/slow.log   # most rows sent
mysqldumpslow -s t -t 10 -g "LEFT JOIN" /path/to/slow.log   # slowest queries matching a pattern
For deeper analysis, Percona Toolkit is recommended:
pt-query-digest /path/to/slow.log > slow_report.txt
Real-time diagnostics
SHOW FULL PROCESSLIST;
SHOW GLOBAL STATUS LIKE 'Handler%';
SHOW GLOBAL STATUS LIKE 'Threads_running';
SHOW GLOBAL STATUS LIKE 'Sort%';
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
SHOW GLOBAL STATUS LIKE 'Innodb_rows_read%';
Pay attention to the running-thread count, sorting, temporary tables, and read/write counters.
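Most of these status values are cumulative counters since server start, so a single reading says little. One way to interpret them is to sample twice and compute per-second rates. The sketch below assumes both samples have already been loaded into dictionaries of integers; the function name `status_rates` is hypothetical.

```python
from typing import Dict


def status_rates(before: Dict[str, int], after: Dict[str, int], interval_s: float) -> Dict[str, float]:
    """Per-second rate of every cumulative counter present in both samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        name: (after[name] - before[name]) / interval_s
        for name in before
        if name in after
    }
```

A rapidly growing Handler_read_rnd_next typically points to table scans. Threads_running is a gauge rather than a counter, so read it directly instead of passing it through this function.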
Determine load type
CPU‑intensive: Sorting, grouping, or join queries that may cause full‑table scans.
IO‑intensive: Insufficient memory leading to frequent disk reads/writes (watch %iowait).
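To tell the two load types apart on Linux, one option is to compare two readings of the aggregate cpu line from /proc/stat and look at the iowait share. The following is a rough sketch under that assumption; `cpu_shares` and `classify_load` are hypothetical helper names, and the 20% iowait threshold is an illustrative value, not a recommendation from the original text.

```python
from typing import Dict, List

# Field order of the aggregate "cpu" line in Linux /proc/stat.
# The trailing guest fields are skipped because guest time is already part of user.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def _parse(line: str) -> List[int]:
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in parts[1:1 + len(FIELDS)]]


def cpu_shares(line_before: str, line_after: str) -> Dict[str, float]:
    """Percentage of CPU time spent in each state between two samples."""
    deltas = [a - b for a, b in zip(_parse(line_after), _parse(line_before))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


def classify_load(shares: Dict[str, float], iowait_threshold: float = 20.0) -> str:
    """Label the load; the default threshold is an assumption for illustration."""
    return "io-bound" if shares["iowait"] >= iowait_threshold else "cpu-bound"
```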
Step 3 – Targeted Optimization
Goal: Eliminate the performance bottleneck.
1. SQL Optimization
Add indexes on columns used in WHERE, ORDER BY, GROUP BY, and JOIN clauses.
Avoid SELECT * in production queries.
Break complex joins or sub‑queries into simpler statements.
Regularly review slow‑query logs to prevent problematic SQL from reaching production.
2. Database Configuration Tuning
InnoDB buffer pool
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
As a rule of thumb, set it to 50-70% of available memory.
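The percentage rule can be turned into a concrete number. MySQL sizes the buffer pool in multiples of innodb_buffer_pool_chunk_size times innodb_buffer_pool_instances, so the sketch below rounds down to that unit. The helper name `suggest_buffer_pool_bytes` is hypothetical, and the default fraction (0.6), chunk size (128 MiB), and instance count (8) are assumptions to check against your own server's variables.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def suggest_buffer_pool_bytes(total_ram_bytes: int, fraction: float = 0.6,
                              chunk_bytes: int = 128 * MIB, instances: int = 8) -> int:
    """Suggest a buffer pool size as a fraction of RAM, rounded down to a
    multiple of chunk_bytes * instances (never below one such unit)."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    unit = chunk_bytes * instances
    target = int(total_ram_bytes * fraction)
    return max(unit, (target // unit) * unit)
```

For a host with 32 GiB of RAM and the defaults above, the function suggests 19 GiB. Rounding down rather than up keeps the result inside the chosen fraction.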
Temporary table and sort buffers
tmp_table_size
max_heap_table_size
sort_buffer_size
Avoid excessive on-disk temporary tables to reduce CPU and I/O pressure. Note that sort_buffer_size is allocated per session, so raise it cautiously.
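Whether on-disk temporary tables are actually "excessive" can be estimated from the Created_tmp_tables and Created_tmp_disk_tables counters in SHOW GLOBAL STATUS. The helper below is a small sketch; the name `tmp_disk_ratio` is hypothetical, and what counts as too high a ratio depends on the workload.

```python
from typing import Mapping, Union


def tmp_disk_ratio(status: Mapping[str, Union[int, str]]) -> float:
    """Fraction of internal temporary tables that were created on disk.

    Status values often arrive as strings from the client, so they are
    converted with int() before use.
    """
    total = int(status.get("Created_tmp_tables", 0))
    on_disk = int(status.get("Created_tmp_disk_tables", 0))
    return on_disk / total if total else 0.0
```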
3. Architecture & Business Optimizations
Cache hot data with Redis or Memcached.
Implement read/write splitting; run heavy reporting queries on read‑only replicas.
Apply business‑level throttling to limit non‑core request traffic.
Archive historical data to shrink table size and improve query efficiency.
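Of the measures above, business-level throttling is the easiest to illustrate in a few lines. A token bucket is one common way to implement it, though the original text does not prescribe a specific algorithm. The class name `TokenBucket` and its parameters are hypothetical; the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow at most `rate_per_s` requests per second on average,
    with bursts of up to `capacity` requests."""

    def __init__(self, rate_per_s: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Non-core endpoints would call `allow()` before touching the database and return a degraded response when it is False, which keeps core traffic served during a spike.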
Step 4 – Preventive Mechanisms
Goal: Avoid future CPU spikes.
Monitoring & Alerts
Metrics: CPU usage, Threads_running, slow‑query count, TPS/QPS.
Tools: Prometheus + Grafana with appropriate alert rules.
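An alert on a single noisy sample tends to cause false alarms, so rules usually require the condition to hold for a period. The sketch below shows that logic in plain Python for a metric such as Threads_running. It is not Prometheus rule syntax, and the name `should_alert` and the example threshold are hypothetical.

```python
from typing import Sequence


def should_alert(samples: Sequence[float], threshold: float, sustain: int) -> bool:
    """True if the most recent `sustain` samples all exceed `threshold`."""
    if sustain <= 0:
        raise ValueError("sustain must be positive")
    recent = list(samples)[-sustain:]
    return len(recent) == sustain and all(value > threshold for value in recent)
```

For example, with Threads_running sampled every 10 seconds, `should_alert(samples, threshold=32, sustain=3)` fires only after the value has stayed above 32 for three consecutive samples.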
SQL Review
Conduct performance reviews before deployment using tools such as Archery or Yearning.
Stress Testing
Simulate high-concurrency loads before major releases or promotional events.
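A high-concurrency simulation can start from a very small harness. The sketch below assumes the query is wrapped in a zero-argument callable that you supply, and it uses only the standard library. The function name `run_load` is hypothetical, and dedicated load-testing tools will give more realistic results.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_load(task: Callable[[], None], workers: int, requests: int) -> Dict[str, float]:
    """Run `task` `requests` times across `workers` threads and report
    throughput plus p95 and maximum latency in seconds."""
    if workers <= 0 or requests <= 0:
        raise ValueError("workers and requests must be positive")

    def timed(_: int) -> float:
        start = time.perf_counter()
        task()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = sorted(pool.map(timed, range(requests)))
    wall = time.perf_counter() - wall_start

    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = (len(latencies) * 95 + 99) // 100 - 1
    return {
        "requests": len(latencies),
        "qps": len(latencies) / wall,
        "p95_s": latencies[p95_index],
        "max_s": latencies[-1],
    }
```

In practice `task` would check out a connection from the application's pool and execute one representative query. Run it against a staging replica rather than production.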
Practical Tips
Run EXPLAIN regularly to inspect execution plans of hot queries.
Enable log_queries_not_using_indexes to spot queries that bypass indexes.
Use performance_schema for more precise real‑time analysis than SHOW PROCESSLIST.
In cloud environments, combine read/write splitting with elastic scaling for rapid pressure relief.
Adjust connection pool size and thread count in high‑concurrency scenarios to avoid sudden CPU peaks.
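The EXPLAIN tip can be made routine with a small checker. The sketch below assumes the rows of an EXPLAIN result have been fetched as dictionaries using the standard column names (table, type, Extra). The function name `explain_warnings` is hypothetical, and it covers only three well-known warning signs.

```python
from typing import Dict, List


def explain_warnings(rows: List[Dict]) -> List[str]:
    """Flag full table scans, temporary tables, and filesorts in EXPLAIN output."""
    warnings: List[str] = []
    for row in rows:
        table = row.get("table")
        extra = row.get("Extra") or ""   # Extra can be NULL
        if row.get("type") == "ALL":
            warnings.append(f"{table}: full table scan")
        if "Using temporary" in extra:
            warnings.append(f"{table}: Using temporary")
        if "Using filesort" in extra:
            warnings.append(f"{table}: Using filesort")
    return warnings
```

An empty result does not prove a plan is good, since row estimates and index selectivity still matter, but any warning is a reasonable trigger for a closer look.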
Conclusion
CPU spikes are symptoms, not root causes. Follow the four‑step "emergency" workflow: stop the bleed to restore service, investigate to locate the high‑CPU SQL or operation, cure the root cause with SQL, configuration, and architectural tweaks, and finally set up monitoring, review, and stress‑testing to prevent recurrence.
Ray's Galactic Tech
