How to Diagnose and Resolve MySQL CPU Spikes: A Complete Step‑by‑Step Guide
This guide walks you through identifying why MySQL CPU usage jumps, from confirming the MySQL process consumes CPU to checking connection counts, slow queries, lock waits, configuration settings, and business‑level traffic, and then provides short‑term mitigations and long‑term solutions such as read‑write splitting, sharding, and caching.
1. Identify and Confirm the Issue
1.1 Verify CPU usage at the system level
Check that the MySQL process is actually consuming CPU and rule out other processes:
# View MySQL process CPU usage
top -bn1 | grep mysql
ps aux | grep mysql
# View CPU usage trend for MySQL
sar -u 1 10
# View per‑CPU core usage
mpstat -P ALL 1 5
# View system load
uptimeIf MySQL’s CPU usage is low, the problem may lie elsewhere (e.g., web or app servers). If it is high, continue the investigation.
1.2 Quick check of MySQL internal status
Log into MySQL and run the following statements to get key metrics:
-- Current connections
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';
-- Running queries count
SHOW STATUS LIKE 'Threads_running';
-- Query cache hit rate (MySQL 5.7 and earlier)
SHOW STATUS LIKE 'Qcache%';
-- InnoDB buffer pool status
SHOW STATUS LIKE 'Innodb_buffer_pool%';
-- Lock wait information
SHOW ENGINE INNODB STATUS\G;
-- Transactions and locks
SELECT * FROM information_schema.INNODB_TRX;
SELECT * FROM information_schema.INNODB_LOCK_WAITS;The Threads_running metric is crucial; values consistently higher than 2‑3 × the number of CPU cores indicate the server is overloaded.
2. Investigate Connection‑Related Issues
2.1 Excessive connections causing high CPU
High concurrency creates many connections, which itself consumes CPU:
-- List all connections
SHOW PROCESSLIST;
SHOW FULL PROCESSLIST;
-- Connections per user
SELECT user, COUNT(*) FROM information_schema.processlist GROUP BY user;
-- Connections per host
SELECT host, COUNT(*) FROM information_schema.processlist GROUP BY host;
-- Max connections setting
SHOW VARIABLES LIKE 'max_connections';
-- Connection memory usage
SHOW STATUS LIKE 'Connection_errors%';Temporary remediation:
-- Raise max connections temporarily
SET GLOBAL max_connections = 2000;
-- Kill idle connections (sleep > 1 hour)
SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE Command='Sleep' AND Time>3600 INTO OUTFILE '/tmp/kill_connections.sql';
SOURCE /tmp/kill_connections.sql;
-- Persist the change in my.cnf
[mysqld]
max_connections = 2000
wait_timeout = 600
interactive_timeout = 6002.2 Insufficient connection pooling
If the application does not reuse connections, each request creates a new one, adding CPU load:
# Check connection creation rate
SHOW STATUS LIKE 'Connections';
# Check aborted connections
SHOW STATUS LIKE 'Aborted%';A high Aborted_connects value suggests many failed connection attempts, possibly due to authentication or permission problems.
3. Slow Queries Driving CPU Usage
3.1 Find the most CPU‑intensive queries
# List currently running queries
SHOW FULL PROCESSLIST;
# Sort by execution time to locate the longest‑running queries
SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, LEFT(INFO,100) AS Query
FROM information_schema.processlist
WHERE COMMAND!='Sleep'
ORDER BY TIME DESC
LIMIT 10;3.2 Use Performance Schema for deeper analysis (MySQL 5.6+)
# Enable instrumentation
UPDATE performance_schema.setup_instruments SET ENABLED='YES' WHERE NAME LIKE 'statement/%';
UPDATE performance_schema.setup_consumers SET ENABLED='YES' WHERE NAME LIKE 'events_statements%';
# Show top CPU‑consuming statements (MySQL 8.0+)
SELECT SCHEMA_NAME, DIGEST, COUNT_STAR,
SUM_TIMER_WAIT/1000000000000 AS total_seconds,
AVG_TIMER_WAIT/1000000000000 AS avg_seconds,
SUBSTR(DIGEST_TEXT,1,200) AS query
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME='your_database'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;3.3 Typical CPU‑heavy query patterns
Full table scans are common culprits:
# Identify possible full scans
EXPLAIN SELECT * FROM orders WHERE status='pending';
# If the type column shows ALL, create an appropriate index
CREATE INDEX idx_orders_status ON orders(status);Sorting operations also consume CPU; check for filesort usage:
# Find queries that trigger filesort
SELECT COUNT(*) AS execution_count,
SUM_TIMER_WAIT/1000000000000 AS total_time,
AVG_TIMER_WAIT/1000000000000 AS avg_time,
SUBSTR(DIGEST_TEXT,1,100) AS query
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE '%ORDER BY%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;4. Lock‑Wait Issues
4.1 Locks causing CPU spin
# Show current lock waits
SELECT r.trx_id AS waiting_trx_id, r.trx_mysql_thread_id AS waiting_thread,
r.trx_query AS waiting_query, b.trx_id AS blocking_trx_id,
b.trx_mysql_thread_id AS blocking_thread, b.trx_query AS blocking_query,
b.trx_started AS blocking_started, b.trx_rows_locked AS blocking_rows_locked
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;4.2 Long‑running transactions
# Transactions running for a long time
SELECT trx_id, trx_mysql_thread_id, trx_state, trx_started,
TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS running_seconds,
trx_rows_locked, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started;
# Transactions running > 10 minutes
SELECT trx_mysql_thread_id, trx_started,
TIMESTAMPDIFF(MINUTE, trx_started, NOW()) AS running_minutes,
trx_query
FROM information_schema.INNODB_TRX
WHERE TIMESTAMPDIFF(MINUTE, trx_started, NOW()) > 10;4.3 Force‑kill lock‑waiting transactions (use with caution)
# List connections to identify the culprit
SHOW PROCESSLIST;
# Kill a specific thread (example thread ID 12345)
KILL 12345;
# Kill all long‑running transactions (>30 min)
SELECT CONCAT('KILL ', trx_mysql_thread_id, ';')
FROM information_schema.INNODB_TRX
WHERE TIMESTAMPDIFF(MINUTE, trx_started, NOW()) > 30
INTO OUTFILE '/tmp/kill_long_trx.sql';
SOURCE /tmp/kill_long_trx.sql;5. Configuration Checks
5.1 Buffer pool sizing
# Current InnoDB buffer pool settings
SHOW VARIABLES LIKE 'innodb_buffer_pool%';
SHOW STATUS LIKE 'Innodb_buffer_pool%';
# Example: set buffer pool to 16 GB (≈70‑80 % of RAM)
SET GLOBAL innodb_buffer_pool_size = 17179869184;5.2 Thread‑related settings
# Thread cache size
SHOW VARIABLES LIKE 'thread_cache_size';
# Threads created count
SHOW STATUS LIKE 'Threads_created';
# Adjust thread cache
SET GLOBAL thread_cache_size = 64;
# I/O thread counts (SSD environments may benefit from higher values)
SHOW VARIABLES LIKE 'innodb_write_io_threads';
SHOW VARIABLES LIKE 'innodb_read_io_threads';
SET GLOBAL innodb_write_io_threads = 16;
SET GLOBAL innodb_read_io_threads = 16;5.3 Log configuration
# Log settings
SHOW VARIABLES LIKE 'innodb_log%';
# Recommended log file size: 256 M‑1 G
SHOW VARIABLES LIKE 'innodb_log_file_size';
# Flush policy (affects durability vs. performance)
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1 = safest (flush on every commit)
-- 2 = OS cache flush (balanced)
-- 0 = flush once per second (fastest, possible 1 s data loss)6. Business‑Level Analysis
6.1 Determine whether traffic is normal or abnormal
# MySQL QPS
mysqladmin ext -i 1 | grep Questions;
# SELECT queries per second
mysqladmin ext -i 1 | grep "Com_select";If the spike aligns with legitimate growth, consider architectural changes such as read‑write splitting or sharding. If it appears abnormal, investigate possible attacks or crawlers.
6.2 Analyze request characteristics
# Most frequently accessed tables
SELECT OBJECT_SCHEMA, OBJECT_NAME, COUNT_STAR,
SUM_TIMER_WAIT/1000000000000 AS total_seconds
FROM performance_schema.objects_summary_global_by_type
WHERE OBJECT_TYPE='TABLE'
ORDER BY COUNT_STAR DESC
LIMIT 20;
# Most used indexes
SELECT OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME, COUNT_STAR,
SUM_TIMER_WAIT/1000000000000 AS total_seconds
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE INDEX_NAME IS NOT NULL
ORDER BY COUNT_STAR DESC
LIMIT 20;6.3 Source IP analysis
# Show connection source IPs
mysql -e "SHOW PROCESSLIST" | awk '{print $3}' | grep -v "localhost" | sort | uniq -c | sort -rn;7. Emergency Mitigation
7.1 Short‑term relief
# Disable non‑essential features temporarily
SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON;
# Kill the top CPU‑consuming queries (limit 5)
SELECT CONCAT('KILL QUERY ', ID, ';')
FROM information_schema.processlist
WHERE COMMAND!='Sleep'
ORDER BY TIME DESC
LIMIT 5;7.2 Connection throttling
# Reduce max connections (MySQL 8.0+)
SET GLOBAL max_connections = 500;
# Per‑user limits must be enforced at the application or via a proxy (e.g., MySQL Router, ProxySQL)7.3 Query throttling
# Lower slow‑query threshold to prevent new heavy queries
SET GLOBAL long_query_time = 0.5;
# With Percona Toolkit you can create rules to reject specific patterns8. Root‑Cause Solutions
8.1 Read‑Write Splitting
Offload read traffic to replicas to reduce primary CPU load:
# Check replica sync status
SHOW SLAVE STATUS\G;
# Use MySQL Router or ProxySQL for automatic routing8.2 Sharding / Partitioning
For very large tables, split them horizontally (e.g., by month) or use middleware such as Vitess or TiDB.
8.3 Caching Layer
Introduce an external cache (Redis, Memcached) for hot data, session information, and aggregated statistics to cut down query volume.
9. Preventive Measures
9.1 Monitoring & Alerting
# Example: Prometheus + mysqld_exporter
# Key metrics to watch:
# - mysql_global_status_threads_connected
# - mysql_global_status_threads_running
# - rate(mysql_global_status_questions[5m])
# - rate(mysql_global_status_slow_queries[5m])9.2 Regular Slow‑Query Review
# Daily slow‑query digest
pt-query-digest --since '24h' /var/log/mysql/slow.log > /tmp/slow_query_report.txt9.3 Capacity Planning
Track CPU and connection trends, then proactively scale hardware or upgrade architecture before limits are hit.
10. Conclusion
MySQL CPU spikes require a systematic approach: confirm the MySQL process is the culprit, examine internal metrics (connections, running queries, lock waits), analyze query execution plans, and verify configuration parameters. Business‑level traffic analysis helps decide whether scaling or optimization is needed. Implementing robust monitoring, slow‑query governance, and architectural safeguards such as read‑write splitting, sharding, and caching prevents future spikes.
MySQL 8.0 Reference Manual – InnoDB Performance
MySQL 8.0 Reference Manual – Performance Schema
Percona Blog – MySQL CPU Performance
SHOW ENGINE INNODB STATUS documentation
MySQL connection management and optimization guide
Ops Community
A leading IT operations community where professionals share and grow together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
