Is MySQL CPU Spike a Database Issue or an Application Issue? Troubleshooting Guide
When MySQL CPU usage spikes above 80% or hits 100%, this guide walks you through a systematic investigation—from confirming the MySQL process consumes CPU, checking system and MySQL status, analyzing connection counts, slow queries, lock waits, and configuration settings, to applying short‑term mitigations and long‑term architectural fixes.
Background and Problem
MySQL CPU usage soaring above 80% or even reaching 100% is a common emergency in production. If the alert triggers, business response slows or times out, and rapid root‑cause identification is essential to avoid service outage.
Operators often fall into two extremes: blaming the database and blindly adding hardware or restarting, or blaming the application and demanding immediate SQL optimizations. Both are unprofessional; the real cause can be misconfiguration, a single slow query, excessive concurrent connections, lock‑wait spin, or legitimate traffic growth.
This article examines the issue from system tools, MySQL status, and query analysis perspectives.
1. Discover and Confirm the Problem
1.1 Verify CPU Usage at the System Level
First ensure the MySQL process is actually consuming CPU, ruling out other processes:
# View MySQL process CPU usage
top -bn1 | grep mysql
ps aux | grep mysql
# View CPU usage trend for MySQL
sar -u 1 10
# View per‑CPU core usage
mpstat -P ALL 1 5
# View system load
uptimeIf MySQL's CPU share is low, the problem likely lies elsewhere (e.g., web or app servers). If high, continue the investigation.
1.2 Quick Internal State Check
Log into MySQL and run quick status queries:
# Show current connections
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';
# Show currently running queries
SHOW STATUS LIKE 'Threads_running';
# Show query cache hit rate (MySQL 5.7 and earlier)
SHOW STATUS LIKE 'Qcache%';
# Show InnoDB buffer pool status
SHOW STATUS LIKE 'Innodb_buffer_pool%';
# Show lock wait information
SHOW ENGINE INNODB STATUS\G
# Show transaction and lock details
SELECT * FROM information_schema.INNODB_TRX;
SELECT * FROM information_schema.INNODB_LOCK_WAITS; Threads_runningis a key indicator; values consistently exceeding 2‑3× the number of CPU cores signal overload.
2. Concurrent Connection Issues
2.1 Too Many Connections
High concurrency itself consumes CPU:
# List all connections
SHOW PROCESSLIST;
SHOW FULL PROCESSLIST;
# Count connections per user
SELECT user, COUNT(*) FROM information_schema.processlist GROUP BY user;
# Count connections per host
SELECT host, COUNT(*) FROM information_schema.processlist GROUP BY host;
# View max_connections setting
SHOW VARIABLES LIKE 'max_connections';
# View connection‑related memory usage
SHOW STATUS LIKE 'Connection_errors%';Solution:
# Temporarily raise max_connections
SET GLOBAL max_connections = 2000;
# Kill idle connections (Sleep > 1h)
SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE Command='Sleep' AND Time > 3600 INTO OUTFILE '/tmp/kill_connections.sql';
SOURCE /tmp/kill_connections.sql;
# Persist configuration
[mysqld]
max_connections = 2000
wait_timeout = 600
interactive_timeout = 6002.2 Insufficient Connection Reuse
If the application does not use a connection pool, each request creates a new connection, wasting CPU on handshakes:
# Check connection creation rate
SHOW STATUS LIKE 'Connections';
# Check aborted connections
SHOW STATUS LIKE 'Aborted%';
# Average new connections per second
SHOW STATUS LIKE 'Connections' LIKE '%s';3. Slow Queries Causing High CPU
3.1 Identify the Most CPU‑Intensive Queries
# List currently executing queries
SHOW FULL PROCESSLIST;
# Sort by execution time, show top 10
SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, LEFT(INFO,100) AS query
FROM information_schema.processlist
WHERE COMMAND!='Sleep'
ORDER BY TIME DESC
LIMIT 10;3.2 Use Performance Schema (MySQL 5.6+)
Enable relevant instruments and consumers, then query the summary:
# Enable statement monitoring
UPDATE performance_schema.setup_instruments SET ENABLED='YES' WHERE NAME LIKE 'statement/%';
# Enable event consumers
UPDATE performance_schema.setup_consumers SET ENABLED='YES' WHERE NAME LIKE 'events_statements%';
# Show top CPU‑consuming statements (MySQL 8.0+)
SELECT SCHEMA_NAME, DIGEST, COUNT_STAR,
SUM_TIMER_WAIT/1000000000000 AS total_seconds,
AVG_TIMER_WAIT/1000000000000 AS avg_seconds,
SUM_ROWS_EXAMINED, SUM_ROWS_SENT,
SUBSTR(DIGEST_TEXT,1,200) AS query
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME='your_database'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;3.3 Common CPU‑Heavy Query Patterns
Full table scans are the most frequent CPU killers:
# Find possible full scans
EXPLAIN SELECT * FROM orders WHERE status='pending';
# If type is ALL, create an index
CREATE INDEX idx_orders_status ON orders(status);Sorting operations also consume CPU; identify queries that trigger filesort:
# Find queries with filesort
SELECT COUNT(*) AS execution_count,
SUM_TIMER_WAIT/1000000000000 AS total_time,
AVG_TIMER_WAIT/1000000000000 AS avg_time,
SUBSTR(DIGEST_TEXT,1,100) AS query
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT LIKE '%ORDER BY%'
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;4. Lock‑Wait Issues
4.1 Lock Waits Cause CPU Spin
# Show current lock waits
SELECT r.trx_id AS waiting_trx_id, r.trx_mysql_thread_id AS waiting_thread,
r.trx_query AS waiting_query, b.trx_id AS blocking_trx_id,
b.trx_mysql_thread_id AS blocking_thread, b.trx_query AS blocking_query,
b.trx_started AS blocking_started, b.trx_rows_locked AS blocking_rows_locked
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;4.2 Long‑Running Transactions
# Find transactions running long
SELECT trx_id, trx_mysql_thread_id, trx_state, trx_started,
TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS running_seconds,
trx_rows_locked, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started;
# Find transactions longer than 10 minutes
SELECT trx_mysql_thread_id, trx_started,
TIMESTAMPDIFF(MINUTE, trx_started, NOW()) AS running_minutes,
trx_query
FROM information_schema.INNODB_TRX
WHERE TIMESTAMPDIFF(MINUTE, trx_started, NOW()) > 10;4.3 Resolve Lock Waits
Force‑kill a blocking connection only after careful analysis:
# Show specific connections
SHOW PROCESSLIST;
# Kill the thread holding the lock
KILL <thread_id>;
# Kill all long‑running transactions (very cautious)
SELECT CONCAT('KILL ', trx_mysql_thread_id, ';') FROM information_schema.INNODB_TRX WHERE TIMESTAMPDIFF(MINUTE, trx_started, NOW()) > 30 INTO OUTFILE '/tmp/kill_long_trx.sql';
SOURCE /tmp/kill_long_trx.sql;5. Configuration Checks
5.1 Buffer Pool Mis‑configuration
# View current buffer pool settings
SHOW VARIABLES LIKE 'innodb_buffer_pool%';
# View buffer pool usage
SHOW STATUS LIKE 'Innodb_buffer_pool%';
# Example: set buffer pool to 16 GB (≈70‑80% of RAM)
SET GLOBAL innodb_buffer_pool_size = 17179869184; -- 16GB5.2 Thread and I/O Settings
# View thread cache size
SHOW VARIABLES LIKE 'thread_cache_size';
# View thread creation count
SHOW STATUS LIKE 'Threads_created';
# Set a reasonable thread cache
SET GLOBAL thread_cache_size = 64;
# View InnoDB I/O thread counts
SHOW VARIABLES LIKE 'innodb_write_io_threads';
SHOW VARIABLES LIKE 'innodb_read_io_threads';
# Increase I/O threads for SSDs
SET GLOBAL innodb_write_io_threads = 16;
SET GLOBAL innodb_read_io_threads = 16;5.3 Log Settings
# View log variables
SHOW VARIABLES LIKE 'innodb_log%';
# Recommended log file size: 256M‑1G
SHOW VARIABLES LIKE 'innodb_log_file_size';
# Flush policy (affects performance vs durability)
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
# 1 = safest, lowest performance
# 2 = OS‑buffered, balanced
# 0 = flush once per second, highest performance6. Business‑Layer Analysis
6.1 Normal vs. Abnormal Traffic
# Check MySQL QPS
mysqladmin ext -i 1 | grep Questions
# Check SELECT QPS
mysqladmin ext -i 1 | grep "Com_select"
# Compare with historical data to spot abnormal spikesIf traffic growth is legitimate, consider architectural changes (read/write splitting, sharding). If abnormal, investigate possible attacks or crawlers.
6.2 Analyze Request Characteristics
# Most frequently queried tables
SELECT OBJECT_SCHEMA, OBJECT_NAME, COUNT_STAR,
SUM_TIMER_WAIT/1000000000000 AS total_seconds
FROM performance_schema.objects_summary_global_by_type
WHERE OBJECT_TYPE='TABLE'
ORDER BY COUNT_STAR DESC
LIMIT 20;
# Most used indexes
SELECT OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME, COUNT_STAR,
SUM_TIMER_WAIT/1000000000000 AS total_seconds
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE INDEX_NAME IS NOT NULL
ORDER BY COUNT_STAR DESC
LIMIT 20;6.3 Traffic Source Analysis
# Show client IP distribution
mysql -e "SHOW PROCESSLIST" | awk '{print $3}' | grep -v "localhost" | sort | uniq -c | sort -rn
# Review application‑side connection‑pool settings7. Emergency Mitigations
7.1 Short‑Term Relief
# Switch to read‑only mode (if a replica exists)
SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON;
# Kill top CPU‑consuming queries
SELECT CONCAT('KILL QUERY ', ID, ';') FROM information_schema.processlist WHERE COMMAND!='Sleep' ORDER BY TIME DESC LIMIT 5;7.2 Connection Throttling
# Limit total connections (MySQL 8.0+)
SET GLOBAL max_connections = 500;
# Per‑user limits must be enforced at the application layer or via ProxySQL/MySQL Router7.3 Query Throttling
# Temporarily raise slow‑query threshold
SET GLOBAL long_query_time = 0.5;
# Use pt‑query‑digest or Percona Server to reject specific patterns8. Root‑Cause Solutions
8.1 Read/Write Splitting
# Check replica sync status
SHOW SLAVE STATUS\G
# Deploy MySQL Router or ProxySQL for read/write routing8.2 Sharding
# Example: time‑based partitioning for orders
-- orders_202401, orders_202402, ...
# Or use middleware such as Vitess, TiDB, ProxySQL8.3 Caching Layer
# Cache hot data in Redis
# Session cache
# Config cache
# Aggregated statistics cache9. Preventive Measures
9.1 Monitoring & Alerting
# Prometheus + mysqld_exporter
# Key metrics:
# - mysql_global_status_threads_connected
# - mysql_global_status_threads_running
# - rate(mysql_global_status_questions[5m])
# - rate(mysql_global_status_slow_queries[5m])9.2 Regular Slow‑Query Review
# Daily slow‑query analysis
pt-query-digest --since '24h' /var/log/mysql/slow.log > /tmp/slow_query_report.txt9.3 Capacity Planning
# Monitor CPU & connection trends
# Plan hardware upgrades or architectural scaling ahead of demand10. Conclusion
MySQL CPU spikes require a holistic analysis. Operators should first verify that MySQL itself is the CPU consumer, then examine internal metrics—connection counts, running queries, lock waits, and configuration parameters. After pinpointing the root cause, apply short‑term mitigations and long‑term architectural improvements such as read/write splitting, sharding, or caching. Robust monitoring, slow‑query governance, and capacity planning are essential to prevent recurrence.
References:
MySQL 8.0 Reference Manual – InnoDB Performance
MySQL 8.0 Reference Manual – Performance Schema
Percona Blog – MySQL CPU Performance
SHOW ENGINE INNODB STATUS documentation
MySQL connection management and optimization guide
Ops Community
A leading IT operations community where professionals share and grow together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
