Databases 17 min read

How to Diagnose and Resolve MySQL CPU Spikes: A Complete Step‑by‑Step Guide

This guide walks you through identifying why MySQL CPU usage jumps, from confirming the MySQL process consumes CPU to checking connection counts, slow queries, lock waits, configuration settings, and business‑level traffic, and then provides short‑term mitigations and long‑term solutions such as read‑write splitting, sharding, and caching.

Ops Community
Ops Community
Ops Community
How to Diagnose and Resolve MySQL CPU Spikes: A Complete Step‑by‑Step Guide

1. Identify and Confirm the Issue

1.1 Verify CPU usage at the system level

Check that the MySQL process is actually consuming CPU and rule out other processes:

# View MySQL process CPU usage
 top -bn1 | grep mysql
 ps aux | grep mysql

# View CPU usage trend for MySQL
 sar -u 1 10

# View per‑CPU core usage
 mpstat -P ALL 1 5

# View system load
 uptime

If MySQL’s CPU usage is low, the problem may lie elsewhere (e.g., web or app servers). If it is high, continue the investigation.

1.2 Quick check of MySQL internal status

Log into MySQL and run the following statements to get key metrics:

-- Current connections
 SHOW STATUS LIKE 'Threads_connected';
 SHOW STATUS LIKE 'Max_used_connections';

-- Running queries count
 SHOW STATUS LIKE 'Threads_running';

-- Query cache hit rate (MySQL 5.7 and earlier)
 SHOW STATUS LIKE 'Qcache%';

-- InnoDB buffer pool status
 SHOW STATUS LIKE 'Innodb_buffer_pool%';

-- Lock wait information
 SHOW ENGINE INNODB STATUS\G;

-- Transactions and locks
 SELECT * FROM information_schema.INNODB_TRX;
 SELECT * FROM information_schema.INNODB_LOCK_WAITS;

The Threads_running metric is crucial; values consistently higher than 2‑3 × the number of CPU cores indicate the server is overloaded.

2. Investigate Connection‑Related Issues

2.1 Excessive connections causing high CPU

High concurrency creates many connections, which itself consumes CPU:

-- List all connections
 SHOW PROCESSLIST;
 SHOW FULL PROCESSLIST;

-- Connections per user
 SELECT user, COUNT(*) FROM information_schema.processlist GROUP BY user;

-- Connections per host
 SELECT host, COUNT(*) FROM information_schema.processlist GROUP BY host;

-- Max connections setting
 SHOW VARIABLES LIKE 'max_connections';

-- Connection memory usage
 SHOW STATUS LIKE 'Connection_errors%';

Temporary remediation:

-- Raise max connections temporarily
 SET GLOBAL max_connections = 2000;

-- Kill idle connections (sleep > 1 hour)
 SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE Command='Sleep' AND Time>3600 INTO OUTFILE '/tmp/kill_connections.sql';
 SOURCE /tmp/kill_connections.sql;

-- Persist the change in my.cnf
 [mysqld]
 max_connections = 2000
 wait_timeout = 600
 interactive_timeout = 600

2.2 Insufficient connection pooling

If the application does not reuse connections, each request creates a new one, adding CPU load:

# Check connection creation rate
 SHOW STATUS LIKE 'Connections';

# Check aborted connections
 SHOW STATUS LIKE 'Aborted%';

A high Aborted_connects value suggests many failed connection attempts, possibly due to authentication or permission problems.

3. Slow Queries Driving CPU Usage

3.1 Find the most CPU‑intensive queries

# List currently running queries
 SHOW FULL PROCESSLIST;

# Sort by execution time to locate the longest‑running queries
 SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, LEFT(INFO,100) AS Query
 FROM information_schema.processlist
 WHERE COMMAND!='Sleep'
 ORDER BY TIME DESC
 LIMIT 10;

3.2 Use Performance Schema for deeper analysis (MySQL 5.6+)

# Enable instrumentation
 UPDATE performance_schema.setup_instruments SET ENABLED='YES' WHERE NAME LIKE 'statement/%';
 UPDATE performance_schema.setup_consumers SET ENABLED='YES' WHERE NAME LIKE 'events_statements%';

# Show top CPU‑consuming statements (MySQL 8.0+)
 SELECT SCHEMA_NAME, DIGEST, COUNT_STAR,
        SUM_TIMER_WAIT/1000000000000 AS total_seconds,
        AVG_TIMER_WAIT/1000000000000 AS avg_seconds,
        SUBSTR(DIGEST_TEXT,1,200) AS query
 FROM performance_schema.events_statements_summary_by_digest
 WHERE SCHEMA_NAME='your_database'
 ORDER BY SUM_TIMER_WAIT DESC
 LIMIT 10;

3.3 Typical CPU‑heavy query patterns

Full table scans are common culprits:

# Identify possible full scans
 EXPLAIN SELECT * FROM orders WHERE status='pending';

# If the type column shows ALL, create an appropriate index
 CREATE INDEX idx_orders_status ON orders(status);

Sorting operations also consume CPU; check for filesort usage:

# Find queries that trigger filesort
 SELECT COUNT(*) AS execution_count,
        SUM_TIMER_WAIT/1000000000000 AS total_time,
        AVG_TIMER_WAIT/1000000000000 AS avg_time,
        SUBSTR(DIGEST_TEXT,1,100) AS query
 FROM performance_schema.events_statements_summary_by_digest
 WHERE DIGEST_TEXT LIKE '%ORDER BY%'
 ORDER BY SUM_TIMER_WAIT DESC
 LIMIT 10;

4. Lock‑Wait Issues

4.1 Locks causing CPU spin

# Show current lock waits
 SELECT r.trx_id AS waiting_trx_id, r.trx_mysql_thread_id AS waiting_thread,
        r.trx_query AS waiting_query, b.trx_id AS blocking_trx_id,
        b.trx_mysql_thread_id AS blocking_thread, b.trx_query AS blocking_query,
        b.trx_started AS blocking_started, b.trx_rows_locked AS blocking_rows_locked
 FROM information_schema.INNODB_LOCK_WAITS w
 JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
 JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;

4.2 Long‑running transactions

# Transactions running for a long time
 SELECT trx_id, trx_mysql_thread_id, trx_state, trx_started,
        TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS running_seconds,
        trx_rows_locked, trx_query
 FROM information_schema.INNODB_TRX
 ORDER BY trx_started;

# Transactions running > 10 minutes
 SELECT trx_mysql_thread_id, trx_started,
        TIMESTAMPDIFF(MINUTE, trx_started, NOW()) AS running_minutes,
        trx_query
 FROM information_schema.INNODB_TRX
 WHERE TIMESTAMPDIFF(MINUTE, trx_started, NOW()) > 10;

4.3 Force‑kill lock‑waiting transactions (use with caution)

# List connections to identify the culprit
 SHOW PROCESSLIST;

# Kill a specific thread (example thread ID 12345)
 KILL 12345;

# Kill all long‑running transactions (>30 min)
 SELECT CONCAT('KILL ', trx_mysql_thread_id, ';')
 FROM information_schema.INNODB_TRX
 WHERE TIMESTAMPDIFF(MINUTE, trx_started, NOW()) > 30
 INTO OUTFILE '/tmp/kill_long_trx.sql';
 SOURCE /tmp/kill_long_trx.sql;

5. Configuration Checks

5.1 Buffer pool sizing

# Current InnoDB buffer pool settings
 SHOW VARIABLES LIKE 'innodb_buffer_pool%';
 SHOW STATUS LIKE 'Innodb_buffer_pool%';

# Example: set buffer pool to 16 GB (≈70‑80 % of RAM)
 SET GLOBAL innodb_buffer_pool_size = 17179869184;

5.2 Thread‑related settings

# Thread cache size
 SHOW VARIABLES LIKE 'thread_cache_size';

# Threads created count
 SHOW STATUS LIKE 'Threads_created';

# Adjust thread cache
 SET GLOBAL thread_cache_size = 64;

# I/O thread counts (SSD environments may benefit from higher values)
 SHOW VARIABLES LIKE 'innodb_write_io_threads';
 SHOW VARIABLES LIKE 'innodb_read_io_threads';
 SET GLOBAL innodb_write_io_threads = 16;
 SET GLOBAL innodb_read_io_threads = 16;

5.3 Log configuration

# Log settings
 SHOW VARIABLES LIKE 'innodb_log%';

# Recommended log file size: 256 M‑1 G
 SHOW VARIABLES LIKE 'innodb_log_file_size';

# Flush policy (affects durability vs. performance)
 SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
 -- 1 = safest (flush on every commit)
 -- 2 = OS cache flush (balanced)
 -- 0 = flush once per second (fastest, possible 1 s data loss)

6. Business‑Level Analysis

6.1 Determine whether traffic is normal or abnormal

# MySQL QPS
 mysqladmin ext -i 1 | grep Questions;

# SELECT queries per second
 mysqladmin ext -i 1 | grep "Com_select";

If the spike aligns with legitimate growth, consider architectural changes such as read‑write splitting or sharding. If it appears abnormal, investigate possible attacks or crawlers.

6.2 Analyze request characteristics

# Most frequently accessed tables
 SELECT OBJECT_SCHEMA, OBJECT_NAME, COUNT_STAR,
        SUM_TIMER_WAIT/1000000000000 AS total_seconds
 FROM performance_schema.objects_summary_global_by_type
 WHERE OBJECT_TYPE='TABLE'
 ORDER BY COUNT_STAR DESC
 LIMIT 20;

# Most used indexes
 SELECT OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME, COUNT_STAR,
        SUM_TIMER_WAIT/1000000000000 AS total_seconds
 FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE INDEX_NAME IS NOT NULL
 ORDER BY COUNT_STAR DESC
 LIMIT 20;

6.3 Source IP analysis

# Show connection source IPs
 mysql -e "SHOW PROCESSLIST" | awk '{print $3}' | grep -v "localhost" | sort | uniq -c | sort -rn;

7. Emergency Mitigation

7.1 Short‑term relief

# Disable non‑essential features temporarily
 SET GLOBAL read_only = ON;
 SET GLOBAL super_read_only = ON;

# Kill the top CPU‑consuming queries (limit 5)
 SELECT CONCAT('KILL QUERY ', ID, ';')
 FROM information_schema.processlist
 WHERE COMMAND!='Sleep'
 ORDER BY TIME DESC
 LIMIT 5;

7.2 Connection throttling

# Reduce max connections (MySQL 8.0+)
 SET GLOBAL max_connections = 500;

# Per‑user limits must be enforced at the application or via a proxy (e.g., MySQL Router, ProxySQL)

7.3 Query throttling

# Lower slow‑query threshold to prevent new heavy queries
 SET GLOBAL long_query_time = 0.5;

# With Percona Toolkit you can create rules to reject specific patterns

8. Root‑Cause Solutions

8.1 Read‑Write Splitting

Offload read traffic to replicas to reduce primary CPU load:

# Check replica sync status
 SHOW SLAVE STATUS\G;

# Use MySQL Router or ProxySQL for automatic routing

8.2 Sharding / Partitioning

For very large tables, split them horizontally (e.g., by month) or use middleware such as Vitess or TiDB.

8.3 Caching Layer

Introduce an external cache (Redis, Memcached) for hot data, session information, and aggregated statistics to cut down query volume.

9. Preventive Measures

9.1 Monitoring & Alerting

# Example: Prometheus + mysqld_exporter
# Key metrics to watch:
# - mysql_global_status_threads_connected
# - mysql_global_status_threads_running
# - rate(mysql_global_status_questions[5m])
# - rate(mysql_global_status_slow_queries[5m])

9.2 Regular Slow‑Query Review

# Daily slow‑query digest
 pt-query-digest --since '24h' /var/log/mysql/slow.log > /tmp/slow_query_report.txt

9.3 Capacity Planning

Track CPU and connection trends, then proactively scale hardware or upgrade architecture before limits are hit.

10. Conclusion

MySQL CPU spikes require a systematic approach: confirm the MySQL process is the culprit, examine internal metrics (connections, running queries, lock waits), analyze query execution plans, and verify configuration parameters. Business‑level traffic analysis helps decide whether scaling or optimization is needed. Implementing robust monitoring, slow‑query governance, and architectural safeguards such as read‑write splitting, sharding, and caching prevents future spikes.

MySQL 8.0 Reference Manual – InnoDB Performance

MySQL 8.0 Reference Manual – Performance Schema

Percona Blog – MySQL CPU Performance

SHOW ENGINE INNODB STATUS documentation

MySQL connection management and optimization guide

MonitoringPerformanceOptimizationdatabaseMySQLTroubleshootingCPU
Ops Community
Written by

Ops Community

A leading IT operations community where professionals share and grow together.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.