Master MySQL Performance Tuning & Troubleshooting on Linux: A Complete Guide
This comprehensive guide walks you through why MySQL performance matters, how to benchmark and establish baselines, apply Linux system and MySQL configuration optimizations, fine‑tune SQL queries, diagnose common failures, set up robust monitoring, and implement high‑availability architectures for production environments.
MySQL Performance Tuning and Fault Diagnosis on Linux
As a DBA with eight years of production experience, I have faced countless MySQL performance crises—from million‑row query timeouts to replication lag that broke business logic. Each incident taught valuable lessons, and in this article I share the essential tuning principles and diagnostic techniques I have accumulated.
Why MySQL Performance Tuning Matters
In the internet era, database performance directly impacts user experience and revenue. A single slow query can cause:
User churn increase of 15%
Server resource consumption surge
System‑wide cascade failures
Real‑world case: An e‑commerce platform’s unoptimized SQL caused order response time to jump from 100 ms to 30 s during Double‑11, reducing conversion rate by 25%.
MySQL Performance Benchmark: Establishing Baselines
Key Metrics
# Core performance metric monitoring script
#!/bin/bash
echo "=== MySQL Performance Baseline ==="
mysql -e "SHOW GLOBAL STATUS LIKE 'Questions';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Uptime';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_running';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';"QPS calculation:
QPS = (Questions - previous Questions) / time intervalTesting Tools
1. sysbench – the all‑round stress test tool
# Prepare test data
sysbench /usr/share/sysbench/oltp_read_write.lua \
--mysql-host=localhost \
--mysql-user=test \
--mysql-password=password \
--mysql-db=testdb \
--tables=10 \
--table-size=1000000 \
prepare
# Run mixed read/write test
sysbench /usr/share/sysbench/oltp_read_write.lua \
--mysql-host=localhost \
--mysql-user=test \
--mysql-password=password \
--mysql-db=testdb \
--tables=10 \
--table-size=1000000 \
--threads=16 \
--time=300 \
--report-interval=10 \
run2. mysqlslap – MySQL official load generator
# Simulate 100 concurrent users, 1000 queries each
mysqlslap --user=root --password=password \
--host=localhost \
--concurrency=100 \
--iterations=1000 \
--create-schema=testdb \
--query="SELECT * FROM users WHERE id = FLOOR(RAND() * 100000);" \
--verboseLinux System‑Level Optimizations
Kernel Parameter Gold Config
# /etc/sysctl.conf optimization
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 0
# Memory management
vm.swappiness = 1
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5
# File system limits
fs.file-max = 1000000Disk I/O Optimization
1. File system choice and mount options
# ext4 best practice
mount -o noatime,data=writeback,barrier=0,nobh /dev/sdb1 /var/lib/mysql
# XFS (recommended for large data sets)
mount -o noatime,attr2,inode64,noquota /dev/sdb1 /var/lib/mysql2. I/O scheduler
# SSD – use deadline scheduler
echo deadline > /sys/block/sdb/queue/scheduler
# HDD – use cfq scheduler
echo cfq > /sys/block/sda/queue/schedulerMemory Allocation Strategy
# Calculate reasonable InnoDB buffer pool size (70‑80% of RAM)
total_mem=$(free -m | awk 'NR==2{print $2}')
buffer_pool_size=$((total_mem * 75 / 100))
echo "Suggested InnoDB buffer pool size: ${buffer_pool_size}M"MySQL Configuration Deep Tuning
my.cnf Gold Template
[mysqld]
# Basic settings
port = 3306
socket = /var/lib/mysql/mysql.sock
pid-file = /var/lib/mysql/mysql.pid
datadir = /var/lib/mysql
tmpdir = /tmp
# Connection settings
max_connections = 1000
max_connect_errors = 10000
connect_timeout = 60
wait_timeout = 28800
interactive_timeout = 28800
# InnoDB core parameters
innodb_buffer_pool_size = 6G # ~75% of memory
innodb_buffer_pool_instances = 8
innodb_log_file_size = 1G
innodb_log_files_in_group = 2
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 2 # performance‑first
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_thread_concurrency = 0
# Query cache (MySQL 5.7 and below)
query_cache_type = 1
query_cache_size = 256M
query_cache_limit = 2M
# Temporary table limits
tmp_table_size = 256M
max_heap_table_size = 256M
# Slow query log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1.0
log_queries_not_using_indexes = 1
# Binary logging
log-bin = mysql-bin
binlog_format = ROW
binlog_cache_size = 4M
max_binlog_cache_size = 8M
max_binlog_size = 1G
expire_logs_days = 7
# Replication tweaks
relay_log_recovery = 1
slave_skip_errors = 1062,1053,1146Dynamic Parameter Tweaks
# Adjust buffer pool at runtime
SET GLOBAL innodb_buffer_pool_size = 8*1024*1024*1024; -- 8GB
SET GLOBAL max_connections = 2000;
SET GLOBAL innodb_io_capacity = 3000;
SET GLOBAL query_cache_size = 512*1024*1024; -- 512MB
# Verify current settings
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'max_connections';SQL Query Optimization
Slow‑Query Analysis Tools
1. mysqldumpslow
# Analyze top 10 slow queries by count
mysqldumpslow -s c -t 10 /var/log/mysql/slow.log
# By average time
mysqldumpslow -s t -t 10 /var/log/mysql/slow.log
# By lock time
mysqldumpslow -s l -t 10 /var/log/mysql/slow.log2. pt‑query‑digest (Percona Toolkit)
# Install toolkit
yum install percona-toolkit
# Generate detailed report
pt-query-digest /var/log/mysql/slow.log > slow_query_analysis.txtIndex Design Best Practices
1. Composite index order
# Wrong order
CREATE INDEX idx_wrong ON orders (status, create_time, user_id);
# Correct order (high‑selectivity columns first)
CREATE INDEX idx_correct ON orders (user_id, status, create_time);
# Verify usage
EXPLAIN SELECT * FROM orders WHERE user_id = 12345 AND status = 'paid' ORDER BY create_time DESC LIMIT 10;2. Covering indexes
# Create covering index
CREATE INDEX idx_covering ON users (email, status, create_time, username);
# Query that uses only the index
SELECT username FROM users WHERE email = '[email protected]' AND status = 'active';EXPLAIN Execution Plan Interpretation
EXPLAIN FORMAT=JSON SELECT u.username, o.total_amount
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.status = 'active' AND o.create_time > '2024-01-01'; type: access type (system > const > eq_ref > ref > range > index > ALL) key: actual index used rows: estimated rows scanned (smaller is better) Extra: additional info such as "Using filesort" or "Using temporary" that need optimization
Troubleshooting Real‑World Cases
Case 1 – Connection Exhaustion
Symptom : ERROR 1040 (HY000): Too many connections Diagnosis :
# Check current connections
SHOW PROCESSLIST;
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';
SHOW VARIABLES LIKE 'max_connections';
# Identify long‑running connections
SELECT host, db, user, info, time FROM information_schema.processlist WHERE time > 300;Solution :
# Temporarily raise limit
SET GLOBAL max_connections = 2000;
# Optimize application connection pool (e.g., pool size 20, idle timeout 300s)Case 2 – InnoDB Lock Wait Timeout
Symptom : ERROR 1205 (HY000): Lock wait timeout exceeded Diagnosis Script :
# Show lock wait details
SELECT r.trx_id waiting_trx_id, r.trx_mysql_thread_id waiting_thread, r.trx_query waiting_query,
b.trx_id blocking_trx_id, b.trx_mysql_thread_id blocking_thread, b.trx_query blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;
# Kill blocking transaction
KILL 12345; -- replace with actual thread_idCase 3 – Master‑Slave Replication Lag
Monitoring Script :
#!/bin/bash
while true; do
delay=$(mysql -h slave_host -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print $2}')
if [ "$delay" -gt 60 ]; then
echo "WARNING: Replication delay is ${delay} seconds"
# send alert (e.g., webhook)
fi
sleep 10
doneOptimization :
# Increase parallel workers on the slave
SET GLOBAL slave_parallel_workers = 8;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL binlog_group_commit_sync_delay = 1000;Monitoring & Alerting Framework
Prometheus + Grafana Configuration
# prometheus.yml MySQL job
- job_name: 'mysql'
static_configs:
- targets: ['localhost:9104']
scrape_interval: 15s
metrics_path: /metricsKey Metrics :
Connection usage:
mysql_global_status_threads_connected / mysql_global_variables_max_connectionsBuffer pool hit rate:
(mysql_global_status_innodb_buffer_pool_read_requests - mysql_global_status_innodb_buffer_pool_reads) / mysql_global_status_innodb_buffer_pool_read_requestsSlow‑query growth:
rate(mysql_global_status_slow_queries[5m])Automated Alert Scripts
#!/bin/bash
DB_HOST="localhost"
DB_USER="monitor"
DB_PASS="password"
check_mysql_health() {
if ! mysql -h $DB_HOST -u $DB_USER -p$DB_PASS -e "SELECT 1" > /dev/null 2>&1; then
send_alert "MySQL service abnormal"
return 1
fi
connections=$(mysql -h $DB_HOST -u $DB_USER -p$DB_PASS -e "SHOW STATUS LIKE 'Threads_connected'" | awk 'NR==2{print $2}')
if [ $connections -gt 800 ]; then
send_alert "MySQL connections high: $connections"
fi
slave_status=$(mysql -h $DB_HOST -u $DB_USER -p$DB_PASS -e "SHOW SLAVE STATUS\G" | grep "Slave_IO_Running" | awk '{print $2}')
if [ "$slave_status" != "Yes" ]; then
send_alert "MySQL replication abnormal"
fi
}
send_alert() {
message="$1"
# Example: send to DingTalk or WeChat webhook
curl -X POST "webhook_url" -d "{\"text\": \"$message\"}"
}
check_mysql_healthHigh‑Availability Best Practices
MHA Automatic Failover Configuration
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/var/lib/mysql
user=mha
password=mhapassword
ssh_user=root
repl_user=repl
repl_password=replpassword
ping_interval=3
shutdown_script=/usr/local/bin/power_manager
master_ip_failover_script=/usr/local/bin/master_ip_failover
[server1]
hostname=mysql-master
port=3306
[server2]
hostname=mysql-slave1
port=3306
candidate_master=1
[server3]
hostname=mysql-slave2
port=3306ProxySQL Read/Write Splitting
# Insert MySQL servers
INSERT INTO mysql_servers(hostgroup_id, hostname, port, weight) VALUES
(0, '192.168.1.10', 3306, 1000), -- master
(1, '192.168.1.11', 3306, 900), -- slave1
(1, '192.168.1.12', 3306, 900); -- slave2
# Query routing rules
INSERT INTO mysql_query_rules(rule_id, active, match_pattern, destination_hostgroup, apply) VALUES
(1, 1, '^SELECT.*', 1, 1), -- reads go to slaves
(2, 1, '^INSERT|UPDATE|DELETE.*', 0, 1); -- writes go to master
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;Final Recommendations & Checklist
Hardware first : SSD storage, ample RAM, 10 GbE NICs.
OS layer : tune kernel parameters, choose appropriate file system, adjust I/O scheduler.
MySQL config : optimize InnoDB buffers, connection limits, cache settings.
Application layer : rewrite slow SQL, design proper indexes, optimize business logic.
Monitoring : track connection usage, buffer‑pool hit rate, slow‑query rate, set alerts.
High availability : use MHA or ProxySQL for automatic failover and read/write splitting.
Remember: monitor first, let data speak, iterate quickly, and continuously optimize.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
