Databases 24 min read

Master MySQL Performance: From Slow Queries to Billion-Row Optimization

This comprehensive guide walks you through real-world MySQL performance tuning, covering slow query analysis, index design, query optimization, parameter tuning, hardware considerations, sharding, and fault handling, with practical scripts and case studies to help you transform sluggish databases into high‑throughput, scalable systems.

MaGe Linux Operations
MaGe Linux Operations
MaGe Linux Operations
Master MySQL Performance: From Slow Queries to Billion-Row Optimization

MySQL Performance Optimization: From Slow Queries to Billion-Row Optimization

As a seasoned operations engineer with eight years of experience handling MySQL performance issues, I share practical, real‑world techniques that go beyond theory, focusing on diagnosing bottlenecks, optimizing indexes, tuning queries, adjusting parameters, scaling architecture, and handling common failures.

1. Diagnosing Performance Issues: Finding Bottlenecks Is More Important Than Tuning

1.1 Slow Query Log: First Evidence of Performance Problems

Many ops know the slow query log but rarely use it. Enable and analyze it to capture problematic statements.

Quickly enable slow query log:

-- 查看当前慢查询配置
SHOW VARIABLES LIKE '%slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

-- 动态开启慢查询日志(立即生效,重启失效)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
SET GLOBAL long_query_time = 1;  -- 记录超过1秒的查询
SET GLOBAL log_queries_not_using_indexes = 'ON';  -- 记录未使用索引的查询

Slow query analysis tool – pt‑query‑digest:

# Install percona‑toolkit
wget https://downloads.percona.com/downloads/percona-toolkit/3.5.0/binary/tarball/percona-toolkit-3.5.0_x86_64.tar.gz
tar -xzvf percona-toolkit-3.5.0_x86_64.tar.gz

# Analyze slow query log, find top 10 problematic SQL
pt‑query‑digest /var/log/mysql/slow.log > analyze_result.txt

# Show only the longest 10 queries
pt‑query‑digest --limit=10 --order‑by=Query_time:sum /var/log/mysql/slow.log

Practical tip: Schedule a daily cron job to analyze the previous day's slow log and email the results, catching performance regressions early.

1.2 Real‑time Monitoring: Catch Performance Issues in the Act

When the database suddenly slows, these commands help you quickly locate the culprit.

-- 查看当前正在执行的SQL
SHOW PROCESSLIST;
-- 或者使用更详细的查询
SELECT * FROM information_schema.processlist WHERE command != 'Sleep' ORDER BY time DESC;

-- 查看 InnoDB 引擎状态(包含死锁信息)
SHOW ENGINE INNODB STATUS\G

-- 查看表锁等待情况
SELECT * FROM information_schema.innodb_lock_waits;

-- 查看事务执行情况
SELECT * FROM information_schema.innodb_trx WHERE trx_state = 'RUNNING' ORDER BY trx_started;

Case study: An order system slowed down; SHOW PROCESSLIST revealed 200+ queries waiting for a table lock caused by an ALTER TABLE executed in production. Lesson: perform DDL during low‑traffic windows and use tools like pt‑online‑schema‑change.

1.3 Performance Metrics Monitoring: Build a MySQL Health Check System

#!/bin/bash
# MySQL performance monitoring script monitor_mysql.sh
MYSQL_USER="monitor"
MYSQL_PASS="your_password"
MYSQL_HOST="localhost"

# Monitor QPS (queries per second)
QPS=$(mysql -u${MYSQL_USER} -p${MYSQL_PASS} -h${MYSQL_HOST} -e "SHOW GLOBAL STATUS LIKE 'Questions';" -ss | awk '{print $2}')
sleep 1
QPS2=$(mysql -u${MYSQL_USER} -p${MYSQL_PASS} -h${MYSQL_HOST} -e "SHOW GLOBAL STATUS LIKE 'Questions';" -ss | awk '{print $2}')
echo "Current QPS: $((QPS2-QPS))"

# Monitor connection count
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -h${MYSQL_HOST} -e "SELECT count(*) as total_connections, sum(case when command='Sleep' then 1 else 0 end) as sleeping, sum(case when command!='Sleep' then 1 else 0 end) as active FROM information_schema.processlist;"

# Monitor buffer pool hit rate
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -h${MYSQL_HOST} -e "SELECT (1 - (Innodb_buffer_pool_reads/Innodb_buffer_pool_read_requests)) * 100 as hit_ratio FROM (SELECT variable_value Innodb_buffer_pool_reads FROM information_schema.global_status WHERE variable_name='Innodb_buffer_pool_reads') a, (SELECT variable_value Innodb_buffer_pool_read_requests FROM information_schema.global_status WHERE variable_name='Innodb_buffer_pool_read_requests') b;"

2. Index Optimization: Core Techniques to Make Queries Fly

2.1 Index Design Principles: More Is Not Better

Common misconception: adding many indexes always improves performance. In reality, excessive indexes cause write slowdown, consume disk space, and confuse the optimizer.

Write performance degrades (each INSERT/UPDATE must maintain indexes).

More disk space consumption.

Optimizer may choose a sub‑optimal index.

Golden rule for index design:

-- Example: e‑commerce orders table
CREATE TABLE orders (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id BIGINT NOT NULL,
    order_no VARCHAR(32) NOT NULL,
    status TINYINT NOT NULL DEFAULT 0,
    total_amount DECIMAL(10,2) NOT NULL,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,
    UNIQUE KEY uk_order_no (order_no),               -- order number unique index
    KEY idx_user_status (user_id, status, created_at), -- composite index for frequent queries
    KEY idx_created_at (created_at)                 -- time range queries
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

2.2 Index Pitfalls: Why an Index Might Not Be Used

-- Create test table
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT,
    email VARCHAR(100),
    KEY idx_name (name),
    KEY idx_age (age)
);

-- Pitfall 1: Type mismatch (age is INT, query uses string)
EXPLAIN SELECT * FROM users WHERE age = '25';  -- may not use index
EXPLAIN SELECT * FROM users WHERE age = 25;   -- uses index

-- Pitfall 2: Using functions on indexed columns
EXPLAIN SELECT * FROM users WHERE YEAR(created_at) = 2024;  -- no index
EXPLAIN SELECT * FROM users WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';  -- uses index

-- Pitfall 3: Leftmost prefix rule in composite indexes
-- Assume composite index idx_abc(a,b,c)
-- Uses index: WHERE a=1 AND b=2
-- Does NOT use index: WHERE b=2 AND c=3
-- Partial use: WHERE a=1 AND c=3 (only a is used)

2.3 Index Optimization in Practice: Real Case

Last quarter I reduced a query from 30 seconds to 0.1 seconds. The optimization steps are shown below.

-- Original slow query (30 s)
SELECT o.order_no, o.total_amount, u.name, p.product_name
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.created_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
  AND o.status = 1
  AND u.city = '北京';

-- EXPLAIN revealed full table scan on orders (no suitable index)

-- Solution 1: Add appropriate indexes
ALTER TABLE orders ADD INDEX idx_status_created (status, created_at);
ALTER TABLE users ADD INDEX idx_city (city);

-- Solution 2: Rewrite SQL to limit result set first
SELECT o.order_no, o.total_amount, u.name, p.product_name
FROM (
    SELECT * FROM orders
    WHERE status = 1 AND created_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
    LIMIT 1000
) o
JOIN users u ON o.user_id = u.id AND u.city = '北京'
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id;

-- Execution time after changes: 0.1 s

3. Query Optimization: The Art of Writing High‑Performance SQL

3.1 JOIN Optimization: Small Table Drives Large Table

-- Inefficient (large drives small)
SELECT o.*, u.name
FROM orders o
LEFT JOIN users u ON o.user_id = u.id
WHERE u.city = '北京';

-- Efficient (small drives large)
SELECT o.*, u.name
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.city = '北京';

-- Better: filter small table first with subquery
SELECT o.*, u.name
FROM orders o
INNER JOIN (
    SELECT id, name FROM users WHERE city = '北京'
) u ON o.user_id = u.id;

3.2 Pagination Optimization: Solving Large Offset Issues

-- Problem: deep pagination scans many rows
SELECT * FROM orders ORDER BY id LIMIT 1000000, 20;  -- scans over 1,000,020 rows

-- Solution 1: Use covering index with a sub‑select
SELECT * FROM orders o
INNER JOIN (SELECT id FROM orders ORDER BY id LIMIT 1000000, 20) t ON o.id = t.id;

-- Solution 2: Use cursor (remember last id)
SELECT * FROM orders WHERE id > 1000000 ORDER BY id LIMIT 20;

-- Solution 3: Delayed join (use index on id first)
SELECT * FROM orders o
INNER JOIN (
    SELECT id FROM orders WHERE created_at > '2024-01-01' ORDER BY id LIMIT 1000000, 20
) t USING(id);

3.3 Subquery Optimization: EXISTS vs IN vs JOIN

-- Find users who have orders
SELECT * FROM users WHERE id IN (SELECT DISTINCT user_id FROM orders);
SELECT * FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);
SELECT DISTINCT u.* FROM users u INNER JOIN orders o ON u.id = o.user_id;

-- Performance test script
SET @start = NOW(6);
SELECT COUNT(*) FROM users WHERE id IN (SELECT user_id FROM orders);
SELECT TIMESTAMPDIFF(MICROSECOND, @start, NOW(6))/1000000 AS execution_time;

4. Parameter Tuning: Squeezing Every Bit of Hardware Performance

4.1 Memory Parameter Optimization

-- Check current buffer pool size
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Check buffer pool hit ratio (should be >95%)
SELECT (1 - (Innodb_buffer_pool_reads/Innodb_buffer_pool_read_requests)) * 100 AS buffer_pool_hit_ratio
FROM (
    SELECT variable_value AS Innodb_buffer_pool_reads FROM information_schema.global_status WHERE variable_name='Innodb_buffer_pool_reads'
) a,
(
    SELECT variable_value AS Innodb_buffer_pool_read_requests FROM information_schema.global_status WHERE variable_name='Innodb_buffer_pool_read_requests'
) b;

-- Example my.cnf configuration
[mysqld]
innodb_buffer_pool_size = 48G          # 75% of a 64 GB server
innodb_buffer_pool_instances = 8
innodb_log_file_size = 2G
innodb_flush_log_at_trx_commit = 2   # balance performance and safety
innodb_flush_method = O_DIRECT
max_connections = 2000
max_connect_errors = 100000
connect_timeout = 10
query_cache_type = 0                # disabled, use Redis instead
tmp_table_size = 256M
max_heap_table_size = 256M
slow_query_log = 1
long_query_time = 1
log_queries_not_using_indexes = 1

4.2 Hardware Optimization Recommendations

SSD > Memory > CPU : SSDs give the biggest performance boost for databases.

RAID configuration : RAID10 offers the best balance of performance and reliability.

Network : 10 GbE reduces latency for distributed setups.

5. Architecture Optimization: From Single Machine to Distributed

5.1 Read‑Write Splitting: Simple Effective Scaling

# Python read‑write splitting example
import random, pymysql

class DBRouter:
    def __init__(self):
        self.master = pymysql.connect(host='master.db.com', user='root', password='password', database='mydb')
        self.slaves = [
            pymysql.connect(host='slave1.db.com', user='root', password='password', database='mydb'),
            pymysql.connect(host='slave2.db.com', user='root', password='password', database='mydb')
        ]

    def execute_write(self, sql, params=None):
        """Writes go to the master"""
        with self.master.cursor() as cursor:
            cursor.execute(sql, params)
        self.master.commit()
        return cursor.lastrowid

    def execute_read(self, sql, params=None):
        """Reads randomly pick a slave"""
        slave = random.choice(self.slaves)
        with slave.cursor() as cursor:
            cursor.execute(sql, params)
        return cursor.fetchall()

    def execute_read_master(self, sql, params=None):
        """Force read from master to avoid replication lag"""
        with self.master.cursor() as cursor:
            cursor.execute(sql, params)
        return cursor.fetchall()

# Usage example
router = DBRouter()
order_id = router.execute_write("INSERT INTO orders (user_id, amount) VALUES (%s, %s)", (123, 99.99))
order = router.execute_read_master("SELECT * FROM orders WHERE id = %s", (order_id,))

5.2 Sharding: Handling Billions of Rows

-- Create 16 sharded tables based on a template
CREATE TABLE orders_0 LIKE orders_template;
CREATE TABLE orders_1 LIKE orders_template;
-- ... up to orders_15

-- Routing algorithm (application layer)
-- table_index = user_id % 16
-- Example: user_id = 12345 -> data stored in orders_9
# Python sharding router example
class ShardingRouter:
    def __init__(self, shard_count=16):
        self.shard_count = shard_count

    def get_table_name(self, base_name, sharding_key):
        """Calculate table name based on sharding key"""
        shard_index = sharding_key % self.shard_count
        return f"{base_name}_{shard_index}"

    def insert_order(self, user_id, order_data):
        table = self.get_table_name('orders', user_id)
        sql = f"INSERT INTO {table} (user_id, ...) VALUES (%s, ...)"
        # execute sql with order_data

    def query_user_orders(self, user_id):
        table = self.get_table_name('orders', user_id)
        sql = f"SELECT * FROM {table} WHERE user_id = %s"
        # execute sql

    def query_order_by_id(self, order_id):
        """Scan all shards to find an order by its ID"""
        results = []
        for i in range(self.shard_count):
            table = f"orders_{i}"
            sql = f"SELECT * FROM {table} WHERE order_id = %s"
            results.extend(execute_query(sql, order_id))
        return results

6. Fault Handling: Common Pitfalls

6.1 Deadlock Issues

-- Show recent deadlock information
SHOW ENGINE INNODB STATUS\G

-- Find current lock waits
SELECT r.trx_id AS waiting_trx_id,
       r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query AS waiting_query,
       b.trx_id AS blocking_trx_id,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query AS blocking_query
FROM information_schema.innodb_lock_waits w
INNER JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
INNER JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;

-- Kill the blocking transaction (replace 12345 with actual thread id)
KILL 12345;

Best practices to prevent deadlocks:

Keep transactions short.

Access tables and rows in a consistent order.

Use lower isolation levels such as READ COMMITTED.

Add appropriate indexes to avoid table‑level locks.

6.2 Master‑Slave Lag

#!/bin/bash
# Monitor replication lag for given slave host
check_slave_lag() {
    lag=$(mysql -h $1 -e "SHOW SLAVE STATUS\G" | grep "Seconds_Behind_Master" | awk '{print $2}')
    if [ "$lag" = "NULL" ]; then
        echo "Slave is not running on $1"
    elif [ "$lag" -gt 10 ]; then
        echo "Warning: Slave lag on $1 is ${lag} seconds"
    else
        echo "Slave $1 is healthy, lag: ${lag}s"
    fi
}

for slave in slave1.db.com slave2.db.com; do
    check_slave_lag $slave
done

6.3 Connection Pool Exhaustion

-- Current connections
SHOW STATUS LIKE 'Threads_connected';

-- Max connections setting
SHOW VARIABLES LIKE 'max_connections';

-- Distribution of connections per user/host
SELECT user, host, COUNT(*) AS connections, GROUP_CONCAT(DISTINCT db) AS databases
FROM information_schema.processlist
GROUP BY user, host
ORDER BY connections DESC;

-- Find long‑running Sleep connections (idle > 300 s)
SELECT * FROM information_schema.processlist
WHERE command = 'Sleep' AND time > 300
ORDER BY time DESC;

Practical Summary: Optimization Is a Systematic Engineering Effort

Monitoring First : Build a complete monitoring system before any tuning.

Targeted Fixes : Identify the real bottleneck and address it specifically.

Iterative Changes : Change one parameter at a time, observe the effect, then proceed.

Backup Always : Back up data before performing any major optimization.

Continuous Learning : MySQL evolves rapidly; stay updated with new features and best practices.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

query optimizationperformance tuningmysqlIndex Optimizationdatabase scaling
MaGe Linux Operations
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.