
Master MySQL Performance: From Slow Queries to Billion‑Row Scaling

This guide walks through diagnosing MySQL bottlenecks: enabling slow-query logging, analyzing with pt-query-digest, optimizing indexes, tuning server parameters, handling deep pagination, sharding, and troubleshooting deadlocks. Concrete commands, scripts, and real-world examples show how to cut query times from seconds to fractions of a second on massive datasets.

Raymond Ops

Introduction

A seasoned operations engineer shares practical MySQL performance‑tuning techniques gathered over eight years, focusing on real‑world scenarios rather than theory.

1. Performance Issue Diagnosis

1.1 Slow Query Log

Enable and configure the slow‑query log to capture queries exceeding a threshold.

-- View current slow‑query settings
SHOW VARIABLES LIKE '%slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

-- Enable slow query logging (runtime only)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
SET GLOBAL long_query_time = 1;  -- log queries >1 s
SET GLOBAL log_queries_not_using_indexes = 'ON';

Use pt‑query‑digest to analyze the log and extract the top‑10 slow statements.

# Install Percona Toolkit
wget https://downloads.percona.com/downloads/percona-toolkit/3.5.0/binary/tarball/percona-toolkit-3.5.0_x86_64.tar.gz
tar -xzvf percona-toolkit-3.5.0_x86_64.tar.gz

# Analyze slow log
pt-query-digest /var/log/mysql/slow.log > analyze_result.txt
pt-query-digest --limit=10 --order-by=Query_time:sum /var/log/mysql/slow.log

A daily cron job can automatically analyze the previous day's log and email the results.
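A minimal sketch of such a job (the log path, report location, and mail recipient are illustrative assumptions):

#!/bin/bash
# /etc/cron.daily/slow-log-digest: summarize yesterday's slow queries and mail the report
REPORT="/tmp/slow_digest_$(date +%F).txt"
pt-query-digest \
    --since "$(date -d yesterday +%F) 00:00:00" \
    --until "$(date +%F) 00:00:00" \
    /var/log/mysql/slow.log > "$REPORT"
mail -s "MySQL slow query digest $(date +%F)" dba@example.com < "$REPORT"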

1.2 Real‑time Monitoring

Key commands for instant diagnosis when the database suddenly slows down:

-- Active queries
SHOW PROCESSLIST;
SELECT * FROM information_schema.processlist WHERE command!='Sleep' ORDER BY time DESC;

-- InnoDB engine status (includes deadlock info)
SHOW ENGINE INNODB STATUS\G

-- Lock wait information (MySQL 5.6/5.7; on 8.0 use performance_schema.data_lock_waits)
SELECT * FROM information_schema.innodb_lock_waits;

-- Running transactions
SELECT * FROM information_schema.innodb_trx WHERE trx_state='RUNNING' ORDER BY trx_started;

Example: a sudden spike in SHOW PROCESSLIST revealed 200+ queries waiting on a table lock caused by an ALTER TABLE executed in production.
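Incidents like that are largely avoidable: since MySQL 5.6 most index changes can run as online DDL, and naming the algorithm and lock mode makes the statement fail fast instead of silently blocking traffic. A sketch (the index name is illustrative):

-- Request a non-blocking build; MySQL raises an error if it cannot comply
ALTER TABLE orders
    ADD INDEX idx_total_amount (total_amount),
    ALGORITHM=INPLACE, LOCK=NONE;

For very large tables, pt-online-schema-change from the Percona Toolkit installed above is the other common route.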

2. Index Optimization

2.1 Index Design Principles

Too many indexes degrade write performance.

Indexes consume additional disk space.

Excessive indexes can confuse the optimizer.

Golden rule: design indexes that match the most common query patterns.

-- Example: e‑commerce orders table
CREATE TABLE orders (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id BIGINT NOT NULL,
    order_no VARCHAR(32) NOT NULL,
    status TINYINT NOT NULL DEFAULT 0,
    total_amount DECIMAL(10,2) NOT NULL,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,
    UNIQUE KEY uk_order_no (order_no),               -- unique order number
    KEY idx_user_status (user_id, status, created_at), -- composite index for user/status queries
    KEY idx_created_at (created_at)                 -- range queries on creation time
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
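As a sanity check, the composite index should serve the most common lookup end to end: equality on the prefix columns, then ordering on the last one. A quick verification (the user id and status values are illustrative):

-- idx_user_status covers both the filter and the ORDER BY
EXPLAIN SELECT * FROM orders
WHERE user_id = 123 AND status = 1
ORDER BY created_at DESC;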

2.2 Index Misuse Pitfalls

-- Table with string and date columns used in the examples below
CREATE TABLE users (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    age INT,
    phone VARCHAR(20),
    email VARCHAR(100),
    created_at DATETIME,
    KEY idx_name (name),
    KEY idx_phone (phone),
    KEY idx_created_at (created_at)
);

-- Wrong: numeric literal against a string column; MySQL casts the column, so idx_phone is skipped
EXPLAIN SELECT * FROM users WHERE phone = 13800138000;

-- Correct: quote the value so the index can be used
EXPLAIN SELECT * FROM users WHERE phone = '13800138000';

-- Wrong: a function on the indexed column disables idx_created_at
EXPLAIN SELECT * FROM users WHERE YEAR(created_at) = 2024;

-- Correct: equivalent range condition without the function
EXPLAIN SELECT * FROM users WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- Wrong: leftmost prefix violation in composite index (idx_abc(a,b,c))
EXPLAIN SELECT * FROM t WHERE b=2 AND c=3;  -- index not used

-- Correct: use leftmost prefix
EXPLAIN SELECT * FROM t WHERE a=1 AND b=2;  -- index used
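For completeness, a table definition under which the two EXPLAINs above behave as described:

-- Assumed table behind the leftmost-prefix examples
CREATE TABLE t (
    a INT,
    b INT,
    c INT,
    KEY idx_abc (a, b, c)
);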

2.3 Real‑world Index Optimization Case

The original query took 30 seconds. EXPLAIN revealed a full table scan on orders; adding a composite index and rewriting the query cut execution time to 0.1 seconds.

-- Original slow query (30 s)
SELECT o.order_no, o.total_amount, u.name, p.product_name
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.created_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
  AND o.status = 1
  AND u.city = 'Beijing';

-- Add indexes
ALTER TABLE orders ADD INDEX idx_status_created (status, created_at);
ALTER TABLE users ADD INDEX idx_city (city);

-- Rewrite to shrink the driving result set early (note: the LIMIT caps how many orders are considered)
SELECT o.order_no, o.total_amount, u.name, p.product_name
FROM (
    SELECT * FROM orders
    WHERE status = 1 AND created_at > DATE_SUB(NOW(), INTERVAL 30 DAY)
    ORDER BY created_at DESC
    LIMIT 1000
) o
JOIN users u ON o.user_id = u.id AND u.city = 'Beijing'
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id;
-- Execution time: 0.1 s

3. Query Optimization

3.1 JOIN Optimization

-- Inefficient as written: the WHERE on u discards NULL-extended rows, so this LEFT JOIN is an inner join in disguise
SELECT o.*, u.name FROM orders o LEFT JOIN users u ON o.user_id = u.id WHERE u.city = 'Beijing';

-- Efficient: state the intent directly and let the small filtered table drive the large one
SELECT o.*, u.name FROM users u INNER JOIN orders o ON u.id = o.user_id WHERE u.city = 'Beijing';

-- Better: filter users first using a subquery
SELECT o.*, u.name FROM orders o INNER JOIN (
    SELECT id, name FROM users WHERE city = 'Beijing'
) u ON o.user_id = u.id;
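If the optimizer still picks a bad join order, MySQL's STRAIGHT_JOIN forces tables to be joined in the order written. Use it sparingly, and only after EXPLAIN confirms the chosen plan is wrong:

-- Force users to drive the join regardless of the optimizer's estimates
SELECT o.*, u.name
FROM users u STRAIGHT_JOIN orders o ON u.id = o.user_id
WHERE u.city = 'Beijing';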

3.2 Pagination for Large Offsets

-- Problem: deep offset forces MySQL to scan many rows
SELECT * FROM orders ORDER BY id LIMIT 1000000, 20;  -- scans 1,000,020 rows

-- Solution 1: use covering index
SELECT * FROM orders o INNER JOIN (
    SELECT id FROM orders ORDER BY id LIMIT 1000000, 20
) t ON o.id = t.id;

-- Solution 2: keyset pagination (1000000 is the last id seen on the previous page)
SELECT * FROM orders WHERE id > 1000000 ORDER BY id LIMIT 20;

-- Solution 3: delayed join
SELECT * FROM orders o INNER JOIN (
    SELECT id FROM orders WHERE created_at > '2024-01-01' ORDER BY id LIMIT 1000000, 20
) t USING(id);

3.3 Subquery Strategies (EXISTS vs IN vs JOIN)

-- Find users with orders (small outer, large inner)
SELECT * FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id);

-- When inner result set is small, IN is fine
SELECT * FROM users WHERE id IN (SELECT DISTINCT user_id FROM orders);

-- JOIN is usually the fastest
SELECT DISTINCT u.* FROM users u INNER JOIN orders o ON u.id = o.user_id;

-- Simple performance test
SET @start = NOW(6);
SELECT COUNT(*) FROM users WHERE id IN (SELECT user_id FROM orders);
SELECT TIMESTAMPDIFF(MICROSECOND, @start, NOW(6))/1000000 AS execution_time;

4. Parameter Tuning

4.1 Memory Settings

-- Check current buffer pool size
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Calculate buffer-pool hit ratio (should be >95%)
-- performance_schema.global_status on MySQL 5.7+; information_schema.global_status on 5.6
SELECT (1 - (Innodb_buffer_pool_reads/Innodb_buffer_pool_read_requests)) * 100 AS buffer_pool_hit_ratio
FROM (
    SELECT variable_value AS Innodb_buffer_pool_reads FROM performance_schema.global_status WHERE variable_name='Innodb_buffer_pool_reads'
) a,
(
    SELECT variable_value AS Innodb_buffer_pool_read_requests FROM performance_schema.global_status WHERE variable_name='Innodb_buffer_pool_read_requests'
) b;
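Since MySQL 5.7 the buffer pool can also be resized online, so a candidate value can be tried before committing it to my.cnf (48G here simply mirrors the config below):

-- Online resize; the value is rounded to a multiple of innodb_buffer_pool_chunk_size
SET GLOBAL innodb_buffer_pool_size = 48 * 1024 * 1024 * 1024;

-- Watch the resize progress
SHOW STATUS LIKE 'Innodb_buffer_pool_resize_status';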

Typical my.cnf tweaks for a 64 GB server:

[mysqld]
# Memory (≈75% of RAM)
innodb_buffer_pool_size = 48G
innodb_buffer_pool_instances = 8
innodb_log_file_size = 2G
innodb_flush_log_at_trx_commit = 2  # up to ~1 s of commits can be lost on a crash; use 1 if that is unacceptable
innodb_flush_method = O_DIRECT

# Connection limits
max_connections = 2000
max_connect_errors = 100000
connect_timeout = 10

# Query cache (MySQL 5.7 and earlier only; the variable was removed in 8.0, so drop these lines there)
query_cache_type = 0

# Temp table sizes
tmp_table_size = 256M
max_heap_table_size = 256M

# Slow‑query logging
slow_query_log = 1
long_query_time = 1
log_queries_not_using_indexes = 1

4.2 Hardware Recommendations

In order of impact:

1. SSD > memory > CPU: moving to SSDs brings the biggest performance gain.

2. RAID 10: the best balance of speed and reliability.

3. 10 GbE NIC: reduces network latency for distributed setups.

5. Architecture Optimization

5.1 Read‑Write Splitting (Python example)

import random
import pymysql

class DBRouter:
    def __init__(self):
        # Master (writes)
        self.master = pymysql.connect(host='master.db.com', user='root',
                                      password='password', database='mydb')
        # Slave pool (reads); assumed here to share the master's credentials
        self.slaves = [
            pymysql.connect(host='slave1.db.com', user='root',
                            password='password', database='mydb'),
            pymysql.connect(host='slave2.db.com', user='root',
                            password='password', database='mydb'),
        ]

    def execute_write(self, sql, params=None):
        """Writes always go to the master."""
        with self.master.cursor() as cursor:
            cursor.execute(sql, params)
            last_id = cursor.lastrowid  # capture before the cursor closes
        self.master.commit()
        return last_id

    def execute_read(self, sql, params=None):
        """Reads pick a random slave."""
        slave = random.choice(self.slaves)
        with slave.cursor() as cursor:
            cursor.execute(sql, params)
            return cursor.fetchall()

    def execute_read_master(self, sql, params=None):
        """Force a read from the master (avoids replication lag right after a write)."""
        with self.master.cursor() as cursor:
            cursor.execute(sql, params)
            return cursor.fetchall()

# Usage: read your own write from the master
db = DBRouter()
order_id = db.execute_write("INSERT INTO orders (user_id, amount) VALUES (%s, %s)", (123, 99.99))
order = db.execute_read_master("SELECT * FROM orders WHERE id = %s", (order_id,))

5.2 Sharding (SQL + Python)

-- Create 16 sharded tables based on a template
CREATE TABLE orders_0 LIKE orders_template;
CREATE TABLE orders_1 LIKE orders_template;
-- ... up to orders_15

# Python routing helper
class ShardingRouter:
    def __init__(self, shard_count=16):
        self.shard_count = shard_count

    def get_table_name(self, base_name, sharding_key):
        """Derive the physical table name from the sharding key."""
        shard_index = sharding_key % self.shard_count
        return f"{base_name}_{shard_index}"

    def insert_order(self, user_id, order_data):
        table_name = self.get_table_name('orders', user_id)
        sql = f"INSERT INTO {table_name} (user_id, ...) VALUES (%s, ...)"
        # execute sql ...

    def query_user_orders(self, user_id):
        """Single-shard query: user_id is the sharding key, so the target shard is known."""
        table_name = self.get_table_name('orders', user_id)
        sql = f"SELECT * FROM {table_name} WHERE user_id = %s"
        # execute sql ...

    def query_order_by_id(self, order_id):
        """Scatter-gather: without the sharding key, every shard must be queried."""
        results = []
        for i in range(self.shard_count):
            table_name = f"orders_{i}"
            sql = f"SELECT * FROM {table_name} WHERE id = %s"
            # execute sql and extend results
        return results
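The scatter-gather in query_order_by_id gets worse with every added shard. A common mitigation, sketched here with a made-up number layout, is to embed the shard index in the order number at creation time so later lookups route to exactly one shard:

import time

class OrderNoCodec:
    """Hypothetical order-number layout: <unix timestamp><2-digit shard><6-digit sequence>."""

    @staticmethod
    def encode(user_id, sequence, shard_count=16):
        shard = user_id % shard_count  # same routing rule as ShardingRouter
        return f"{int(time.time())}{shard:02d}{sequence:06d}"

    @staticmethod
    def shard_of(order_no):
        # The shard index sits in the two digits before the 6-digit sequence
        return int(order_no[-8:-6])

# A lookup by order number now touches exactly one table
order_no = OrderNoCodec.encode(user_id=123, sequence=42)
table_name = f"orders_{OrderNoCodec.shard_of(order_no)}"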

6. Fault Handling

6.1 Deadlock Resolution

-- Show recent deadlock information
SHOW ENGINE INNODB STATUS\G

-- Find waiting and blocking transactions (MySQL 5.6/5.7; on 8.0 query sys.innodb_lock_waits instead)
SELECT r.trx_id waiting_trx_id,
       r.trx_mysql_thread_id waiting_thread,
       r.trx_query waiting_query,
       b.trx_id blocking_trx_id,
       b.trx_mysql_thread_id blocking_thread,
       b.trx_query blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;

-- Kill the blocking transaction's connection
KILL 12345;  -- the blocking_thread value from the query above

Best practices to prevent deadlocks:

Keep transactions short.

Access tables and rows in a consistent order (see the sketch after this list).

Use a lower isolation level such as READ COMMITTED.

Provide appropriate indexes to avoid full‑table locks.
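As referenced above, a minimal illustration of consistent lock ordering against a hypothetical accounts table: if every transfer locks the lower account id first, two opposing transfers may wait on each other but can never form a cycle.

-- Transfer between accounts 1 and 2: always lock the smaller id first
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
SELECT balance FROM accounts WHERE id = 2 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- A concurrent transfer from 2 to 1 takes locks in the same order (1, then 2),
-- so it queues behind this transaction instead of deadlocking it.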

6.2 Master‑Slave Lag Monitoring (Bash)

#!/bin/bash
# Monitor replication lag for a given slave host
# Assumes credentials come from ~/.my.cnf; on MySQL 8.0.22+ use SHOW REPLICA STATUS and Seconds_Behind_Source
check_slave_lag() {
    lag=$(mysql -h "$1" -e "SHOW SLAVE STATUS\G" | awk '/Seconds_Behind_Master/ {print $2}')
    if [ -z "$lag" ] || [ "$lag" = "NULL" ]; then
        echo "Replication is not running on $1"
    elif [ "$lag" -gt 10 ]; then
        echo "Warning: slave lag on $1 is ${lag} seconds"
    else
        echo "Slave $1 is healthy, lag: ${lag}s"
    fi
}

for slave in slave1.db.com slave2.db.com; do
    check_slave_lag "$slave"
done

6.3 Connection‑Pool Exhaustion

-- Current connections
SHOW STATUS LIKE 'Threads_connected';

-- Max connections setting
SHOW VARIABLES LIKE 'max_connections';

-- Connection distribution by user/host
SELECT user, host, COUNT(*) AS connections,
       GROUP_CONCAT(DISTINCT db) AS databases
FROM information_schema.processlist
GROUP BY user, host
ORDER BY connections DESC;

-- Long‑running Sleep connections (>5 min)
SELECT * FROM information_schema.processlist
WHERE command = 'Sleep' AND time > 300
ORDER BY time DESC;
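When the pool is exhausted by leaked connections, a pragmatic stopgap is to generate KILL statements for the long idlers (review the list before executing it):

-- Emit one KILL statement per connection idle for more than 5 minutes
SELECT CONCAT('KILL ', id, ';') AS kill_command
FROM information_schema.processlist
WHERE command = 'Sleep' AND time > 300;

The durable fix lives in the application: bounded pool sizes and sensible wait_timeout and connection-lifetime settings.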

7. Optimization Toolbox

7.1 Essential Tools

percona‑toolkit – Swiss‑army knife for MySQL DBAs.

MySQLTuner – One‑click configuration diagnostics.

sysbench – Load‑testing utility.

mysql‑sniffer – Real‑time SQL capture.

Prometheus + Grafana – Monitoring and visualization.

7.2 Automated Optimization Script

#!/bin/bash
# Assumes mysql, pt-query-digest, and mysqltuner are on PATH and read credentials from ~/.my.cnf
echo "=== MySQL Performance Auto-Optimization ==="

echo "Analyzing slow queries..."
pt-query-digest /var/log/mysql/slow.log --limit=10 > /tmp/slow_analysis.txt

echo "Checking table fragmentation..."
mysql -e "SELECT table_schema, table_name, ROUND(data_free/1024/1024,2) AS data_free_mb FROM information_schema.tables WHERE data_free > 100*1024*1024 ORDER BY data_free DESC;"

echo "Analyzing index usage..."
mysql -e "SELECT object_schema, object_name, index_name, count_star AS usage_count FROM performance_schema.table_io_waits_summary_by_index_usage WHERE object_schema NOT IN ('mysql','performance_schema') AND index_name IS NOT NULL ORDER BY count_star DESC LIMIT 20;"

echo "Generating optimization recommendations..."
mysqltuner --outputfile /tmp/mysqltuner_report.txt

echo "Optimization report generated at /tmp/"

Conclusion

Effective MySQL optimization is a systematic engineering effort:

Monitoring first: build a complete observability stack.

Targeted fixes: identify bottlenecks before changing anything.

Iterative changes: adjust one parameter at a time and measure the impact.

Backup always: preserve data before any structural change.

Continuous learning: stay current with new MySQL features and best practices.

By applying these principles and the concrete techniques above, you can turn a sluggish database into a high‑performance engine capable of handling billions of rows.

Tags: Monitoring, indexing, MySQL

Written by Raymond Ops

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.