
Master MySQL Slow Query Optimization: Proven Methods & Pitfall Guide

This comprehensive guide walks you through a systematic methodology for diagnosing and fixing MySQL slow queries, covering proper log configuration, EXPLAIN analysis, index design, query rewriting, table restructuring, and practical caching techniques, while highlighting common misconceptions and real‑world case studies.

Ops Community

Introduction

"Again this query! Optimized three times, still slow?" The author, a DBA, shares a three‑year journey of systematic slow‑query optimization that reduced P99 query time from 2 seconds to 200 ms and cut slow‑query count by 90%.

Technical Background: What Is a Slow Query?

MySQL’s default slow‑query threshold is 10 seconds, which is unsuitable for production. Reasonable thresholds depend on query type:

Reasonable slow‑query definition:

Primary‑key lookup: acceptable <10 ms, slow >50 ms.

Regular range query: acceptable <100 ms, slow >500 ms.

Aggregate query: acceptable <1 s, slow >5 s.

Report query: acceptable <10 s, slow >60 s.

Batch operation: acceptable <30 s, slow >300 s.
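These thresholds are easy to encode as a lookup table; a minimal Python sketch of the rubric above (the category names and the `classify` helper are illustrative, not part of any MySQL tooling):

```python
# Acceptable/slow latency thresholds (milliseconds) per query category,
# taken from the guideline above. These are rules of thumb, not standards.
THRESHOLDS_MS = {
    "pk_lookup":   (10, 50),
    "range_query": (100, 500),
    "aggregate":   (1_000, 5_000),
    "report":      (10_000, 60_000),
    "batch":       (30_000, 300_000),
}

def classify(query_type: str, elapsed_ms: float) -> str:
    """Classify a query's latency against the rubric above."""
    acceptable, slow = THRESHOLDS_MS[query_type]
    if elapsed_ms < acceptable:
        return "acceptable"
    if elapsed_ms <= slow:
        return "borderline"
    return "slow"
```

For example, a 30 ms primary‑key lookup is past the acceptable bound but not yet over the slow line, so it lands in the gray zone worth watching.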

Root Causes of Slow Queries

Doing too much useless work – scanning massive unnecessary data.

Long wait times – lock wait, I/O wait, network wait.

All slow‑query optimization ultimately solves these two problems.

Common Misconceptions

Misconception 1: Every slow query needs an index. Truth: Blindly adding indexes can worsen performance and increase storage and write overhead.

Misconception 2: More indexes are better. Truth: A table should have no more than five indexes (except special cases); excess indexes slow writes and increase optimizer cost.

Misconception 3: EXPLAIN type=ALL always means slow. Truth: For tiny tables, a full table scan can be faster than using an index.

Misconception 4: Optimization only looks at SQL, not business logic. Truth: Changing business requirements (e.g., adding pagination) often yields the biggest gains.

Case Study: System Background

System architecture

Database: MySQL 8.0.32, master‑slave.

Server: 16 CPU cores, 64 GB RAM, SSD storage.

Data volume: 20 million rows in core tables, total 500 GB.

Load: 10 million queries per day, peak QPS 5000.

Initial problems

Slow‑query count: >8000 per day.

P99 response time: 2.3 s.

Slow‑query proportion: 0.08 %.

Key pain points: order queries, user statistics, report generation.

Core Content: Systematic Slow‑Query Optimization Methodology

Step 1: Identify and Analyze Slow Queries

Configure Slow Query Log

-- View current slow‑query settings
SHOW VARIABLES LIKE 'slow%';
SHOW VARIABLES LIKE 'long_query_time';

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';

-- Set slow‑query threshold (recommended 1 s)
SET GLOBAL long_query_time = 1;

-- Log queries not using indexes
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Throttle logging of such queries
SET GLOBAL log_throttle_queries_not_using_indexes = 10;

# Persist in /etc/my.cnf so the settings survive a restart
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1
log_queries_not_using_indexes = 1
log_throttle_queries_not_using_indexes = 10

Analyze Slow Query Log (recommended: pt‑query‑digest)

# Install Percona Toolkit
yum install percona-toolkit

# Digest top 20 slow queries
pt-query-digest --limit 20 /var/log/mysql/slow.log > slow_query_report.txt

# View report
less slow_query_report.txt

Report example:

# Query 1: 234.5 QPS, 12.3s avg, ID 0x123ABC...
# Calls: 20842  Total time: 256341s  Mean: 12.3s
# Rows sent: 4168400  Rows examined: 208420000
SELECT o.order_id, o.user_id, o.total_amount, u.username
FROM orders o
LEFT JOIN users u ON o.user_id = u.user_id
WHERE o.status = 'pending'
AND o.create_time > '2024-01-01'
ORDER BY o.create_time DESC
LIMIT 20\G

Key interpretation:

Calls: execution count – prioritize high‑frequency queries.

Total time: overall impact on performance.

Rows examined: a high ratio of examined to sent rows signals wasted work and optimization potential.
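That examined‑to‑sent ratio can be computed straight from the digest numbers; a small illustrative helper (the helper itself is an assumption, not part of pt‑query‑digest):

```python
def scan_efficiency(rows_examined: int, rows_sent: int) -> float:
    """Rows scanned per row returned; 1.0 is ideal, large values mean
    the query reads far more data than it delivers."""
    if rows_sent == 0:
        return float("inf")
    return rows_examined / rows_sent

# The sample report above: 208,420,000 examined for 4,168,400 sent,
# i.e. 50 rows scanned for every row returned.
ratio = scan_efficiency(208_420_000, 4_168_400)
```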

Build a Slow‑Query Priority List

High priority (immediate)

Core business impact (order, payment, login).

High frequency (≥10 per minute).

Single execution >5 s.

Causes lock or deadlock.

Medium priority (plan)

Medium frequency (10–100 per hour).

Non‑core business impact.

Can be mitigated by cache or async processing.

Low priority (defer)

Very low frequency (<10 per day).

Background statistics or reports.

Can run in off‑peak windows.
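The triage rules above can be sketched as a small scoring function; thresholds mirror the list, but the function itself is a hypothetical helper, not an existing tool:

```python
def triage(core_business: bool, freq_per_min: float,
           max_exec_s: float, causes_locks: bool) -> str:
    """Map a slow query's traits to the high/medium/low buckets above."""
    # High priority: core business, >=10/min, >5 s per run, or locking.
    if core_business or freq_per_min >= 10 or max_exec_s > 5 or causes_locks:
        return "high"
    # Medium priority: roughly 10-100 per hour (~0.17-1.7 per minute).
    if freq_per_min >= 10 / 60:
        return "medium"
    return "low"
```

For example, an order query firing 15 times per minute is immediate work, while a nightly report touched a few times a day can wait for an off‑peak window.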

Step 2: Deep EXPLAIN Analysis

EXPLAIN is the core tool, but looking only at the type column is insufficient.

EXPLAIN type values, from fastest to slowest:

system: single‑row system table – fastest.

const: primary‑key or unique‑index equality lookup – very fast.

eq_ref: unique‑index join – usually seen in JOINs.

ref: non‑unique index equality.

range: index range scan (>, <, BETWEEN, IN).

index: full index scan.

ALL: full table scan – slowest.

Optimization guidance:

const, eq_ref, ref: usually fine.

range: acceptable; watch the number of scanned rows.

index, ALL: need improvement unless the table is tiny.

Key columns

type: access type, ordered from best to worst.

key: the index actually used.

rows: estimated rows scanned – the crucial metric.

Extra: additional info (e.g., Using filesort and Using temporary are performance killers).

EXPLAIN ANALYZE (MySQL 8.0+) provides real execution times and loop counts, allowing comparison between estimated cost and actual time.

Step 3: Proper Index Optimization

Golden Rules for Index Design

Rule 1: Leftmost Prefix

Composite index (a, b, c) speeds queries that filter on a, a + b, or a + b + c.

Cannot speed queries that skip earlier columns (e.g., WHERE b = ?).
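The leftmost‑prefix rule is mechanical enough to check in code; a minimal sketch (an illustrative helper, not how MySQL's optimizer actually works):

```python
def usable_prefix(index_cols, equality_cols):
    """Return the leading columns of a composite index that equality
    predicates can use: walk from the left, stop at the first gap."""
    prefix = []
    for col in index_cols:
        if col not in equality_cols:
            break
        prefix.append(col)
    return prefix

# Composite index (a, b, c):
usable_prefix(("a", "b", "c"), {"a", "b"})  # ['a', 'b'] - both usable
usable_prefix(("a", "b", "c"), {"b"})       # [] - skips a, index unusable
usable_prefix(("a", "b", "c"), {"a", "c"})  # ['a'] - c is stranded behind b
```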

Rule 2: Column Order Matters

High‑cardinality columns first.

Equality columns before range columns.

Frequently queried columns first.

Rule 3: Covering Indexes Are Best

-- Query only needs id and name
SELECT user_id, username FROM users WHERE status = 'active';

-- Create covering index
CREATE INDEX idx_status_id_name ON users(status, user_id, username);

-- EXPLAIN shows Extra: Using index

Practical Example: Order Query Optimization

Original slow query

-- Query pending orders for a user
SELECT order_id, total_amount, create_time
FROM orders
WHERE user_id = 12345
AND status = 'pending'
ORDER BY create_time DESC
LIMIT 20;

EXPLAIN analysis

type: ref
key: idx_user_id
rows: 50000  -- scans 50 k rows, takes 850 ms
Extra: Using where; Using filesort

Optimization 1: Composite Index

CREATE INDEX idx_user_status_time ON orders(user_id, status, create_time);
DROP INDEX idx_user_id ON orders;
DROP INDEX idx_status ON orders;

After re‑EXPLAIN:

type: range
key: idx_user_status_time
rows: 234   -- scans only 234 rows
Extra: Using index condition   -- no filesort

Performance comparison:

Before: 50000 rows scanned, 850 ms.

After: 234 rows scanned, 12 ms (≈70× faster).

Optimization 2: Covering Index

CREATE INDEX idx_user_status_time_amount ON orders(user_id, status, create_time, total_amount);

EXPLAIN shows Extra: Using index – no table lookup needed.

When Not to Add Indexes

Scenario 1: Frequently Updated Columns

CREATE INDEX idx_update_time ON orders(update_time);
-- Every UPDATE must also update this index, hurting write performance.

Scenario 2: Low‑Cardinality Columns

CREATE INDEX idx_gender ON users(gender);
-- Gender has only two values, so each value matches ~50 % of rows; a full scan may be faster.

Assess cardinality:

SELECT COUNT(DISTINCT status)/COUNT(*) AS selectivity FROM orders;
-- Rule of thumb: selectivity <10 % → an index rarely helps; >80 % → an index is worthwhile.

Step 4: Query Rewrite and Optimization Techniques

Avoid SELECT *

-- Bad
SELECT * FROM orders WHERE order_id = 12345;

-- Good
SELECT order_id, total_amount, status FROM orders WHERE order_id = 12345;

Reasons: reduces I/O, enables covering indexes, lowers network payload.

Avoid Functions on Indexed Columns

-- Bad
SELECT * FROM orders WHERE DATE(create_time) = '2024-01-01';

-- Good
SELECT * FROM orders WHERE create_time >= '2024-01-01 00:00:00'
  AND create_time < '2024-01-02 00:00:00';

Avoid Implicit Type Conversion

-- Bad (string forces conversion)
SELECT * FROM orders WHERE user_id = '12345';

-- Good
SELECT * FROM orders WHERE user_id = 12345;

Optimize OR Conditions

-- Slow
SELECT * FROM orders WHERE user_id = 12345 OR order_id = 67890;

-- Faster (UNION)
SELECT * FROM orders WHERE user_id = 12345
UNION
SELECT * FROM orders WHERE order_id = 67890;

-- Or use IN when same column
SELECT * FROM orders WHERE user_id IN (12345, 67890);

Optimize Pagination

-- Deep pagination (slow)
SELECT * FROM orders ORDER BY create_time DESC LIMIT 1000000, 20;

-- Cursor‑based (keyset) pagination
SELECT * FROM orders ORDER BY create_time DESC LIMIT 20;
-- Remember the last row's create_time, then:
SELECT * FROM orders WHERE create_time < '2024-01-15 10:30:00'
ORDER BY create_time DESC LIMIT 20;
-- If create_time is not unique, add a tiebreaker column (e.g. order_id) to the cursor.
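The cursor pattern can be demonstrated end to end; a self‑contained sketch using SQLite as a stand‑in for MySQL (the order_id tiebreaker is an addition needed when create_time values collide; table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, create_time TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, f"2024-01-{i:02d} 10:00:00") for i in range(1, 11)])

def page(cursor=None, size=3):
    """Fetch one page; cursor is (create_time, order_id) of the last row seen."""
    if cursor is None:
        return conn.execute(
            "SELECT order_id, create_time FROM orders "
            "ORDER BY create_time DESC, order_id DESC LIMIT ?", (size,)).fetchall()
    ct, oid = cursor
    # Seek past the cursor instead of skipping offset rows.
    return conn.execute(
        "SELECT order_id, create_time FROM orders "
        "WHERE create_time < ? OR (create_time = ? AND order_id < ?) "
        "ORDER BY create_time DESC, order_id DESC LIMIT ?",
        (ct, ct, oid, size)).fetchall()

first = page()                                       # orders 10, 9, 8
second = page(cursor=(first[-1][1], first[-1][0]))   # orders 7, 6, 5
```

Each page costs the same regardless of depth, which is exactly what the OFFSET form cannot guarantee.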

Optimize COUNT Queries

-- Slow full count
SELECT COUNT(*) FROM orders WHERE status = 'pending';

-- Approximate row count from table statistics (ignores any WHERE clause)
SELECT TABLE_ROWS FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'orders';

-- For filtered counts, maintain a separate counter table, or index status
-- so COUNT(*) scans the much smaller index instead of the table.

Step 5: Table Structure Optimization

Issue 1: Wide tables (many columns)

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(50),
  email VARCHAR(100),
  -- ... 98 more columns ...
  last_login_time DATETIME
);

Solution: vertical partitioning – split core fields and profile fields into separate tables.

CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(50),
  email VARCHAR(100),
  status TINYINT,
  create_time DATETIME
);

CREATE TABLE user_profile (
  user_id INT PRIMARY KEY,
  real_name VARCHAR(100),
  phone VARCHAR(20),
  address TEXT,
  FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Issue 2: Very large tables (billions of rows)

Solution 1: Partitioned tables (e.g., yearly partitions).

CREATE TABLE orders (
  order_id BIGINT,
  user_id INT,
  create_time DATETIME,
  -- other columns ...
  PRIMARY KEY (order_id, create_time)  -- the partitioning column must be part of every unique key
) PARTITION BY RANGE (YEAR(create_time)) (
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION p2024 VALUES LESS THAN (2025)
);

Solution 2: Archive historical data to a separate table and query with UNION when needed.

CREATE TABLE orders_history LIKE orders;
INSERT INTO orders_history SELECT * FROM orders WHERE create_time < DATE_SUB(NOW(), INTERVAL 1 YEAR);
DELETE FROM orders WHERE create_time < DATE_SUB(NOW(), INTERVAL 1 YEAR);

Issue 3: Improper column types

-- Bad definitions
user_id VARCHAR(50),   -- actually numeric
status VARCHAR(20),   -- few enum values
amount DOUBLE,         -- monetary value

-- Better definitions
user_id BIGINT,
status TINYINT,
amount DECIMAL(10,2);

Recommendations summary:

Primary‑key IDs → BIGINT.

Status/enum → TINYINT.

Amount → DECIMAL(10,2).

Date/time → DATETIME/TIMESTAMP.

Phone numbers → VARCHAR(20).

IP addresses → INT UNSIGNED.
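Two of these recommendations are easy to get wrong on the application side; a short sketch of the IP‑to‑integer mapping (the same transformation as MySQL's INET_ATON/INET_NTOA) and exact money arithmetic (helpers are illustrative):

```python
import socket
import struct
from decimal import Decimal

def ip_to_int(ip: str) -> int:
    """Pack an IPv4 dotted quad into an unsigned 32-bit int,
    matching MySQL's INET_ATON() - store it in INT UNSIGNED."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def int_to_ip(n: int) -> str:
    """Inverse mapping, matching MySQL's INET_NTOA()."""
    return socket.inet_ntoa(struct.pack("!I", n))

ip_to_int("192.168.1.1")   # 3232235777
int_to_ip(3232235777)      # '192.168.1.1'

# Money: DECIMAL(10,2) on the DB side pairs with Decimal on the app side;
# binary floats accumulate rounding error.
total = Decimal("0.1") + Decimal("0.2")   # exactly Decimal('0.3')
float_total = 0.1 + 0.2                   # 0.30000000000000004 as a float
```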

Step 6: Other Optimization Techniques

Use Cache to Reduce DB Load

async function getUserInfo(userId) {
  // Check Redis first
  const cached = await redis.get('user:' + userId);
  if (cached !== null) return JSON.parse(cached);

  // Miss – query MySQL (select only the columns you need, per the advice above)
  const info = await mysql.query(
    'SELECT user_id, username, email FROM users WHERE user_id = ?', [userId]);

  // Store in cache for 1 hour; consider caching a placeholder for
  // non-existent users to prevent cache penetration
  await redis.setex('user:' + userId, 3600, JSON.stringify(info));
  return info;
}

Read/Write Splitting

-- Writes go to master
INSERT INTO orders (...) VALUES (...);
UPDATE orders SET status='paid' WHERE order_id=12345;

-- Reads go to replica (allowing slight lag)
SELECT * FROM orders WHERE user_id=12345;

-- Critical reads still go to master
SELECT * FROM orders WHERE order_id=12345 FOR UPDATE;

Use Connection Pool

// Without pool – high overhead
Connection conn = DriverManager.getConnection(url, user, password);
... // use
conn.close();

// With HikariCP pool
HikariConfig config = new HikariConfig();
config.setJdbcUrl(url);
config.setUsername(user);
config.setPassword(password);
config.setMaximumPoolSize(20);
config.setMinimumIdle(5);
config.setConnectionTimeout(30000);
HikariDataSource ds = new HikariDataSource(config);
Connection conn = ds.getConnection();

Practical Case: E‑commerce Order Query from 3 s to 50 ms

Problem Background

Average response: 1.2 s.

P99 response: 3.5 s.

Peak QPS: 500.

Slow queries: >80 per minute.

Original SQL

SELECT o.order_id, o.order_no, o.total_amount, o.status, o.create_time,
       GROUP_CONCAT(p.product_name) AS products,
       GROUP_CONCAT(oi.quantity) AS quantities
FROM orders o
LEFT JOIN order_items oi ON o.order_id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.product_id
WHERE o.user_id = ?
  AND o.status IN ('pending','paid','shipped')
  AND o.create_time >= DATE_SUB(NOW(), INTERVAL 6 MONTH)
GROUP BY o.order_id
ORDER BY o.create_time DESC
LIMIT 20;

First Analysis: EXPLAIN

id: 1  type: ref  key: idx_user_id  rows: 12000  Extra: Using where; Using temporary; Using filesort

Findings:

Scans 12 k orders, returns only 20.

Uses temporary table and filesort (GROUP_CONCAT & ORDER BY).

Three‑table join adds complexity.

Second Optimization: Index Optimization

CREATE INDEX idx_user_status_time ON orders(user_id, status, create_time);

EXPLAIN now shows rows reduced to 580, but Using temporary and Using filesort remain.

Effect: average latency ↓ from 1.2 s to 600 ms.

Third Optimization: Query Rewrite

Problem: GROUP_CONCAT forces temporary table.

Solution: split into two queries – first fetch order list, then batch fetch items.

-- Query 1: order list (fast)
SELECT order_id, order_no, total_amount, status, create_time
FROM orders
WHERE user_id = ?
  AND status IN ('pending','paid','shipped')
  AND create_time >= DATE_SUB(NOW(), INTERVAL 6 MONTH)
ORDER BY create_time DESC
LIMIT 20;

-- Query 2: items for those orders
SELECT oi.order_id, p.product_name, oi.quantity
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
WHERE oi.order_id IN (list of 20 ids)
ORDER BY oi.order_id, oi.item_id;

EXPLAIN shows type: range, rows ≈ 20 for orders and ≈ 60 for items.

Effect: average latency ↓ to 120 ms, P99 ↓ to 350 ms.
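Stitching the two result sets back together happens in the application; a self‑contained sketch using SQLite as a stand‑in (column names follow the case study, but the items table is flattened here for brevity, with no separate products table, and the sample rows are made up):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
CREATE TABLE order_items (order_id INTEGER, product_name TEXT, quantity INTEGER);
INSERT INTO orders VALUES (1, 42, 'pending'), (2, 42, 'paid'), (3, 7, 'paid');
INSERT INTO order_items VALUES (1, 'mug', 2), (1, 'pen', 1), (2, 'book', 3);
""")

# Query 1: the order list (small, index-friendly, no GROUP_CONCAT)
orders = conn.execute(
    "SELECT order_id, status FROM orders WHERE user_id = ? "
    "ORDER BY order_id LIMIT 20", (42,)).fetchall()

# Query 2: batch-fetch items for exactly those orders with one IN query
ids = [o[0] for o in orders]
placeholders = ",".join("?" * len(ids))
items = defaultdict(list)
for order_id, name, qty in conn.execute(
        f"SELECT order_id, product_name, quantity FROM order_items "
        f"WHERE order_id IN ({placeholders})", ids):
    items[order_id].append((name, qty))
```

Two cheap indexed lookups replace one join that forced a temporary table, and the grouping cost moves into the application, where it is trivial.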

Fourth Optimization: Cache Hot Data

public List<Order> getUserOrders(Long userId, int page, int size) {
  String cacheKey = "user:orders:" + userId + ":" + page;
  List<Order> orders = redis.get(cacheKey);
  if (orders != null) return orders;
  orders = orderDao.queryUserOrders(userId, page, size);
  redis.setex(cacheKey, 300, orders); // 5 min TTL
  return orders;
}

public void updateOrderStatus(Long orderId, String newStatus) {
  orderDao.updateStatus(orderId, newStatus);
  Long userId = orderDao.getUserIdByOrderId(orderId);
  // Invalidate every cached page for this user. Redis has no native
  // delete-by-pattern command: implement deletePattern with SCAN + DEL
  // (avoid KEYS in production), or bump a versioned key prefix instead.
  redis.deletePattern("user:orders:" + userId + ":*");
}

Effect: cache hit rate 85 %, hit latency 8 ms, overall average 30 ms, P99 150 ms, DB QPS reduced from 500 to 75.

Final Results

Average response time: 1200 ms → 30 ms (97.5 % improvement).

P99 response time: 3500 ms → 150 ms (95.7 % improvement).

Slow queries: 80/min → 2/min (97.5 % reduction).

Database QPS: 500 → 75 (85 % reduction).

Key Takeaways

Master the tools (slow‑query log, EXPLAIN, pt‑query‑digest) before changing anything.

Indexes are essential but not a cure‑all; sometimes query rewrite or caching yields bigger gains.

Data‑driven decisions (EXPLAIN, monitoring) beat intuition.

Understanding business requirements often leads to the best optimizations.

Performance tuning is continuous – monitor, iterate, and document.

Future Trends

Self‑adaptive indexes that create/drop automatically.

AI‑powered optimizers using machine learning.

Columnar storage engines for analytical workloads.

Distributed SQL databases (TiDB, CockroachDB) offering horizontal scalability.

Regardless of emerging technologies, a solid grasp of SQL execution principles and a systematic optimization workflow remain the foundation for high‑performance database systems.

Tags: database, Performance Tuning, MySQL, Index Optimization, Slow Query