How I Cut a 20‑Second MySQL Query to 200 ms: A Step‑by‑Step Optimization Journey
When a critical e‑commerce report began taking over 20 seconds and triggered a flood of user complaints, a systematic investigation with EXPLAIN, the slow‑query log, and profiling revealed missing indexes, costly joins, and temporary tables. A four‑stage rewrite brought execution time down to roughly 200 ms.
Background and Problem
At 3 am an alert showed that a core business API response exceeded 20 seconds. The culprit was a seemingly simple MySQL query that aggregated order statistics for VIP merchants in Beijing over the past 30 days. The original query took 20.34 seconds, roughly 100× slower than the sub‑second response the API needed.
Schema Overview
-- Orders (5 M rows)
CREATE TABLE orders (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
order_no VARCHAR(32),
merchant_id INT,
user_id INT,
amount DECIMAL(10,2),
status TINYINT,
created_at DATETIME,
updated_at DATETIME
) ENGINE=InnoDB;
-- Order items (20 M rows)
CREATE TABLE order_items (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
order_id BIGINT,
product_id INT,
quantity INT,
price DECIMAL(10,2),
created_at DATETIME
) ENGINE=InnoDB;
-- Merchants (100 k rows)
CREATE TABLE merchants (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
category VARCHAR(50),
city VARCHAR(50),
level TINYINT
) ENGINE=InnoDB;
Original Query
SELECT
m.id,
m.name,
COUNT(DISTINCT o.id) AS order_count,
COUNT(DISTINCT o.user_id) AS user_count,
SUM(o.amount) AS total_amount,
AVG(o.amount) AS avg_amount,
SUM(oi.quantity) AS total_items
FROM merchants m
LEFT JOIN orders o ON m.id = o.merchant_id
LEFT JOIN order_items oi ON o.id = oi.order_id
WHERE m.city = '北京'
AND m.level = 3
AND o.status = 1
AND o.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY m.id, m.name
ORDER BY total_amount DESC
LIMIT 100;
Problem Diagnosis
1. EXPLAIN Analysis
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | m | ALL | NULL | NULL | NULL | NULL | 100000 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 5000000 | Using where; Using join buffer |
| 1 | SIMPLE | oi | ALL | NULL | NULL | NULL | NULL |20000000 | Using where; Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------------------------------------+
All three tables performed full scans (type=ALL).
No indexes were used (key=NULL).
Temporary tables and filesort were created.
Without usable indexes, the nested‑loop joins risked examining a Cartesian‑product‑scale number of row combinations (up to 100 k × 5 M × 20 M).
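The checks above can be automated. The sketch below is a hypothetical helper that scans EXPLAIN rows (here hard‑coded to mirror the plan shown) for the usual danger signs; in practice you would fetch the rows with a dict cursor from `EXPLAIN <query>`.

```python
def explain_red_flags(rows):
    """Return a list of warnings for common EXPLAIN danger signs."""
    warnings = []
    for r in rows:
        if r["type"] == "ALL":
            warnings.append(f"{r['table']}: full table scan over ~{r['rows']} rows")
        if r["key"] is None:
            warnings.append(f"{r['table']}: no index used")
        if "Using temporary" in r.get("Extra", ""):
            warnings.append(f"{r['table']}: temporary table")
        if "Using filesort" in r.get("Extra", ""):
            warnings.append(f"{r['table']}: filesort")
    return warnings

# The rows below mirror the EXPLAIN output shown above.
plan = [
    {"table": "m", "type": "ALL", "key": None, "rows": 100000,
     "Extra": "Using where; Using temporary; Using filesort"},
    {"table": "o", "type": "ALL", "key": None, "rows": 5000000,
     "Extra": "Using where; Using join buffer"},
    {"table": "oi", "type": "ALL", "key": None, "rows": 20000000,
     "Extra": "Using where; Using join buffer"},
]
for w in explain_red_flags(plan):
    print(w)
```

A script like this can run in CI against your hottest queries and fail the build when a plan regresses to a full scan.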
2. Slow‑Query Log
# Time: 2024-03-15T03:25:41.123456Z
# Query_time: 20.342387 Lock_time: 0.000123 Rows_sent: 100 Rows_examined: 25438921
Only 100 rows were returned after scanning roughly 25.4 M rows, a scan‑to‑result ratio of about 254,000:1.
3. Profiling
SET profiling = 1;
-- run the query
SHOW PROFILE;
Sending data: 18.5 s (90%).
Creating sort index: 1.2 s (6%).
Copying to temporary table: 0.6 s (3%).
Optimization Strategy (Four‑Step Plan)
Step 1 – Add Essential Indexes
-- Merchant table index
ALTER TABLE merchants ADD INDEX idx_city_level (city, level);
-- Orders composite index (order matters)
ALTER TABLE orders ADD INDEX idx_merchant_status_created (merchant_id, status, created_at);
-- Order items index
ALTER TABLE order_items ADD INDEX idx_order_id (order_id);
Index design principles:
Follow the left‑most prefix rule.
Place highly selective columns first.
Consider both WHERE and JOIN conditions.
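The left‑most prefix rule can be illustrated with a small, hypothetical checker: given a composite index and the set of columns under equality filters, it returns the leading index columns the optimizer can actually use (range predicates add further nuance not modeled here).

```python
def usable_prefix(index_columns, filtered_columns):
    """Return the leading index columns usable under equality filters."""
    used = 0
    for col in index_columns:
        if col in filtered_columns:
            used += 1
        else:
            break  # the prefix chain stops at the first unfiltered column
    return index_columns[:used]

idx = ["merchant_id", "status", "created_at"]
print(usable_prefix(idx, {"merchant_id", "status"}))  # ['merchant_id', 'status']
print(usable_prefix(idx, {"status", "created_at"}))   # [] - merchant_id missing
```

This is why `idx_merchant_status_created` helps the query above, while an index starting with `created_at` would not serve the `merchant_id` join.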
Step 2 – Rewrite SQL (Filter First, Join Later)
SELECT
m.id,
m.name,
t.order_count,
t.user_count,
t.total_amount,
t.avg_amount,
t.total_items
FROM merchants m
INNER JOIN (
SELECT
o.merchant_id,
COUNT(DISTINCT o.id) AS order_count,
COUNT(DISTINCT o.user_id) AS user_count,
SUM(o.amount) AS total_amount,
AVG(o.amount) AS avg_amount,
SUM(items.item_count) AS total_items
FROM orders o
LEFT JOIN (
SELECT order_id, SUM(quantity) AS item_count
FROM order_items
GROUP BY order_id
) items ON o.id = items.order_id
WHERE o.status = 1
AND o.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY o.merchant_id
) t ON m.id = t.merchant_id
WHERE m.city = '北京'
AND m.level = 3
ORDER BY t.total_amount DESC
LIMIT 100;
This reduces the intermediate result set dramatically: the date and status filters run before any join. It also makes the join semantics honest, since the original LEFT JOINs were silently turned into inner joins by the WHERE conditions on the orders table.
Step 3 – Use Covering Index
-- Covering index that includes all selected columns
ALTER TABLE orders ADD INDEX idx_covering (merchant_id, status, created_at, id, user_id, amount);
The query can now be satisfied entirely from the index, with no lookup back to the clustered table. (InnoDB secondary indexes implicitly append the primary key, so listing id explicitly is optional.)
Step 4 – Materialized View (Pre‑Aggregated Summary Table)
-- Summary table
CREATE TABLE merchant_order_summary (
merchant_id INT,
summary_date DATE,
order_count INT,
user_count INT,
total_amount DECIMAL(10,2),
avg_amount DECIMAL(10,2),
total_items INT,
PRIMARY KEY (merchant_id, summary_date),
INDEX idx_date (summary_date)
) ENGINE=InnoDB;
-- Hourly refresh (example)
INSERT INTO merchant_order_summary
SELECT
o.merchant_id,
DATE(o.created_at) AS summary_date,
COUNT(DISTINCT o.id) AS order_count,
COUNT(DISTINCT o.user_id) AS user_count,
SUM(o.amount) AS total_amount,
AVG(o.amount) AS avg_amount,
SUM(COALESCE(items.item_count, 0)) AS total_items
FROM orders o
LEFT JOIN (
SELECT order_id, SUM(quantity) AS item_count
FROM order_items
GROUP BY order_id
) items ON o.id = items.order_id
WHERE o.status = 1 AND o.created_at >= CURDATE()
GROUP BY o.merchant_id, DATE(o.created_at)
ON DUPLICATE KEY UPDATE
order_count = VALUES(order_count),
user_count = VALUES(user_count),
total_amount = VALUES(total_amount),
avg_amount = VALUES(avg_amount),
total_items = VALUES(total_items);
-- Query the summary
SELECT
m.id,
m.name,
SUM(s.order_count) AS order_count,
SUM(s.user_count) AS user_count,
SUM(s.total_amount) AS total_amount,
SUM(s.total_amount) / NULLIF(SUM(s.order_count), 0) AS avg_amount, -- order-weighted, not an average of daily averages
SUM(s.total_items) AS total_items
FROM merchants m
INNER JOIN merchant_order_summary s ON m.id = s.merchant_id
WHERE m.city = '北京' AND m.level = 3 AND s.summary_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
GROUP BY m.id, m.name
ORDER BY total_amount DESC
LIMIT 100;
Performance Gains
Original query: 20.34 s (no index).
After adding indexes: 8.5 s (≈2.4× faster).
After SQL rewrite: 2.3 s (≈8.8× faster).
After covering index: 0.8 s (≈25× faster).
After materialized view: 0.2 s (≈100× faster).
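The speedup factors quoted above follow directly from the 20.34 s baseline:

```python
# Speedup of each stage relative to the 20.34 s baseline measured earlier.
baseline = 20.34
stages = {
    "indexes": 8.5,
    "sql rewrite": 2.3,
    "covering index": 0.8,
    "materialized view": 0.2,
}
for name, secs in stages.items():
    print(f"{name}: {baseline / secs:.1f}x faster")
```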
General Optimization Methodology
Diagnostic Three‑Step
EXPLAIN analysis – check type, key, and Extra fields.
Slow‑query log – enable and examine query_time, rows_examined, etc.
Profiling – use SET profiling=1 and SHOW PROFILE to locate bottlenecks.
Six‑Step Optimization Framework
Index optimization – create selective, left‑most, covering indexes.
SQL rewrite – filter before join, avoid SELECT *, use sub‑queries wisely.
Table structure – consider denormalization, proper data types, partitioning.
Cache strategy – use Redis or application‑level cache for hot data.
Read/write splitting – master‑slave replication and load balancing.
Data archiving – move historical data to cold storage.
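The read/write splitting step can be sketched at the application layer. The router below is purely illustrative (class and DSN names are made up): writes go to the primary, reads are load‑balanced across replicas.

```python
import random

class QueryRouter:
    """Route read statements to a random replica, everything else to the primary."""

    READ_PREFIXES = ("SELECT", "SHOW", "EXPLAIN")

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def route(self, sql):
        if sql.lstrip().upper().startswith(self.READ_PREFIXES):
            return random.choice(self.replicas)
        return self.primary

router = QueryRouter("primary:3306", ["replica1:3306", "replica2:3306"])
print(router.route("SELECT * FROM orders"))        # one of the replicas
print(router.route("UPDATE orders SET status=2"))  # always the primary
```

Real proxies (ProxySQL, MySQL Router) also handle replication lag and sticky sessions after writes, which a naive router like this ignores.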
Common Pitfalls
Over‑indexing – creates write overhead; prefer composite indexes.
Ignoring write performance – balance read‑optimizing indexes with write cost.
Cache stampede – randomize TTL to avoid massive simultaneous expirations.
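The TTL‑randomization fix for cache stampedes can be sketched in a few lines; the function name and jitter fraction here are illustrative, not from any particular library. Instead of giving every entry the same lifetime, add a random offset so entries written together do not all expire together.

```python
import random

def jittered_ttl(base_seconds, jitter_fraction=0.1):
    """Return the base TTL plus a random offset of up to +/-jitter_fraction."""
    jitter = base_seconds * jitter_fraction
    return base_seconds + random.uniform(-jitter, jitter)

# With a Redis client this would be used as, e.g.:
#   cache.setex(key, int(jittered_ttl(3600)), value)
print(jittered_ttl(3600))  # somewhere in [3240, 3960]
```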
Monitoring and Prevention
Slow‑Query Monitoring View
CREATE VIEW slow_query_monitor AS
SELECT
DATE(start_time) AS query_date,
LEFT(sql_text,100) AS query_sample,
COUNT(*) AS exec_count,
AVG(query_time) AS avg_time,
MAX(query_time) AS max_time,
SUM(rows_examined) AS total_rows_examined
FROM mysql.slow_log
GROUP BY DATE(start_time), LEFT(sql_text,100)
ORDER BY avg_time DESC;
Automated Alert Script (Python)
#!/usr/bin/env python3
import MySQLdb, smtplib
from email.mime.text import MIMEText

def check_slow_queries():
    db = MySQLdb.connect(host="localhost", user="monitor", passwd="password", db="mysql")
    cur = db.cursor()
    cur.execute("""
        SELECT COUNT(*) AS slow_count, AVG(query_time) AS avg_time
        FROM mysql.slow_log
        WHERE start_time >= DATE_SUB(NOW(), INTERVAL 1 HOUR)
    """)
    slow_count, avg_time = cur.fetchone()
    # AVG() returns NULL (None) when there were no slow queries in the window
    if slow_count > 100 or (avg_time or 0) > 5:
        send_alert(f"Slow query alert: count={slow_count}, avg_time={avg_time}s")
    cur.close()
    db.close()

def send_alert(message):
    msg = MIMEText(message)
    msg['Subject'] = 'MySQL Slow Query Alert'
    msg['From'] = '[email protected]'
    msg['To'] = '[email protected]'
    s = smtplib.SMTP('localhost')
    s.send_message(msg)
    s.quit()

if __name__ == "__main__":
    check_slow_queries()
Quick Diagnostic Bash Script
#!/bin/bash
echo "=== MySQL Performance Quick Check ==="
echo -e "\n[1] Slow Query Settings:"
mysql -e "SHOW VARIABLES LIKE '%slow%';"
echo -e "\n[2] Recent Slow Queries:"
mysql -e "SELECT * FROM mysql.slow_log ORDER BY start_time DESC LIMIT 5\G"
echo -e "\n[3] Index Usage Stats:"
mysql -e "SELECT table_schema, table_name, index_name, cardinality FROM information_schema.STATISTICS WHERE table_schema NOT IN ('mysql','information_schema','performance_schema') ORDER BY cardinality DESC LIMIT 10;"
echo -e "\n[4] Table Sizes:"
mysql -e "SELECT table_schema, table_name, ROUND(data_length/1024/1024,2) AS Data_MB, ROUND(index_length/1024/1024,2) AS Index_MB, ROUND((data_length+index_length)/1024/1024,2) AS Total_MB FROM information_schema.TABLES WHERE table_schema NOT IN ('mysql','information_schema','performance_schema') ORDER BY data_length+index_length DESC LIMIT 10;"
echo -e "\n[5] Current Connections:"
mysql -e "SHOW STATUS LIKE '%connect%';"
echo "=== Check Complete ==="
Toolbox
MySQL built‑in tools: EXPLAIN, SHOW PROFILE, Performance Schema.
Third‑party: Percona Toolkit pt‑query‑digest, MySQLTuner, MySQL Workbench, Prometheus + Grafana.
Online analysis: mysqladmin processlist, INFORMATION_SCHEMA.INNODB_TRX (note that INFORMATION_SCHEMA.INNODB_LOCKS was removed in MySQL 8.0 in favor of performance_schema.data_locks).
Best‑Practice Takeaways
Measure before you change – use logs and profiling to establish baselines.
Apply the 80/20 rule – focus on the few queries that consume most resources.
Iterative optimization – change one thing at a time, record impact, keep rollback plans.
Team Collaboration Advice
SQL Review Process – checklist for indexes, full scans, join count, SELECT *, sub‑query vs JOIN, growth considerations.
Development Standards – naming conventions, index naming, SQL style guide.
Knowledge Sharing – regular tech talks, maintain an internal case‑library, document patterns in a wiki.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.