How to Turn Slow MySQL Queries into Millisecond Responses: Real‑World Optimization Case
This article walks through a real e‑commerce MySQL performance crisis: pinpointing bottlenecks, analyzing the slow‑query log, reading EXPLAIN output, adding a composite index, rewriting the SQL, and layering on partitioning, read/write splitting, and caching, ultimately cutting response times from 18.5 s to 0.08 s and CPU usage from over 90% to about 15%.
Case Background: E‑commerce performance crisis
During a Double‑11 sale, the order‑query API took 15‑30 seconds to respond, CPU usage stayed above 90%, the slow‑query log recorded more than 300 entries per minute, and user complaints surged 500%.
Step 1 – Identify bottlenecks
1.1 System monitoring
# Show process list
mysql> SHOW PROCESSLIST;
# Show slow‑query variables
mysql> SHOW VARIABLES LIKE 'slow_query%';
mysql> SHOW VARIABLES LIKE 'long_query_time';
# InnoDB status
mysql> SHOW ENGINE INNODB STATUS\G
Key findings:
Active connections 512/800 (near limit)
Average query time 12.5 s
Frequent lock waits
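If the slow‑query log is not already enabled, it can be switched on at runtime. A minimal sketch; the 1‑second threshold and log path are illustrative choices, not values from the incident:
-- Enable the slow-query log without a server restart
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/mysql-slow.log';
-- Log any statement running longer than 1 second (illustrative threshold)
SET GLOBAL long_query_time = 1;
-- Also log statements that use no index at all
SET GLOBAL log_queries_not_using_indexes = 'ON';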
1.2 Slow‑query log analysis
# Top 10 slowest queries
mysqldumpslow -s t -t 10 /var/log/mysql/mysql-slow.log
# Most frequent queries
mysqldumpslow -s c -t 10 /var/log/mysql/mysql-slow.log
The main offending query (redacted) performed a full‑table scan on orders: it had no suitable index, triggered a filesort, and examined nearly 3 million rows.
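On servers with performance_schema enabled, the same ranking can be pulled live without parsing the log file; a sketch using the standard statement‑digest table (the LIMIT is arbitrary):
-- Top 10 statement digests by total execution time (timers are in picoseconds)
SELECT DIGEST_TEXT,
       COUNT_STAR AS exec_count,
       ROUND(SUM_TIMER_WAIT/1e12, 2) AS total_secs,
       SUM_ROWS_EXAMINED AS rows_examined
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;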
Step 2 – EXPLAIN deep dive
2.1 Query plan
EXPLAIN SELECT ... FROM orders o LEFT JOIN users u ON o.user_id=u.id ... WHERE o.create_time>='2023-11-01' AND o.status IN (1,2,3,4,5) ORDER BY o.create_time DESC LIMIT 20;
Problems discovered:
orders scanned with type=ALL
No covering index for the WHERE clause
Filesort used
~2.8 million rows examined
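The row estimate and sort behaviour are easiest to confirm in the JSON plan format (MySQL 5.6.5+); a quick check, with the query shape abbreviated from the EXPLAIN above:
-- The JSON output exposes rows_examined_per_scan and "using_filesort": true
EXPLAIN FORMAT=JSON
SELECT o.* FROM orders o
WHERE o.create_time >= '2023-11-01' AND o.status IN (1,2,3,4,5)
ORDER BY o.create_time DESC
LIMIT 20;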
2.2 Index status
Existing indexes:
PRIMARY KEY (id)
KEY idx_user_id (user_id)
Missing: any index on create_time or status, and a composite index covering the query's filter and sort.
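The index inventory is easy to confirm directly on the server; either of these works:
-- List all indexes on the orders table
SHOW INDEX FROM orders;
-- Or query the data dictionary for index/column ordering
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'orders'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;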
Step 3 – SQL optimisation
3.1 Composite index
# Create composite index (order matters)
ALTER TABLE orders ADD INDEX idx_status_createtime_id (status, create_time, id);
-- Reasoning:
-- 1. status is the equality/IN filter in the WHERE clause (low cardinality, leading column)
-- 2. create_time is the range filter, so it comes second
-- 3. id makes the index covering for the id-only pagination subquery in 3.2 below,
--    which also helps the ORDER BY avoid a filesort
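After creating the index, re‑run EXPLAIN to confirm the optimizer actually picks it; the shape below matches the pagination subquery in 3.2:
-- Expect key=idx_status_createtime_id and no full scan (type should not be ALL)
EXPLAIN SELECT id FROM orders
WHERE create_time >= '2023-11-01' AND status IN (1,2,3,4,5)
ORDER BY create_time DESC, id DESC
LIMIT 20;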
3.2 Rewritten SQL (pagination optimisation)
-- Version 1: pagination with index-aware subquery
-- The derived table t locates just the 20 matching ids via the composite
-- index; joining back to orders then fetches the full rows
SELECT o.*, u.username, p.product_name, p.price
FROM (
    SELECT id FROM orders
    WHERE create_time>='2023-11-01' AND status IN (1,2,3,4,5)
    ORDER BY create_time DESC, id DESC
    LIMIT 20
) t
JOIN orders o ON o.id = t.id
LEFT JOIN users u ON o.user_id=u.id
LEFT JOIN order_items oi ON o.id=oi.order_id
LEFT JOIN products p ON oi.product_id=p.id
ORDER BY o.create_time DESC, o.id DESC;
3.3 Delayed join version
SELECT o.id, o.user_id, o.total_amount, o.status, o.create_time,
u.username, p.product_name, p.price
FROM (
SELECT id, user_id, total_amount, status, create_time
FROM orders
WHERE create_time>='2023-11-01' AND status IN (1,2,3,4,5)
ORDER BY create_time DESC, id DESC
LIMIT 20
) o
LEFT JOIN users u ON o.user_id=u.id
LEFT JOIN order_items oi ON o.id=oi.order_id
LEFT JOIN products p ON oi.product_id=p.id;
Performance tests showed the original query took 18.5 s, scanning 2.8 M rows at 85% CPU. Adding the composite index cut execution to 2.1 s (24 k rows, 45% CPU). The delayed‑join version reached 0.08 s with only 20 rows scanned and 15% CPU, a roughly 230‑fold speedup.
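On MySQL 8.0.18+ such timings can be reproduced with EXPLAIN ANALYZE, which executes the statement and reports actual per‑operator times; a sketch (older versions can fall back to the slow log or performance_schema):
-- Runs the query and prints actual row counts and timings for each plan step
EXPLAIN ANALYZE
SELECT id FROM orders
WHERE create_time >= '2023-11-01' AND status IN (1,2,3,4,5)
ORDER BY create_time DESC, id DESC
LIMIT 20;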
Step 4 – Advanced strategies
4.1 Partitioned table
CREATE TABLE orders_partitioned (
    id BIGINT NOT NULL,
    user_id INT NOT NULL,
    total_amount DECIMAL(10,2),
    status TINYINT,
    create_time DATETIME NOT NULL,
    -- the partitioning columns must appear in every unique key, including the PK
    PRIMARY KEY (id, create_time)
) PARTITION BY RANGE (YEAR(create_time)*100 + MONTH(create_time)) (
    PARTITION p202310 VALUES LESS THAN (202311),
    PARTITION p202311 VALUES LESS THAN (202312),
    PARTITION p202312 VALUES LESS THAN (202401),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
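With a p_future catch‑all, each new month must be split out before its data arrives, or rows accumulate in the catch‑all; a maintenance sketch following the naming scheme above:
-- Split January 2024 out of the catch-all partition
ALTER TABLE orders_partitioned
REORGANIZE PARTITION p_future INTO (
    PARTITION p202401 VALUES LESS THAN (202402),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Check which partitions the optimizer scans (the partitions column of EXPLAIN)
EXPLAIN SELECT COUNT(*) FROM orders_partitioned
WHERE create_time >= '2023-11-01' AND create_time < '2023-12-01';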
4.2 Read‑write splitting (Python example)
import random

class DatabaseRouter:
    def __init__(self):
        # get_master_connection / get_slave_connections are app-specific factories
        self.master = get_master_connection()
        self.slaves = get_slave_connections()

    def execute_query(self, sql, is_write=False):
        # Writes (or anything that looks like one) go to the master;
        # reads are spread randomly across the replicas
        if is_write or self.is_write_operation(sql):
            return self.master.execute(sql)
        slave = random.choice(self.slaves)
        return slave.execute(sql)

    def is_write_operation(self, sql):
        # Check the first keyword only; a substring test would misfire on
        # e.g. a SELECT containing the word 'UPDATE' in a string literal
        write_keywords = ('INSERT', 'UPDATE', 'DELETE', 'ALTER', 'CREATE', 'DROP', 'REPLACE')
        return sql.lstrip().upper().startswith(write_keywords)
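Routing reads to replicas is only safe while replication lag stays low; lag can be checked on each replica before it joins the read pool (SHOW REPLICA STATUS is MySQL 8.0.22+; older versions use SHOW SLAVE STATUS):
-- Seconds_Behind_Source reports the current replication lag in seconds
SHOW REPLICA STATUS\G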
4.3 Redis cache for order lists
import json
import redis

class OrderCacheManager:
    def __init__(self):
        self.redis_client = redis.Redis(host='localhost', port=6379, db=0)
        self.cache_ttl = 300  # 5 minutes

    def get_orders(self, user_id, page=1, size=20):
        cache_key = f"orders:{user_id}:{page}:{size}"
        cached = self.redis_client.get(cache_key)
        if cached:
            # Cache hit: skip the database entirely
            return json.loads(cached)
        # Cache miss: query MySQL (app-specific DAO, not shown), then
        # populate the cache with a TTL so stale entries expire on their own
        orders = self.query_from_database(user_id, page, size)
        self.redis_client.setex(cache_key, self.cache_ttl, json.dumps(orders, default=str))
        return orders
Step 5 – Monitoring & alerting
# Prometheus + Grafana metrics (mysqld_exporter)
mysql_global_status_slow_queries
mysql_global_status_threads_connected / mysql_global_variables_max_connections
rate(mysql_global_status_queries[5m])
mysql_info_schema_innodb_metrics_lock_timeouts
A Bash script can raise an alert automatically when the slow‑query count exceeds a threshold:
#!/bin/bash
# Alert when the cumulative slow-query counter exceeds a threshold
# (Slow_queries counts since server start, so tune the threshold accordingly)
slow_queries=$(mysql -e "SHOW GLOBAL STATUS LIKE 'Slow_queries';" | awk 'NR==2{print $2}')
if [ "$slow_queries" -gt 100 ]; then
    echo "High slow-query count detected, starting analysis..."
    mysqldumpslow -s t -t 5 /var/log/mysql/mysql-slow.log > /tmp/slow_analysis.log
    mail -s "Database slow-query alert" [email protected] < /tmp/slow_analysis.log
fi
Result summary
After applying the index, partitioning, read/write splitting and caching, average response time dropped from 18.5 s to 0.08 s (99.6 % reduction), CPU usage fell from >90 % to ~15 %, and slow‑query rate fell below 5 per minute. User satisfaction rose from 60 % to 95 %.