Why MySQL Picks the Wrong Index and How to Fix It
Learn how MySQL’s optimizer decides between indexes, why the same query can suddenly use a different index with dramatically slower performance, and practical steps—including analyzing execution plans, updating statistics, checking index coverage, and rebuilding fragmented indexes—to diagnose and resolve index selection issues.
Preface
Imagine you are a waiter in a restaurant with two menus: Menu A sorts dishes by category (appetizer, main, dessert) and Menu B sorts by price. When a customer asks for the cheapest Sichuan dish, you can either filter by price first then by cuisine, or filter by cuisine first then sort by price. This analogy illustrates the daily decision‑making of the MySQL optimizer.
1 A case that drives programmers crazy
Requirement: retrieve the 100 highest-amount paid orders created since the beginning of the year.
Indexes created:
idx_status on status
idx_create_time on create_time
SQL:
SELECT * FROM orders WHERE status = 'paid' -- status condition
AND create_time > '2025-01-01' -- time condition
ORDER BY amount DESC
LIMIT 100;
Execution plan on Monday:
Using index: idx_status (status index)
Rows examined: 500
Time: 0.1 s
Execution plan on Tuesday:
Using index: idx_create_time (time index)
Rows examined: 500,000
Time: 8 s
The same query examines 500 rows on Monday but 500,000 rows on Tuesday: 1,000 times as many rows and an 80× difference in run time.
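To follow along, a table shaped roughly like the one below would reproduce the setup. The column types and the id primary key are assumptions; the article only names the status, create_time, and amount columns and the two indexes.

```sql
-- Hypothetical schema: only status, create_time, amount and the two
-- index names come from the article; everything else is assumed.
CREATE TABLE orders (
  id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  status      VARCHAR(16)     NOT NULL,
  create_time DATETIME        NOT NULL,
  amount      DECIMAL(12, 2)  NOT NULL,
  PRIMARY KEY (id),
  KEY idx_status (status),
  KEY idx_create_time (create_time)
) ENGINE = InnoDB;

-- See which of the two indexes the optimizer picks today.
EXPLAIN
SELECT * FROM orders
WHERE status = 'paid'
  AND create_time > '2025-01-01'
ORDER BY amount DESC
LIMIT 100;
```

In the EXPLAIN output, the key column names the chosen index, and rows is the optimizer's estimate rather than the number of rows actually read.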
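2 The optimizer's three-step decision
The optimizer estimates a cost for each candidate index from three inputs: the number of rows it expects to read, the number of back-tracks (lookups from the secondary index to the full row in the clustered index), and the cost of sorting. A diagram in the original article shows the decision flow.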
Cost example (simplified):
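Index idx_status: estimated rows 500,000, back-track 500,000, then a sort of the matching rows → total cost 1050.
Index idx_create_time: estimated rows 50,000, back-track 50,000, then a sort of far fewer rows → total cost 600.
The cost figures are illustrative units, not values MySQL reports. Because the query sorts by amount and neither index stores rows in amount order, both plans need a sort; the gap comes from how many rows each plan has to look up and sort.
The optimizer chooses the index with the lower total cost, here idx_create_time.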
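The optimizer's own estimates can be inspected rather than guessed. A minimal sketch, assuming MySQL 5.7 or later and the orders table from the case: forcing each index in turn and reading cost_info.query_cost in the JSON plan shows what the optimizer thinks each choice would cost.

```sql
-- Plan and estimated cost when idx_status is forced.
EXPLAIN FORMAT=JSON
SELECT * FROM orders FORCE INDEX (idx_status)
WHERE status = 'paid'
  AND create_time > '2025-01-01'
ORDER BY amount DESC
LIMIT 100;

-- Same query with idx_create_time forced, for comparison.
EXPLAIN FORMAT=JSON
SELECT * FROM orders FORCE INDEX (idx_create_time)
WHERE status = 'paid'
  AND create_time > '2025-01-01'
ORDER BY amount DESC
LIMIT 100;
```

The query_cost values are in the optimizer's internal units, so they are only meaningful relative to each other. The rows_examined_per_scan field in the same output gives the row estimate behind each cost.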
3 The four real culprits behind an index switch
1) Data distribution change
Scenario: on Monday there are 50,000 paid orders for 2025; on Tuesday there are 500,000. The sudden jump in matching rows inflates the estimated row count, and with it the total cost, for idx_status.
Check data distribution:
SELECT COUNT(*) AS total,
       SUM(status='paid') AS paid_count,
       SUM(create_time>'2025-01-01') AS new_orders
FROM orders;
2) Stale statistics
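Out-of-date statistics are like using an old map: the optimizer misjudges how many rows each index will return and may pick a worse index or even a full table scan.
MySQL has no statement for deleting index statistics (ANALYZE TABLE ... DELETE STATISTICS is Oracle syntax and fails in MySQL). Re-running ANALYZE TABLE recomputes and overwrites them.
Refresh statistics:
ANALYZE TABLE orders;
3) Index coverage difference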
A covering index contains every column the query needs, so MySQL can answer the query from the index alone and skip the back-track (the lookup of the full row). The analogy: Menu A shows the price directly (covering), Menu B requires asking the chef (back-track).
Query using idx_status (needs back-track):
SELECT * FROM orders WHERE status='paid';
Query using idx_create_time (covering):
SELECT create_time FROM orders WHERE create_time>'2025-01-01';
4) Index fragmentation
A fragmented index is like a torn table of contents; locating data becomes slower.
Check fragmentation:
SHOW TABLE STATUS LIKE 'orders';
A large Data_free value relative to the table's size suggests fragmentation.
Rebuild the table and its indexes:
ALTER TABLE orders ENGINE=INNODB;
4 A four-step method for diagnosing the problem
Step 1: View the current execution plan
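EXPLAIN SELECT * FROM orders WHERE status='paid' AND create_time>'2025-01-01'
ORDER BY amount DESC LIMIT 100;
Step 2: Check statistics
SHOW INDEX FROM orders;
Pay attention to the Cardinality column. It is an estimate of the number of distinct values in the index, so a higher value means a more selective index, not more accurate statistics. A Cardinality far from the real distinct count is a sign that the statistics are stale.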
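For more detail than SHOW INDEX gives, the persistent statistics InnoDB keeps can be read directly. This is a sketch under the assumptions that persistent statistics are enabled (the default in current MySQL versions) and that the account may read the mysql schema; the sample-page value is arbitrary.

```sql
-- Persistent InnoDB statistics for the table's indexes.
-- last_update shows when each figure was last recalculated.
SELECT index_name, stat_name, stat_value, sample_size, last_update
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'orders';

-- Optionally sample more index pages for this table, then recompute.
-- 64 is an illustrative value, not a recommendation.
ALTER TABLE orders STATS_SAMPLE_PAGES = 64;
ANALYZE TABLE orders;
```

Rows whose stat_name starts with n_diff_pfx hold the distinct-value estimates that feed Cardinality. An old last_update on a fast-changing table is a hint that the statistics have drifted.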
Step 3: Analyze data distribution
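SELECT COUNT(*) AS total,
       COUNT(DISTINCT status) AS status_values,  -- compare with Cardinality of idx_status
       SUM(status='paid') AS paid_count,         -- rows the status filter keeps
       AVG(LENGTH(status)) AS status_avg_len
FROM orders;
Step 4: Trace the optimizer’s reasoning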
SET optimizer_trace="enabled=on";
SELECT * FROM orders WHERE ...;
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
Review the trace to see which cost components dominate.
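A fuller tracing session might look like the sketch below, which assumes the case query from section 1. The memory limit shown is an arbitrary choice meant to keep long traces from being cut off.

```sql
-- Turn tracing on for this session only.
SET optimizer_trace = 'enabled=on';
-- Allow up to 1 MiB of trace text (assumed to be enough here).
SET optimizer_trace_max_mem_size = 1048576;

-- The statement to analyze.
SELECT * FROM orders
WHERE status = 'paid'
  AND create_time > '2025-01-01'
ORDER BY amount DESC
LIMIT 100;

-- Read the trace of the statement above, in the same session.
-- A non-zero MISSING_BYTES_BEYOND_MAX_MEM_SIZE means it was truncated.
SELECT TRACE, MISSING_BYTES_BEYOND_MAX_MEM_SIZE
FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;

-- Tracing adds overhead, so switch it off afterwards.
SET optimizer_trace = 'enabled=off';
```

In the TRACE JSON, the rows_estimation and considered_execution_plans sections list the row estimate and cost for each index the optimizer weighed.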
5 Three ultimate solutions
Solution 1: Force the optimizer to use a specific index
SELECT * FROM orders FORCE INDEX(idx_status) WHERE ...;
Solution 2: Create a better composite index
ALTER TABLE orders ADD INDEX idx_status_create_time(status, create_time);
Solution 3: Regular maintenance plan
Periodically update statistics.
Periodically check and reduce fragmentation.
Periodically rebuild indexes.
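The three maintenance tasks above could be sketched as follows. The free_pct column is a rough indicator introduced here for illustration, and any threshold for acting on it is a judgment call rather than a MySQL rule.

```sql
-- 1. Refresh index statistics.
ANALYZE TABLE orders;

-- 2. Estimate fragmentation: free space relative to data plus index size.
SELECT table_name,
       data_length,
       index_length,
       data_free,
       ROUND(data_free / (data_length + index_length) * 100, 1) AS free_pct
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'orders';

-- 3. Rebuild the table and its indexes when free_pct is high.
--    For InnoDB this recreates the table and re-analyzes it.
OPTIMIZE TABLE orders;
```

A rebuild copies the whole table, so on large tables it is usually scheduled for a low-traffic window. If the table lives in a shared tablespace rather than its own file, data_free reflects the shared tablespace and is less useful as a per-table signal.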
Summary
Six must‑check points
Ensure WHERE columns have appropriate indexes.
Check whether ORDER BY/GROUP BY can use index sorting.
Keep statistics up‑to‑date, especially for large tables.
Monitor and defragment indexes regularly.
Watch for index merge usage (type index_merge in EXPLAIN output).
Prefer covering indexes to reduce back‑track.
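As one way to act on the covering-index point: if the application needs only a few columns, naming them instead of using SELECT * lets a composite index cover the query. The index name idx_status_time_amount and the choice of returned columns are hypothetical, and id is assumed to be the primary key.

```sql
-- Hypothetical index: equality column first, range column second,
-- amount last so the index alone can supply it.
ALTER TABLE orders
  ADD INDEX idx_status_time_amount (status, create_time, amount);

-- InnoDB stores the primary key in every secondary index entry,
-- so id (assumed to be the primary key) is covered as well.
EXPLAIN
SELECT id, amount
FROM orders
WHERE status = 'paid'
  AND create_time > '2025-01-01'
ORDER BY amount DESC
LIMIT 100;
```

"Using index" in the Extra column indicates the query is answered from the index without back-tracks. A sort is still likely to appear, because the range condition on create_time prevents the index from delivering rows in amount order.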
Three golden rules
80/20 rule: 20% of indexes serve 80% of queries.
Data-driven: Regularly analyze query patterns and adjust indexes.
Defensive programming: Explicitly specify indexes for critical queries.
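Applied to the case query, the defensive-programming rule might look like the sketch below. The second form assumes MySQL 8.0.20 or later, where index-level optimizer hints are available.

```sql
-- Classic index hint: pin the case query to idx_status.
SELECT * FROM orders FORCE INDEX (idx_status)
WHERE status = 'paid'
  AND create_time > '2025-01-01'
ORDER BY amount DESC
LIMIT 100;

-- Optimizer-hint form (assumes MySQL 8.0.20+).
SELECT /*+ INDEX(orders idx_status) */ *
FROM orders
WHERE status = 'paid'
  AND create_time > '2025-01-01'
ORDER BY amount DESC
LIMIT 100;
```

Pinning an index trades adaptability for predictability: the plan no longer follows changes in data distribution, and a FORCE INDEX statement fails if the named index is later dropped or renamed. It is worth re-checking pinned queries with EXPLAIN whenever indexes change.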
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
