How to Diagnose and Optimize a Minute‑Long MySQL Join Query
This article walks through a real‑world MySQL slow‑query case, explains the inner workings of Join and Order by algorithms, shows how to interpret EXPLAIN and Optimizer_trace output, and demonstrates how adding an index can transform a costly Block Nested‑Loop Join into a fast Index Nested‑Loop Join, dramatically improving performance.
Online Slow SQL Background
A minute‑level slow query was detected by a monitoring platform, severely impacting user experience. The anonymized SQL and simplified table schemas are shown below.
select t1.*, t2.x from t_table1 t1 left join t_table2 t2 on t1.a = t2.a order by t1.c desc; CREATE TABLE `t_table1` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`a` varchar(64) NOT NULL,
`b` varchar(64) NOT NULL,
`c` varchar(20) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_a` (`a`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4;
CREATE TABLE `t_table2` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT '主键',
`a` varchar(64) NOT NULL,
`x` varchar(64) NOT NULL,
`y` varchar(20) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8mb4;Table statistics: MySQL 5.6.x, t_table1 has 3 000 rows, t_table2 has 70 000 rows.
Principle Exploration
Join Algorithm Principles
In a JOIN the engine first scans the driver table and then probes the inner table . Driver selection follows semantic rules (e.g., STRAIGHT_JOIN forces the left‑hand table, outer joins prefer the table on the join side) or, absent constraints, the engine chooses the table that minimizes total I/O.
Block Nested‑Loop Join (BNL) : The driver table is read in chunks of K rows, each chunk is cached in a join buffer, and the inner table is scanned fully for each chunk. Cost ≈ m + λ·m·n (λ ∈ (0,1)). The smaller the driver table (m), the lower the cost, which explains the “small‑table drives large‑table” rule.
Index Nested‑Loop Join (NLJ) : When the inner table has an index on the join key, the engine uses the index to locate matching rows, avoiding full scans. Each probe costs ≈ 2·log₂ n, giving total cost ≈ m + 2·m·log₂ n, again favoring a smaller driver table.
Order by Algorithm Principles
MySQL allocates a sort buffer per thread. If the buffer is insufficient, a temporary disk file is created, dramatically slowing sorting.
Full‑field sort : All selected columns are stored in the sort buffer and sorted directly.
Rowid sort : Only the sort key and primary key are stored; after sorting, the engine performs a primary‑key lookup (back‑table) to retrieve full rows. This reduces memory usage but adds extra I/O for the back‑table lookup.
Optimization Process
Execution Plan Analysis
EXPLAIN shows both tables scanned fully and extra hints:
Using temporary; Using filesort; Using join buffer (Block Nested Loop). The estimated plan suggests three main costs: scanning t_table1 (3 000 rows, cost 615), scanning t_table2 repeatedly via BNL (≈ 4.19 × 10⁷), and sorting the result (≈ 2.1 × 10⁸).
Optimizer_trace JSON confirms the estimated costs and shows that the join result is stored in an in‑memory temporary table, and the sort uses Rowid mode <sort_key, rowid> without creating disk files.
{
"considered_execution_plans": [
{
"table": "`t_table1` `t1`",
"best_access_path": {
"considered_access_paths": [
{"rows_to_scan":3000,"access_type":"scan","resulting_rows":3000,"cost":615,"chosen":true,"use_tmp_table":true}
]
},
"rest_of_plan": [
{
"table": "`t_table2` `t2`",
"best_access_path": {
"considered_access_paths": [
{"rows_to_scan":69882,"access_type":"scan","using_join_cache":true,"buffers_needed":5,"resulting_rows":69882,"cost":4.19e7,"chosen":true}
]
},
"rows_for_plan":2.1e8,
"cost_for_plan":4.19e7,
"sort_cost":2.1e8,
"new_cost_for_plan":2.52e8,
"chosen":true
}
]
}
]
}Actual execution parameters (also from Optimizer_trace) show 3 000 rows sorted in memory, using Rowid sort, and no temporary files.
{
"join_execution": {
"select#":1,
"steps": [
{"creating_tmp_table": {"tmp_table_info": {"table":"intermediate_tmp_table","row_length":655,"key_length":0,"unique_constraint":false,"location":"memory (heap)","row_limit_estimate":25614}}},
{"filesort_summary": {"rows":3000,"examined_rows":3000,"number_of_tmp_files":0,"sort_buffer_size":60200,"sort_mode":"<sort_key, rowid>"}}
]
}
}Interpretation: the optimizer over‑estimated the sorting cost, but the real bottleneck was the BNL algorithm repeatedly scanning the large inner table.
Final Optimization
Adding an appropriate index on the join column of t_table2 (e.g., CREATE INDEX idx_a ON t_table2(a);) forces the engine to switch from BNL to NLJ, eliminating the costly repeated full scans. After the index was created, monitoring showed the query response time dropped below 20 ms.
Summary
The case demonstrates a systematic approach to SQL performance tuning: understand the semantics of Join and Order by, use EXPLAIN and Optimizer_trace to verify hypotheses, and apply index‑based optimizations to guide the execution engine toward more efficient algorithms.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
