Why MySQL Joins Aren’t Always Slow: Inside the Four Join Algorithms
This article uses a realistic interview scenario to explore why multi-table joins in MySQL are not inherently slow, detailing four join algorithms—simple nested-loop, index nested-loop, block nested-loop, and batched key access—along with their pseudo‑code, execution steps, performance trade‑offs, and the impact of features like MRR and recent optimizer changes.
In a typical interview scenario, a candidate claims that multi‑table joins are used in every project despite concerns about performance. The article explains why joins are not inherently inefficient and presents four MySQL join algorithms that the optimizer may choose.
1. Simple Nested‑Loops Join
This is the most straightforward implementation: for each row from the driving table (A), the engine scans the entire inner table (B) to find matching rows.
for each row in table A matching range {
for each row in table B matching join column {
if row satisfies join conditions, send to client
}
}Example SQL:
SELECT * FROM product p INNER JOIN `order` o ON p.id = o.product_id;Execution steps:
Outer loop reads a row from product.
Inner loop scans all rows of order to find matches.
Matching rows are added to the result set.
Steps 1‑3 repeat until all rows of product are processed.
This approach can be extremely slow when the driving table yields many rows because it triggers a full scan of the inner table for each outer row.
2. Index Nested‑Loops Join
If the join column of the inner table (B) has an index, the optimizer can use it to locate matching rows directly, avoiding a full table scan.
for each row in table A matching range {
use index on B's join column to find matching rows
if row satisfies join conditions, send to client
}Execution steps are similar to the simple nested‑loops join, but step 2 uses the index lookup, which dramatically reduces I/O.
3. Block Nested‑Loops Join
This variant buffers rows from the driving table in a join buffer. When the buffer fills, the engine processes the buffered rows against the inner table in bulk, reducing the number of full scans.
for each row in table A matching range {
store join column in join buffer
if (join buffer is full) {
for each row in table B matching join column {
if row satisfies join conditions, send to client
}
}
}From MySQL 8.0.20 onward, block nested‑loops joins have been replaced by hash joins, so this algorithm is now deprecated.
4. Batched Key Access (BKA) Join
BKA combines the benefits of Multi‑Range Read (MRR) and the join buffer. It batches keys from the driving table, sorts them, and then performs a range read on the inner table, turning random I/O into sequential I/O.
for each row in table A matching range {
store join column in join buffer
if (join buffer is full) {
for each row in table B matching join column {
send to MRR interface and order by primary key
if row satisfies join conditions, send to client
}
}
}When the optimizer selects BKA, the EXTRA column of the execution plan shows Using where; Using join buffer (Batched Key Access).
Additional Optimizations
MySQL 5.6 introduced Multi‑Range Read (MRR), which sorts primary‑key IDs obtained from a secondary‑index range scan, converting random disk I/O into sequential I/O and reducing the number of page reads.
From MySQL 8.0.20, the optimizer prefers hash joins over block nested‑loops joins for suitable cases.
Conclusion
The article covered four MySQL join algorithms, their implementation details, and performance considerations. Future topics will compare these with hash joins and with application‑level data merging approaches.
Senior Tony
Former senior tech manager at Meituan, ex‑tech director at New Oriental, with experience at JD.com and Qunar; specializes in Java interview coaching and regularly shares hardcore technical content. Runs a video channel of the same name.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
