MySQL Join Optimization: Understanding BNL vs NLJ and Index Issues with Character Set Mismatches
This article analyzes a slow MySQL LEFT JOIN query, explains why the optimizer chose the inefficient Block Nested Loop algorithm instead of Index Nested Loop, shows how character‑set and collation differences cause index loss, and demonstrates how converting to INNER JOIN or fixing indexes restores high performance.
While studying Ding Qi's "MySQL Practical 45 Lectures" on join optimization, the author encountered a problematic LEFT JOIN query and decided to document the analysis as a technical note.
Problem SQL description
explain SELECT t1.stru_id AS struId, ... FROM cams_stru_info t1 LEFT JOIN cams_mainframerel t2 ON t1.stru_id = t2.stru_id WHERE t1.stru_state="1";
The query uses LEFT JOIN on two tables where stru_id is the primary key in both, yet the execution plan shows very poor performance compared with an equivalent INNER JOIN.
Identified issues
Large driving table (t1 ≈ 110 k rows) and small driven table (t2 ≈ 19 k rows) cause a full scan of the driver.
The filter stru_state = 1 has low cardinality, so an index would not help.
Although the driven table has an index on the join column, the optimizer does not use it.
The optimizer selects the Block Nested Loop (BNL) algorithm, resulting in roughly 20 billion row comparisons.
Join algorithms
In an Index Nested‑Loop Join (NLJ), the driver table rows are read one by one and each row probes the driven table via its index, leading to a number of lookups equal to the driver row count (e.g., 10 k lookups for 10 k rows).
When the driven table cannot use an index, the optimizer may fall back to BNL: it loads the entire driver into a join buffer and then scans the driven table for each buffer entry, which can cause driver_rows + driver_rows full scans and driver_rows × driver_rows comparisons.
MySQL 8.0 also supports hash join, which can replace BNL when a suitable hash table can be built.
Analysis mistake and discovery
The author initially tried to disable BNL with the hint /*+ NO_BNL() */ , but the plan still showed a full scan of the driven table. The root cause turned out to be a mismatch in character set and collation between the join columns of the two tables.
Reproduction test
Two tables were created:
CREATE TABLE t3 (id INT AUTO_INCREMENT PRIMARY KEY, a CHAR(50) CHARACTER SET utf8, KEY idx_a (a));
CREATE TABLE t4 (id INT AUTO_INCREMENT PRIMARY KEY, b CHAR(50) CHARACTER SET latin1, KEY idx_b (b));
Four queries were executed:
SQL1: SELECT * FROM t3 STRAIGHT_JOIN t4 ON t3.a = t4.b;
SQL2: SELECT * FROM t4 STRAIGHT_JOIN t3 ON t3.a = t4.b;
SQL3: SELECT * FROM t3 LEFT JOIN t4 ON t3.a = t4.b;
SQL4: SELECT * FROM t3 JOIN t4 ON t3.a = t4.b;
SQL1 and SQL3 (driving table t3) both suffered index loss and used BNL, while SQL2 and SQL4 (driving table t4) used NLJ because the driven table’s charset was larger, allowing the index to be applied.
The same behavior was observed for utf8 vs utf8mb4 joins.
Conclusion
The choice between NLJ and BNL depends on whether the driven table can actually use an index, not merely on the existence of one.
Character‑set or collation mismatches on join columns can cause index loss, especially in LEFT JOINs; keeping charset and collation consistent across tables is essential.
Converting BNL to NLJ by ensuring a usable index on the driven table yields the best performance, and MySQL 8.0.18+ further improves this with hash join support.
References
https://time.geekbang.org/column/article/79700 https://time.geekbang.org/column/article/80147 https://time.geekbang.org/column/article/82865
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.