Databases 21 min read

Mastering MySQL Join Algorithms: From Simple Loops to Hash Joins

This article explores MySQL's join processing methods—Simple Nested‑Loop, Block Nested‑Loop, Hash, and Index Nested‑Loop joins—demonstrates how to create test tables, analyze execution plans, and apply practical optimizations such as indexing and query rewriting to dramatically improve multi‑table query performance.

JD Tech Talk
JD Tech Talk
JD Tech Talk
Mastering MySQL Join Algorithms: From Simple Loops to Hash Joins

1. Introduction

When optimizing single‑table queries in MySQL, adding appropriate indexes often eliminates full table scans and dramatically improves performance. However, most production workloads involve multi‑table joins, which are the primary source of slow queries and high resource consumption. This article examines how to optimize such joins.

2. Test Setup

All examples use MySQL 5.7.x as the baseline, with occasional references to MySQL 8.0 for features not available in 5.7.

-- Create two identical tables t1 and t2
CREATE TABLE t1(
  id INT NOT NULL AUTO_INCREMENT,
  a INT,
  b INT,
  c INT,
  PRIMARY KEY(id),
  KEY idx_a(a)
);
CREATE TABLE t2 LIKE t1;

Populate the tables with test data using stored procedures:

DELIMITER //
CREATE PROCEDURE t1_proc()
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= 3000 DO
    IF i % 3 = 0 THEN
      INSERT INTO t1(a,b,c) VALUES(i,i,i);
    END IF;
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;

CALL t1_proc();

DELIMITER //
CREATE PROCEDURE t2_proc()
BEGIN
  DECLARE i INT DEFAULT 1;
  WHILE i <= 200000 DO
    IF i % 2 = 0 THEN
      INSERT INTO t2(a,b,c) VALUES(i,i,i);
    END IF;
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;

CALL t2_proc();

DROP PROCEDURE t1_proc;
DROP PROCEDURE t2_proc;

Sample data (first few rows):

+----+------+------+------+
| id | a    | b    | c    |
+----+------+------+------+
| 1  | 3    | 3    | 3    |
| 2  | 6    | 6    | 6    |
| 3  | 9    | 9    | 9    |
| 4  | 12   | 12   | 12   |
| 5  | 15   | 15   | 15   |
+----+------+------+------+
5 rows in set (0.00 sec)

3. MySQL Join Algorithms

3.1 Simple Nested‑Loop Join

Without any optimization, MySQL executes a nested loop: for each row in the outer table (t1, 1000 rows) it scans the entire inner table (t2, 100 000 rows), resulting in about 100 million row comparisons. SELECT * FROM t1 JOIN t2 ON t1.b = t2.b; Execution plan (5.7):

id: 1  select_type: SIMPLE  table: t1  type: ALL  rows: 1000
id: 2  select_type: SIMPLE  table: t2  type: ALL  rows: 100256  Extra: Using where; Using join buffer (Block Nested Loop)
Simple Nested Loop
Simple Nested Loop

3.2 Block Nested‑Loop Join

MySQL stores the outer table rows in a join buffer (default 256 KB). When the buffer fills, it scans the inner table once and compares each inner row against all buffered outer rows. This reduces the number of full scans of the inner table but still requires many row comparisons.

-- Same query as above, but MySQL chooses Block Nested Loop
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.b = t2.b\G
Block Nested Loop
Block Nested Loop

3.3 Hash Join (MySQL 8.0+)

MySQL 8.0 can build a hash table on the join column of the outer table, allowing constant‑time lookups for each inner row. This dramatically reduces the number of comparisons.

EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.b = t2.b\G
Hash Join
Hash Join

3.4 Index Nested‑Loop Join

If the inner table has an index on the join column, MySQL can perform an index lookup for each outer row, eliminating the need for a full scan of the inner table.

EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.b = t2.a\G
Index Nested Loop
Index Nested Loop

4. Optimization Strategies

4.1 Use Indexes on Filter Columns

Original query filtered t1.c and t2.c without indexes, causing full scans. Replacing the filter with the indexed column t1.a reduces the estimated rows from 1000 to 5.

SELECT *
FROM t1 JOIN t2 ON t1.b = t2.b
WHERE t1.a IN (6,12,18,24,30)
  AND t2.c IN (6,12,18,24,30);

New execution plan shows type: range on t1 using idx_a, while t2 still uses Block Nested Loop because its join column lacks an index.

4.2 Index the Join Column of the Inner Table

Adding an index on t2.b (or using t2.a which already has an index) enables an Index Nested‑Loop Join, turning the inner scan into an index lookup.

EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.b = t2.a\G

Result: type: ref on t2 using idx_a, rows per lookup = 1.

4.3 Combine Indexes on Join and Filter Columns

If both the join column and the filter column are indexed, MySQL chooses the most selective index. In the example, adding idx_c on t2.c and adjusting data distribution leads MySQL to use idx_c with a Block Nested‑Loop Join, but the overall cost remains low because the filtered set is tiny.

ALTER TABLE t2 ADD INDEX idx_c(c);
UPDATE t2 SET a = a % 50;  -- make each a value appear ~4000 times
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.b = t2.a
WHERE t1.a IN (6,12,18,24,30)
  AND t2.c IN (6,12,18,24,30)\G

Plan shows key: idx_c with rows: 5 and

Extra: Using index condition; Using where; Using join buffer (Block Nested Loop)

.

4.4 General Guidance for Multi‑Table Joins

For queries joining more than two tables, treat the result of the first two tables as a new driver table and apply the same two‑table analysis to subsequent joins. The optimizer will choose the cheapest algorithm based on available indexes and row estimates.

5. Conclusion

The choice between Block Nested‑Loop, Hash, and Index Nested‑Loop joins depends on index availability and data distribution. By creating appropriate indexes on join and filter columns, rewriting queries to use indexed columns, and understanding MySQL’s execution plans, you can turn slow multi‑table queries into fast, scalable operations.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

indexingmysqlSQL OptimizationDatabase PerformanceJoin Algorithms
JD Tech Talk
Written by

JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.