Databases 20 min read

Mastering MySQL Join Algorithms: From Simple Loops to Hash Joins

This article explains how MySQL processes multi‑table joins, compares Simple Nested‑Loop, Block Nested‑Loop, Hash, and Index Nested‑Loop join algorithms, and demonstrates practical optimization techniques using indexes, join buffers, and query rewrites to dramatically improve performance on large datasets.

JD Cloud Developers

Dec 9, 2024

Mastering MySQL Join Algorithms: From Simple Loops to Hash Joins

Introduction

When optimizing MySQL queries that involve only a single table, adding appropriate indexes usually eliminates full‑table scans and sorting, delivering a dramatic performance boost. However, most production workloads involve multi‑table joins, which are often the biggest source of slow queries and high server resource consumption.

Preparation

All examples use MySQL 5.7.x as the primary version, with occasional references to MySQL 8.0 features.

Creating test tables:

-- Create two identical tables t1 and t2
create table t1(
  id int not null auto_increment,
  a int,
  b int,
  c int,
  primary key(id),
  key idx_a(a)
);
create table t2 like t1;

Generating test data with stored procedures:

delimiter //
create procedure t1_proc()
begin
  declare i int default 1;
  while i <= 3000 do
    if i % 3 = 0 then
      insert into t1(a,b,c) values(i,i,i);
    end if;
    set i = i + 1;
  end while;
end //

delimiter ;

delimiter //
create procedure t2_proc()
begin
  declare i int default 1;
  while i <= 200000 do
    if i % 2 = 0 then
      insert into t2(a,b,c) values(i,i,i);
    end if;
    set i = i + 1;
  end while;
end //

delimiter ;

call t1_proc();
call t2_proc();

drop procedure t1_proc;
drop procedure t2_proc;

MySQL JOIN Algorithms

1. Simple Nested‑Loop Join

Without any optimization, MySQL executes a nested loop: for each row in the driving table (t1) it scans the entire inner table (t2) and compares the join condition. With 1,000 rows in t1 and 100,000 rows in t2, this results in 100 million comparisons.

2. Block Nested‑Loop Join

MySQL stores the result set of the smaller (driving) table in a join buffer. The larger (driven) table is scanned once, and each row is compared against the buffered rows. The join buffer size (default 256 KB) determines how many rows are buffered before the buffer is flushed and refilled.

3. Hash Join

MySQL 8.0 introduces hash joins. The driving table’s join column values are hashed into an in‑memory hash table. The driven table then probes this hash table, drastically reducing the number of row‑by‑row comparisons. EXPLAIN shows Using join buffer (hash join).

Performance comparison (500 rows):

# MySQL 5.7 (Block Nested‑Loop)
... 500 rows in set (4.90 sec)

# MySQL 8.0 (Hash Join)
... 500 rows in set (0.02 sec)

4. Index Nested‑Loop Join

If the join column of the driven table has an index, MySQL can perform an index lookup for each row of the driving table, eliminating the need to scan the driven table entirely. This is analogous to using an index in a single‑table query.

Optimization Strategies

1. Initial Query

select *
from t1 join t2 on t1.b = t2.b
where t1.c in (6,12,18,24,30)
  and t2.c in (6,12,18,24,30);

EXPLAIN shows both tables are scanned (type=ALL), with the driven table (t2) using a join buffer.

2. Optimization 1 – Use Indexed Column

Replace the filter on t1.c with t1.a, which has an index.

select *
from t1 join t2 on t1.b = t2.b
where t1.a in (6,12,18,24,30)
  and t2.c in (6,12,18,24,30);

Now t1 uses range access via idx_a, reducing estimated rows from 1,000 to 5.

3. Optimization 2 – Index the Driven Table’s Join Column

Rewrite the query to join on t2.a, which is indexed.

select *
from t1 join t2 on t1.b = t2.a
where t1.a in (6,12,18,24,30)
  and t2.c in (6,12,18,24,30);

EXPLAIN shows t2 using range on idx_a, still with a join buffer but with far fewer rows scanned.

4. Optimization 3 – Index Nested‑Loop Join

When the driven table’s join column is indexed, MySQL automatically switches to Index Nested‑Loop Join.

select *
from t1 join t2 on t1.b = t2.a
where t1.a in (6,12,18,24,30)
  and t2.c in (6,12,18,24,30);

EXPLAIN shows type=ref for t2 using idx_a, confirming Index Nested‑Loop Join.

5. Optimization 4 – Both Columns Indexed

After adding an index on t2.c and adjusting data distribution, the optimizer still chooses Block Nested‑Loop Join, but the cost is low because both tables can be filtered to a few rows.

alter table t2 add index idx_c(c);
update t2 set a = a % 50;
alter table t2 engine=innodb;

EXPLAIN now shows type=range on t2 using idx_c with a small row estimate.

Conclusion

Understanding the different join algorithms and how MySQL chooses among them enables you to rewrite queries, add appropriate indexes, and adjust join buffers to achieve optimal performance, especially for large‑scale multi‑table joins.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

indexing SQL optimization Database Performance Join Algorithms

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.