Databases 8 min read

When to Use MySQL Hash Join vs. Nested Loop: A Practical Guide

This article explains MySQL's hash join algorithm, compares it with nested‑loop joins, shows how to read execution steps and EXPLAIN output, discusses extended hash‑join scenarios, and weighs the trade‑offs of performing multi‑table joins inside the database versus in application code.

Senior Tony

Jul 10, 2024

When to Use MySQL Hash Join vs. Nested Loop: A Practical Guide

Hash Join Overview

Since MySQL 8.0.20 the cache‑block nested‑loop join has been removed and replaced by hash join for many cases. When the driving table lacks an index, hash join usually outperforms the old nested‑loop approach because it builds an in‑memory hash table on the smaller table and probes it with the larger table.

MySQL Implementation Example

SELECT * FROM product p INNER JOIN `order` o ON p.id = o.product_id WHERE p.id IN (1,2);

Execution steps:

Build a hash table from the small table product on the join column id.

Scan the large table order, compute the hash of product_id, locate the corresponding bucket in the hash table, and match rows.

Combine matching rows into the final result set.

If the optimizer selects a hash join, the EXTRA column of EXPLAIN shows Using where; Using join buffer (hash join). When the hash table exceeds the join_buffer_size limit, MySQL spills to disk and the execution plan changes accordingly.

Extended Hash Join Scenarios

From MySQL 8.0.20 onward, hash join can be used for more than simple equi‑joins. Supported forms include:

Inner non‑equi‑join, e.g., SELECT * FROM t1 JOIN t2 ON t1.c1 < t2.c1 Semijoin, e.g., SELECT * FROM t1 WHERE t1.c1 IN (SELECT t2.c2 FROM t2) Antijoin, e.g.,

SELECT * FROM t2 WHERE NOT EXISTS (SELECT * FROM t1 WHERE t1.c1 = t2.c1)

Left outer join and right outer join

Application‑Level Join vs. Database Join

Implementing joins in application code typically involves reading both tables into memory and either using a nested‑loop merge or building a hashmap on the join column, which mirrors the database hash join.

Advantages of moving the join to the application layer:

Application servers can be horizontally scaled, reducing pressure on the database.

For simple merges, code reuse and maintenance may be easier.

Higher availability when the database is under heavy load, as the application can continue processing.

Disadvantages include increased network traffic, higher memory consumption, and the risk of saturating the database’s network interface, especially with large tables. In most realistic workloads, keeping multi‑table joins inside the database yields better performance and resource utilization.

Below are illustrative diagrams from the original article:

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL MySQL database optimization Hash Join Join Algorithms

Written by

Senior Tony

Former senior tech manager at Meituan, ex‑tech director at New Oriental, with experience at JD.com and Qunar; specializes in Java interview coaching and regularly shares hardcore technical content. Runs a video channel of the same name.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.