Understanding and Using Hash Join in MySQL 8.0
This article explains the concept of Hash Join in MySQL 8.0, compares it with Nested Loop joins, shows how to enable or force it with server variables or hints, and presents performance benchmarks that demonstrate its speed advantages on large datasets.
While browsing the MySQL documentation, the author discovered that MySQL 8.0.18 GA added support for Hash Join, a join algorithm that can be much faster than Nested Loop for large tables.
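Relational databases commonly use three join methods: Nested Loops, Hash Join, and Sort‑Merge Join. MySQL relied on Nested Loop variants before 8.0.18 and now adds Hash Join; it does not implement Sort‑Merge Join. Hash Join is also used by Spark and Flink for SQL joins.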
Hash Join works by building an in‑memory hash table on the join key of the smaller table (spilling to disk if it does not fit) and then probing it with rows from the larger table. MySQL 8.0.18 supports it for inner joins; the algorithm extends to outer, semi, and anti joins, which the MySQL reference manual lists as supported from 8.0.20.
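The "in memory, using disk if necessary" behaviour is governed by the join buffer: per the MySQL reference manual, a hash join may not use more memory than join_buffer_size, and spills to files on disk beyond that. A minimal sketch of inspecting and raising it for one session follows; the 256 MB figure is an arbitrary illustration, not a value from the source.

```sql
-- Show the current per-join buffer size in bytes.
SELECT @@join_buffer_size;

-- Raise it for this session only so a larger build table fits in memory.
-- 256 MB is an illustrative value, not a recommendation from the source.
SET SESSION join_buffer_size = 256 * 1024 * 1024;
```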
Using EXPLAIN FORMAT=TREE you can see the execution plan; for example:
EXPLAIN: -> Inner hash join (t2.c1 = t1.c1)  (cost=0.70 rows=1)
    -> Table scan on t2  (cost=0.35 rows=1)
    -> Hash
        -> Table scan on t1  (cost=0.35 rows=1)
The plan shows the keyword Inner hash join, confirming that MySQL chose Hash Join.
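A plan of this shape can be requested with a statement along the following lines. This is a sketch that assumes t1 and t2 each have an unindexed column c1, matching the join condition printed in the plan; the exact query behind the output is not given in the source.

```sql
-- FORMAT=TREE prints the plan as an indented tree of iterators,
-- which is where the "Inner hash join" node becomes visible.
EXPLAIN FORMAT=TREE
SELECT *
FROM t1
JOIN t2 ON t1.c1 = t2.c1;
```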
In MySQL 8.0.18, Hash Join is applied only when the join has an equality (equi‑join) condition and no index can be used for it; non‑equi joins fall back to Nested Loop.
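You can control Hash Join for a session or the whole server through the hash_join flag of the optimizer_switch system variable (hash_join=on), or per query with optimizer hints such as /*+ HASH_JOIN(t1, t2) */ or /*+ NO_HASH_JOIN(t1, t2) */. According to the MySQL reference manual, the flag and both hints take effect in 8.0.18 only and have no effect in later releases.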
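As a sketch of how these controls look in practice on an 8.0.18 server (the MySQL reference manual documents the flag and the hints as effective in that release only), the statements below reuse the t1/t2 join from the plan shown earlier:

```sql
-- Turn hash join off, then back on, for the current session.
SET optimizer_switch = 'hash_join=off';
SET optimizer_switch = 'hash_join=on';

-- Per-query override: ask the optimizer not to hash-join t1 and t2.
SELECT /*+ NO_HASH_JOIN(t1, t2) */ *
FROM t1
JOIN t2 ON t1.c1 = t2.c1;

-- Per-query override in the other direction.
SELECT /*+ HASH_JOIN(t1, t2) */ *
FROM t1
JOIN t2 ON t1.c1 = t2.c1;
```

Prefixing either SELECT with EXPLAIN FORMAT=TREE shows whether the hint changed the chosen join method.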
Performance tests were conducted on three tables (t1, t2, t3) each containing 1,000,000 rows. The following DDL creates the tables:
CREATE TABLE t1 (c1 INT, c2 INT);
CREATE TABLE t2 (c1 INT, c2 INT);
CREATE TABLE t3 (c1 INT, c2 INT);
A simple join query was used for the test:
SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1;
Running it under EXPLAIN ANALYZE with Hash Join took about 12.98 seconds, while forcing a Block Nested Loop increased the runtime to dozens of minutes and saturated the CPU.
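To try a comparable measurement, the tables need data first. The source does not say how its rows were generated, so the load below is purely an assumption: sequential integers from a recursive CTE, copied into t2 so that every row finds exactly one match.

```sql
-- Hypothetical data load; the original article's data set is unknown.
-- Raise the recursive-CTE limit (default 1000) comfortably above 1,000,000.
SET SESSION cte_max_recursion_depth = 2000000;

INSERT INTO t1 (c1, c2)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 1000000
)
SELECT n, n FROM seq;

-- Same rows in t2, so each t1 row has exactly one join partner.
INSERT INTO t2 (c1, c2)
SELECT c1, c2 FROM t1;

-- EXPLAIN ANALYZE runs the query and reports measured time per plan step.
EXPLAIN ANALYZE
SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1;
```

Timings depend heavily on hardware, buffer sizes, and value distribution, so results from this sketch need not match the figures quoted above.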
After adding indexes on the join columns, the Nested Loop execution time dropped to around 19.56 seconds, still slower than the Hash Join.
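The indexed variant of the test can be reproduced roughly as follows. The index names are illustrative, since the source does not list the DDL it used.

```sql
-- Hypothetical index names; only the indexed column (c1) comes from the source.
CREATE INDEX idx_t1_c1 ON t1 (c1);
CREATE INDEX idx_t2_c1 ON t2 (c1);

-- With a usable index on the join column, the optimizer can pick an
-- index-lookup Nested Loop; the plan output shows which method ran.
EXPLAIN ANALYZE
SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c1;
```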
Additional benchmarks on Oracle 12c (1.282 s), PostgreSQL 11.5 (6.234 s), and SQL Server 2017 (5.207 s) without indexes also show the speed advantage of Hash Join.
Overall, the article demonstrates that Hash Join is a powerful and often default choice in MySQL for equality joins, and it can be tuned or forced as needed.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
