Databases 9 min read

Understanding Hash Join in MySQL 8.0.18

This article explains how MySQL 8.0.18 introduces the hash join algorithm, describes its build and probe phases, disk‑overflow handling, usage instructions, optimizer switches, performance comparisons with block nested loops, and current limitations, providing code examples and practical guidance.

Aikesheng Open Source Community

Nov 19, 2019

Historically MySQL only used variations of nested‑loop joins, but starting with MySQL 8.0.18 a hash join implementation is available. This article introduces the concept, explains when it is used, and compares its performance with the older join algorithms.

What is a hash join? A hash join builds an in‑memory hash table on the join key of one input (the build input) and then probes that table with rows from the other input (the probe input). It is generally more efficient than nested‑loop joins when the build side fits in memory.

Example query used for illustration:

SELECT given_name, country_name
FROM persons JOIN countries ON persons.country_id = countries.country_id;

① Build phase – The server creates a hash table using the join attribute (e.g., countries.country_id) as the key and stores rows from the chosen build input (typically the smaller table) in memory.

② Probe phase – The server reads rows from the probe input ( persons) and looks up each persons.country_id in the hash table, emitting matching rows. Each input is scanned only once, and look‑ups are O(1).

If the build input exceeds the memory limit set by the join_buffer_size system variable, MySQL overflows the excess to disk, creating up to 128 block files. During the probe phase, rows are also written to block files using a different hash function, ensuring that matching rows end up in the same pair of block files.

After probing, the server loads each pair of block files back into memory one at a time, repeats the probe, and continues until all pairs are processed.

How to use it – Hash join is enabled by default; no special configuration is required. To verify that a query uses a hash join, run: EXPLAIN FORMAT=tree which will show an “Inner hash join” node if the optimizer chose this method.

MySQL prefers hash join when the join columns lack usable indexes; otherwise it may fall back to indexed nested‑loop joins.

Disabling hash join – You can turn it off with the optimizer switch: SET optimizer_switch="hash_join=off"; After disabling, MySQL reverts to block nested‑loop execution, making performance comparisons easier.

Performance data – Benchmarks (with all indexes disabled) show that hash join consistently outperforms block nested loops because it scans each input once and uses constant‑time look‑ups. However, current limitations include support only for inner hash joins (right, semi, and outer joins still use block nested loops) and the optimizer’s tendency to favor block nested loops.

Future work aims to remove these restrictions, but even with them, hash joins can significantly speed up suitable queries, especially when the join buffer is sized appropriately.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL mysql Hash Join Join Algorithms

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.