How MySQL 8’s Hash Join Boosts Query Performance
Learn how MySQL 8’s hash join algorithm works by building an in‑memory hash table from the smaller table, probing the larger table for matches, and handling oversized tables through disk‑based chunk files, offering a more efficient alternative to traditional nested‑loop joins.
Previously MySQL only had the nested loop join algorithm; MySQL 8 introduced a new hash join algorithm that is more efficient.
Below is an overview of how it works.
Example SQL is illustrated in the following image.
Hash join operates in two phases: build and probe.
1. Build Phase
Select the smaller table (by size, not row count), for example countries, and compute the hash of the join field for each row, storing the results in an in‑memory hash table. hash(countries.country_id) After all rows are placed in the hash table, the build phase completes.
2. Probe Phase
For each row in the persons table, compute the hash of the join field and look it up in the in‑memory hash table; matching rows are sent to the client. hash(persons.country_id) This completes the join, scanning each table only once and providing constant‑time matching.
If the chosen small table cannot fit entirely into memory (controlled by join_buffer_size), MySQL overflows the excess data to disk.
3. Overflow to Disk
During the build phase, when memory is full, remaining rows are written to multiple chunk files sized to fit the available memory. The chunk for a row is determined by hashing the join field.
hash_2(countries.country_id)Thus, for large tables the build phase consists of two steps: writing to the in‑memory hash table and writing to chunk files.
In the probe phase, rows from persons are also written to chunk files. Corresponding probe chunk files are processed one‑by‑one: each chunk is loaded back into memory, hashed, and matched against the hash table.
Summary
The hash join algorithm selects a small table, builds an in‑memory hash table, then scans the other table to find matching rows. When tables are too large to fit in memory, the data is partitioned into chunk files, and each chunk is processed using the same build‑and‑probe logic.
References: https://mysqlserverteam.com/hash-join-in-mysql-8/
Further reading: MySQL order by, database indexes, read/write splitting, Canal data heterogeneity, MySQL transaction isolation, etc.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
