Databases 5 min read

How MySQL 8’s Hash Join Boosts Query Performance

Learn how MySQL 8’s hash join algorithm works by building an in‑memory hash table from the smaller table, probing the larger table for matches, and handling oversized tables through disk‑based chunk files, offering a more efficient alternative to traditional nested‑loop joins.

Java High-Performance Architecture
Java High-Performance Architecture
Java High-Performance Architecture
How MySQL 8’s Hash Join Boosts Query Performance

Previously MySQL only had the nested loop join algorithm; MySQL 8 introduced a new hash join algorithm that is more efficient.

Below is an overview of how it works.

Example SQL is illustrated in the following image.

Hash join operates in two phases: build and probe.

1. Build Phase

Select the smaller table (by size, not row count), for example countries, and compute the hash of the join field for each row, storing the results in an in‑memory hash table. hash(countries.country_id) After all rows are placed in the hash table, the build phase completes.

2. Probe Phase

For each row in the persons table, compute the hash of the join field and look it up in the in‑memory hash table; matching rows are sent to the client. hash(persons.country_id) This completes the join, scanning each table only once and providing constant‑time matching.

If the chosen small table cannot fit entirely into memory (controlled by join_buffer_size), MySQL overflows the excess data to disk.

3. Overflow to Disk

During the build phase, when memory is full, remaining rows are written to multiple chunk files sized to fit the available memory. The chunk for a row is determined by hashing the join field.

hash_2(countries.country_id)

Thus, for large tables the build phase consists of two steps: writing to the in‑memory hash table and writing to chunk files.

In the probe phase, rows from persons are also written to chunk files. Corresponding probe chunk files are processed one‑by‑one: each chunk is loaded back into memory, hashed, and matched against the hash table.

Summary

The hash join algorithm selects a small table, builds an in‑memory hash table, then scans the other table to find matching rows. When tables are too large to fit in memory, the data is partitioned into chunk files, and each chunk is processed using the same build‑and‑probe logic.

References: https://mysqlserverteam.com/hash-join-in-mysql-8/

Further reading: MySQL order by, database indexes, read/write splitting, Canal data heterogeneity, MySQL transaction isolation, etc.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLmysqlHash JoinJoin Algorithms
Java High-Performance Architecture
Written by

Java High-Performance Architecture

Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.