Databases 9 min read

Understanding Semi‑join Materialization in MySQL Subquery Optimization

This article explains how MySQL transforms eligible subqueries into semi‑joins, describes the four semi‑join strategies, and details the Semi‑join Materialization approach—including Materialization‑scan and Materialization‑lookup—while showing how to identify the chosen strategy via EXPLAIN output.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Understanding Semi‑join Materialization in MySQL Subquery Optimization

This is the second article in a series on MySQL subquery optimization, focusing on the concrete implementation of Semi‑join strategies.

What is a Semi‑join

In regular joins duplicate rows may appear, while a subquery can return a distinct set. For example, joining country and city with a population filter produces duplicate china rows, whereas using IN (SELECT ...) returns a single row.

select country.* from country join city on country.code=city.country_code \
and population>20000000;
select * from country where code in \
(select country_code from city where population>20000000);

The optimizer can recognise that the IN clause needs only one value per group and replace the subquery with a Semi‑join, improving efficiency.

Semi‑join Constraints

The subquery must appear in a top‑level WHERE or ON clause as IN or =ANY .

It must be a single SELECT , not a UNION .

No GROUP BY or HAVING (except with materialization strategy).

No implicit grouping (no aggregate functions).

No ORDER BY … LIMIT .

The outer query must not use STRAIGHT_JOIN .

The total number of tables must be below the join‑table limit.

Semi‑join Implementation Strategies

Duplicate Weedout

FirstMatch

LooseScan

Materialize

These correspond to the optimizer_switch flags semijoin=ON and the individual strategy flags, all enabled by default.

EXPLAIN output shows which strategy is used: Extra containing Start temporary / End temporary indicates Duplicate Weedout; FirstMatch(tbl_name) indicates FirstMatch; LooseScan(m..n) indicates LooseScan; and a select_type of MATERIALIZED with a temporary table name signals the Materialize strategy.

Semi‑join Materialization

This strategy materialises the subquery result into a temporary table and then joins it. It has two variants:

Materialization‑scan

Materialization‑lookup

Example query (non‑correlated subquery) that finds European countries with cities having population > 7 million:

select * from Country
where Country.code IN (select City.Country
                       from City
                       where City.Population > 7*1000*1000)
      and Country.continent='Europe';

With Materialization‑scan the optimizer first builds the temporary table, then scans it fully when joining to Country . The EXPLAIN output shows two rows, the second with select_type=MATERIALIZED and the temporary table referenced as <subquery2> . The total rows scanned are 15 (subquery) + 15 (temporary) + 15 × 1 = 45.

Changing the subquery to return more rows switches to Materialization‑lookup, where the outer table drives the join and the temporary table is accessed via its primary key. The EXPLAIN output reflects this with eq_ref on the temporary table. The rows scanned become 238 (subquery) + 239 × 1 = 477.

Notes

In MariaDB, Semi‑join Materialization can still be used when the subquery contains GROUP BY , but in MySQL any GROUP BY disables all Semi‑join strategies, falling back to the generic Materialization optimization controlled by materialization=ON .

References

https://mariadb.com/kb/en/semi-join-materialization-strategy/

https://dev.mysql.com/doc/refman/5.7/en/semijoins.html

MySQLDatabase PerformanceSemi-JoinMaterializationSubquery Optimization
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.