MySQL Subquery Optimization Strategies and Rewriting DELETE with JOIN
This article explains why a DELETE statement with a sub‑query can cause a full‑table scan in MySQL, describes the optimizer's semijoin, materialization and exists strategies for different sub‑query types, and shows how to rewrite the statement using a JOIN for better performance.
The article starts with a problematic DELETE statement that scans the whole table despite both the outer bizCustomerIncoming_id column and the inner cid column being indexed.
Subquery optimization strategies
For IN and =ANY subqueries MySQL may choose semijoin , materialization or exists . For NOT IN and <>ALL it can use materialization or exists . For derived tables the optimizer can apply derived_merge (merge the derived table into the outer query) or materialization (materialize the derived table as a temporary internal table).
Important limitation
UPDATE and DELETE statements cannot use the semijoin or materialization strategies; they fall back to the exists approach, which often leads to a full scan.
Illustrative example
CREATE TABLE `t2` (
`id` int(11) NOT NULL,
`a` int(11) DEFAULT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `a` (`a`)
) ENGINE=InnoDB; delimiter ;;
create procedure idata()
begin
declare i int;
set i=1;
while(i<=1000) do
insert into t2 values(i,i,i);
set i=i+1;
end while;
end;;
delimiter ;
call idata();
create table t1 like t2;
insert into t1 select * from t2 where id<=100;Running a query with IN (SELECT ...) is expected to be executed as a simple lookup, but MySQL rewrites it to an EXISTS correlated subquery, producing a DEPENDENT SUBQUERY plan and scanning many rows.
Semi‑join
When the outer table is large, executing the subquery for each outer row is inefficient. MySQL can transform the query into a join, or, if the semantics allow, use a semi‑join which eliminates duplicate rows while preserving the IN meaning. The optimizer may implement the semi‑join via a materialized temporary table and then probe the outer table.
select t1.*
from t1
join t2 on t1.a=t2.b and t2.id<10;Materialization
Materialization materializes the subquery result into a temporary table (with a primary key to deduplicate) and then joins it once, reducing the total scanned rows dramatically. The plan shows a MATERIALIZED select_type for the subquery.
+----+-------------+-------+-------+---------+------+------+-------------+
| id | select_type | table | type | key | ref | rows | Extra |
+----+-------------+-------+-------+---------+------+------+-------------+
| 1 | PRIMARY | t1 | ALL | NULL | NULL | 100 | Using where |
| 2 | SUBQUERY | t2 | range | PRIMARY | NULL | 9 | Using where |
+----+-------------+-------+-------+---------+------+------+-------------+The optimizer switches for these strategies are controlled by optimizer_switch flags semijoin and materialization .
Conclusion
Because DELETE cannot use semijoin or materialization, the original statement performs a full‑table scan; rewriting it as a JOIN eliminates the scan:
delete biz_customer_incoming_path
from biz_customer_incoming_path a
join biz_customer_incoming b
on a.bizCustomerIncoming_id = b.id
where b.cid = '7Ex46Dz22Fqq6iuPCLPlzQ';References: MySQL 5.7 Subquery Optimization documentation and the “High Performance MySQL” book, chapter 6.5.1.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.