Why MySQL IN Subqueries Can Be So Slow and How to Fix Them
This article examines why a MySQL query that uses an IN subquery on a massive users table becomes extremely slow, analyzes the execution plan (a materialized temporary table combined with the semi‑join optimization), and demonstrates how disabling the semi‑join optimization or rewriting the query restores index usage and dramatically improves performance.
1. Case Introduction
A system needs to push messages (promotions, card offers, special products) to a large number of users. The user data runs to tens of millions of rows and is split across two tables: core user data (users) and extended info (users_extent_info). The query uses an IN subquery to filter users, and it becomes very slow.
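For orientation, the two tables might look roughly like the sketch below. Only the table names, the columns id, name, user_id and latest_login_time, and the index name idx_login_time come from the article; every column type, the surrogate key on users_extent_info, and the single-column shape of the index are assumptions made for illustration.

```sql
-- Hypothetical schema: column types and index layout are assumptions,
-- not taken from the original system.
CREATE TABLE users (
    id   BIGINT      NOT NULL PRIMARY KEY,   -- looked up via the PRIMARY key
    name VARCHAR(64) NOT NULL
);

CREATE TABLE users_extent_info (
    id                BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id           BIGINT NOT NULL,       -- refers to users.id
    latest_login_time BIGINT NOT NULL,       -- assumed to be a numeric timestamp
    KEY idx_login_time (latest_login_time)   -- index name mentioned in the article
);
```

The slow query discussed next runs against these two tables.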
SELECT id, name FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info
    # recent login users
    WHERE latest_login_time < xx
);
The job first counts the matching rows and then reads the data in batches, but on a table with millions of rows the count alone takes dozens of seconds.
SELECT COUNT(id) FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info
    WHERE latest_login_time < xxxxx
);
The execution plan shows a MATERIALIZED subquery that creates a temporary table, and a full table scan on users with a join buffer, leading to many rows scanned and a low filtered percentage.
EXPLAIN SELECT COUNT(id) FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info
    WHERE latest_login_time < xx
);
2. Why Is It Slow?
The subquery materializes 4561 rows into a temporary table. The outer query then scans the entire users table and checks each row against that temporary table. In other words, the semi‑join is driven by a full scan of users, so the index on users.id is never used to look up the matching rows directly.
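As a rough mental model, the slow plan behaves like the hand-written sequence below. This is only an illustration of the shape of the work, not what MySQL literally executes: the temporary table name tmp_matched_users and the @cutoff variable (standing in for the article's elided threshold) are hypothetical, and the optimizer may well choose a different plan for this hand-written version than for the original IN query.

```sql
-- Placeholder for the elided threshold in the article; substitute a real value.
SET @cutoff = 0;

-- Step 1: materialize the subquery result once (the MATERIALIZED row in EXPLAIN).
CREATE TEMPORARY TABLE tmp_matched_users (
    user_id BIGINT NOT NULL PRIMARY KEY      -- duplicates collapse to one row
);
INSERT IGNORE INTO tmp_matched_users (user_id)
SELECT user_id
FROM users_extent_info
WHERE latest_login_time < @cutoff;

-- Step 2: read users first and probe the temporary table for every row.
-- STRAIGHT_JOIN forces users to be the driving table, mimicking the slow plan.
SELECT COUNT(u.id)
FROM users AS u
STRAIGHT_JOIN tmp_matched_users AS t ON t.user_id = u.id;

DROP TEMPORARY TABLE tmp_matched_users;
```

The cost is dominated by step 2: the number of rows read from users does not depend on how few rows the subquery returned.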
3. Show Warnings
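Running SHOW WARNINGS right after the EXPLAIN prints the statement as the optimizer rewrote it. It shows that MySQL rewrites the IN clause as a semi‑join, causing a full scan of users.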
/* select#1 */ select count(d2.users.user_id) AS COUNT(users.user_id) from d2.users semi join ...
4. Experiment
Disabling the semi‑join optimization with SET optimizer_switch='semijoin=off'; changes the plan: the subquery uses a range scan on the idx_login_time index, and the outer query uses the PRIMARY key, reducing execution time to about 100 ms.
Subquery scans index range, returns 4561 rows.
Primary query uses id primary key.
Batch processing now runs tens of times faster.
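The experiment can be reproduced without affecting other connections by scoping the switch to the current session and restoring it afterwards. The sketch below assumes the tables described in this article; @cutoff is a hypothetical placeholder for the elided threshold. The second statement shows an alternative that limits the change to a single query using the NO_SEMIJOIN optimizer hint documented for MySQL 5.7 and 8.0; the query block name recent is made up for this example, and hint behavior may vary by version.

```sql
-- Placeholder for the elided threshold in the article; substitute a real value.
SET @cutoff = 0;

-- Option 1: turn the semi-join optimization off for this session only.
SELECT @@optimizer_switch;                        -- inspect the current flags
SET SESSION optimizer_switch = 'semijoin=off';

EXPLAIN SELECT COUNT(id) FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info
    WHERE latest_login_time < @cutoff
);

SET SESSION optimizer_switch = 'semijoin=default'; -- restore the default setting

-- Option 2: disable semi-join for one statement via an optimizer hint.
EXPLAIN SELECT /*+ NO_SEMIJOIN(@recent) */ COUNT(id)
FROM users
WHERE id IN (
    SELECT /*+ QB_NAME(recent) */ user_id
    FROM users_extent_info
    WHERE latest_login_time < @cutoff
);
```

A session-level switch changes the plan of every later statement on that connection, which matters with connection pools; the per-statement hint avoids that side effect.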
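Another workaround is to rewrite the WHERE clause with an always‑false OR condition. MySQL only converts an IN subquery to a semi‑join when the subquery sits at the top level of the WHERE clause, so placing it under an OR prevents the conversion while keeping the same business logic. The second branch below never matches, assuming latest_login_time is never negative.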
SELECT COUNT(id) FROM users WHERE (
    id IN (SELECT user_id FROM users_extent_info WHERE latest_login_time < xxxxx)
    OR id IN (SELECT user_id FROM users_extent_info WHERE latest_login_time < -1)
);
The key takeaway is to understand execution plans, avoid full table scans, and ensure indexes are used.
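Any rewrite of this kind is worth verifying rather than trusting. A minimal check, assuming the same tables and a hypothetical @cutoff placeholder for the elided threshold, is to run EXPLAIN on the rewritten query and then read the optimizer's rewritten statement:

```sql
-- Placeholder for the elided threshold in the article; substitute a real value.
SET @cutoff = 0;

EXPLAIN SELECT COUNT(id) FROM users WHERE (
    id IN (SELECT user_id FROM users_extent_info WHERE latest_login_time < @cutoff)
    OR id IN (SELECT user_id FROM users_extent_info WHERE latest_login_time < -1)
);

-- Prints the statement as rewritten by the optimizer for the EXPLAIN above.
SHOW WARNINGS;
```

Points worth checking in the output are whether the rewritten statement still contains a semi join, and which access type and key EXPLAIN reports for users and users_extent_info. The exact plan depends on the MySQL version and on the data.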
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
