Why MySQL IN Subqueries Turn Into Slow Full Scans—and How to Fix Them
A large-scale user-push system suffers from MySQL queries that take dozens of seconds. MySQL materializes the IN subquery into a temporary table and rewrites the statement as a semi-join, which ends in a full scan of the users table. Disabling the semi-join optimization or rewriting the query brings execution time down to roughly 100 ms.
Case Introduction
A system needs to push promotional messages, card offers, and special‑price items to a massive user base. Daily active users reach a million, total registered users are in the tens of millions, and the user data resides in a single large table.
Original Query and Count
The operation filters users by recent login time using an IN subquery, then counts the matching rows before batch processing:
SELECT id, name FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info
    WHERE latest_login_time < xx
);
SELECT COUNT(id) FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info WHERE latest_login_time < xxxxx
);
Table Layout
users – stores core user data (id, name, nickname, phone).
users_extent_info – stores extended info (address, interests, last login time).
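To make the layout concrete, here is a hypothetical DDL sketch of the two tables. Only the table names, the column names, and the index name idx_login_time (which appears in the execution plan) come from the article. The column types, sizes, the surrogate key on users_extent_info, and the choice to store latest_login_time as Unix epoch seconds are assumptions for illustration.

```sql
-- Hypothetical schema: all types and sizes are assumptions.
CREATE TABLE users (
    id       BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name     VARCHAR(64) NOT NULL,
    nickname VARCHAR(64) NULL,
    phone    VARCHAR(32) NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB;

CREATE TABLE users_extent_info (
    id                BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- assumed surrogate key
    user_id           BIGINT UNSIGNED NOT NULL,                 -- refers to users.id
    address           VARCHAR(255) NULL,
    interests         VARCHAR(255) NULL,
    latest_login_time BIGINT NOT NULL,                          -- assumed: Unix epoch seconds
    PRIMARY KEY (id),
    KEY idx_login_time (latest_login_time)                      -- index name taken from the plan
) ENGINE = InnoDB;
```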
The subquery often matches anywhere from thousands to hundreds of thousands of users, so the application first runs the COUNT query to find out how many rows it will have to process in batches. On a table with tens of millions of rows, that COUNT alone can take dozens of seconds.
Why the Query Is Slow
The subquery is materialized into a temporary table (MATERIALIZED step). MySQL then performs a full scan of the users table and, for each row, joins against the temporary table, effectively scanning the temporary table repeatedly. The execution plan shows:
| id | select_type  | table             | type  | key            | rows  | filtered | Extra                                |
|----|--------------|-------------------|-------|----------------|-------|----------|--------------------------------------|
| 1  | SIMPLE       | users             | ALL   | NULL           | 49651 | 10.00    | Using where; Using join buffer (BNL) |
| 2  | MATERIALIZED | users_extent_info | range | idx_login_time | 4561  | 100.00   | NULL                                 |

The MATERIALIZED step creates a disk-based temporary table with 4561 rows, and the subsequent full scan of users joins each user row against this temporary table, causing the high cost.
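A plan of this kind can be obtained by prefixing the statement with EXPLAIN. The sketch below assumes latest_login_time holds Unix epoch seconds; the cutoff 1700000000 is a hypothetical stand-in for the value elided in the original query.

```sql
-- Assumption: latest_login_time is stored as Unix epoch seconds.
-- 1700000000 is a hypothetical cutoff, not a value from the article.
EXPLAIN
SELECT COUNT(id) FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info
    WHERE latest_login_time < 1700000000
);

-- On MySQL 5.7 and later, the JSON format adds cost estimates
-- and labels the materialized subquery explicitly.
EXPLAIN FORMAT=JSON
SELECT COUNT(id) FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info
    WHERE latest_login_time < 1700000000
);
```

In the tabular output, type = ALL on users together with a MATERIALIZED row for the subquery is the pattern to look for.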
Show Warnings and Semi‑Join Insight
Running SHOW WARNINGS reveals that MySQL rewrites the IN subquery as a semi‑join:
/* select#1 */ select count(d2.users.user_id) AS COUNT(users.user_id)
from d2.users users semi join xxxxxx
The semi-join forces a full scan of the materialized temporary table for every users row, eliminating the benefit of the index.
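The rewritten statement is only reported when SHOW WARNINGS runs immediately after EXPLAIN in the same session. A minimal sequence, assuming MySQL 5.7 or later and using a hypothetical cutoff of 1700000000 in place of the elided value:

```sql
-- 1700000000 is a hypothetical cutoff, not a value from the article.
EXPLAIN
SELECT COUNT(id) FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info
    WHERE latest_login_time < 1700000000
);

-- Must be the next statement in the same session; the rewritten
-- query appears in the Message column as a Note.
SHOW WARNINGS;

-- Lists the optimizer flags currently in effect, including semijoin.
SELECT @@optimizer_switch;
```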
Experiment: Disable Semi‑Join Optimization
Setting the optimizer switch disables the semi‑join:
SET optimizer_switch='semijoin=off';
After disabling, EXPLAIN shows a range scan on users_extent_info and a primary-key lookup on users. Executing the query now finishes in about 100 ms instead of dozens of seconds.
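Turning the switch off affects every later statement in the session, so it may be safer to scope the change narrowly. Two possible ways to do that are sketched here: restore the default right after the query, or use the NO_SEMIJOIN optimizer hint (available in MySQL 5.7 and later) so that only one subquery is affected. The cutoff 1700000000 is hypothetical, and whether either variant reproduces the 100 ms result depends on the data and server version.

```sql
-- Option 1: disable semi-join for this session only, then restore the default.
SET SESSION optimizer_switch = 'semijoin=off';

SELECT COUNT(id) FROM users WHERE id IN (
    SELECT user_id FROM users_extent_info
    WHERE latest_login_time < 1700000000   -- hypothetical cutoff
);

SET SESSION optimizer_switch = 'semijoin=default';

-- Option 2: disable semi-join for a single subquery with an optimizer hint.
SELECT COUNT(id) FROM users WHERE id IN (
    SELECT /*+ NO_SEMIJOIN() */ user_id FROM users_extent_info
    WHERE latest_login_time < 1700000000   -- hypothetical cutoff
);
```

The hint keeps the change inside the statement itself, which avoids leaking a modified optimizer setting into pooled connections.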
Alternative Rewrite
Another way to avoid the semi‑join is to add a harmless OR condition that prevents MySQL from applying the optimization while keeping the business logic unchanged:
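SELECT COUNT(id) FROM users WHERE (
    id IN (SELECT user_id FROM users_extent_info WHERE latest_login_time < xxxxx)
    OR id IN (SELECT user_id FROM users_extent_info WHERE latest_login_time < -1)
);
The second predicate can never be true, because login timestamps are never negative, so the result is unchanged. MySQL only considers the semi-join transformation when the IN predicate sits at the top level of the WHERE clause, possibly as one term of an AND. Wrapping it in an OR disqualifies it, so MySQL falls back to a regular subquery plan with index lookups and the costly materialized join disappears.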
Key Takeaways
Inspect the execution plan to understand why a query is slow.
Materialized subqueries and semi‑join optimizations can cause full table scans on large tables.
Disabling the semi‑join optimizer or rewriting the query can restore index usage and dramatically improve performance.
dbaplus Community
