Why IN/NOT IN Can Destroy Query Performance and Return Wrong Results
The article explains how using IN and NOT IN in SQL can lead to full‑table scans, ignore indexes, produce incorrect results when columns mismatch or contain NULLs, and shows safer alternatives like EXISTS, NOT EXISTS, and JOIN with concrete code examples and benchmark timings.
IN and NOT IN are common SQL keywords, but the author shows why they should be avoided in many cases.
1. Efficiency loss
In a project two tables t1 and t2 each contain about 1.5 million rows (≈600 MB). The query
select * from t1 where phone not in (select phone from t2)runs for more than ten minutes even though both phone columns are indexed and have the same type. The reason is that NOT IN cannot use the index, causing a full scan.
Replacing it with NOT EXISTS reduces the execution time to about 20 seconds:
select * from t1 where NOT EXISTS (select phone from t2 where t1.phone = t2.phone)2. Easy to make mistakes or get wrong results
Using IN with mismatched columns does not raise an error; it simply returns unexpected rows. Example tables test1 (id1) and test2 (id2) are created and populated:
create table test1 (id1 int);
create table test2 (id2 int);
insert into test1 (id1) values (1),(2),(3);
insert into test2 (id2) values (1),(2);The intended query to get ids from test1 that exist in test2 is:
select id1 from test1 where id1 in (select id2 from test2);This works and returns 1, 2. However, a slip of the hand changes the inner column to id1:
select id1 from test1 where id1 in (select id1 from test2);Even though test2 has no column id1, the query runs and returns 1, 2, 3 because the engine treats the sub‑query as a constant list, hiding the mistake.
3. NULL handling pitfalls
When test2 contains a NULL value: insert into test2 (id2) values (NULL); the query to find ids in test1 that are not in test2 is:
select id1 from test1 where id1 not in (select id2 from test2);Because NULL is not equal to any non‑NULL value, the result set is empty – the expected row (3) disappears.
How to avoid the problems
1. Use EXISTS or NOT EXISTS instead of IN/NOT IN:
select * from test1 where EXISTS (select * from test2 where test2.id2 = test1.id1);
select * from test1 where NOT EXISTS (select * from test2 where test2.id2 = test1.id1);2. Use JOIN (inner or left) with proper ON conditions:
select id1 from test1 INNER JOIN test2 ON test2.id2 = test1.id1;
select id1 from test1 LEFT JOIN test2 ON test2.id2 = test1.id1 WHERE test2.id2 IS NULL;These approaches respect indexes and produce correct results even when NULLs are present.
Finally, the author notes that IN is still acceptable for small, known constant sets, e.g., IN (0,1,2).
Source: cnblogs.com/hydor/p/5391556.html
Java Web Project
Focused on Java backend technologies, trending internet tech, and the latest industry developments. The platform serves over 200,000 Java developers, inviting you to learn and exchange ideas together. Check the menu for Java learning resources.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
