Why IN and NOT IN Can Destroy SQL Performance (And Safer Alternatives)
This article explains why the SQL keywords IN and NOT IN often lead to poor performance and incorrect results—especially with large tables or NULL values—and shows how using EXISTS, NOT EXISTS or JOIN can provide faster, more reliable queries.
IN and NOT IN are frequently used SQL keywords, but they should be avoided in many scenarios because they can cause severe performance problems and produce wrong results.
Why?
1. Low efficiency
In one project, two tables t1 and t2 each held about 1.5 million rows (≈600 MB). The following NOT IN query ran for many minutes because the NOT IN predicate could not use the index on phone:
select * from t1 where phone not in (select phone from t2);
After rewriting it with NOT EXISTS, the same query finished in about 20 seconds, a substantial speed gain. The two forms return the same rows only when phone contains no NULL values, as the next point shows.
select * from t1 where not exists (select phone from t2 where t1.phone = t2.phone);
2. Easy to make mistakes or get wrong results
Consider two tables test1(id1) and test2(id2):
create table test1 (id1 int);
create table test2 (id2 int);
insert into test1 (id1) values (1),(2),(3);
insert into test2 (id2) values (1),(2);
Querying with IN works as expected:
select id1 from test1 where id1 in (select id2 from test2);
If a typo changes id2 to id1 in the subquery, the statement still runs without error and returns all rows from test1. Because test2 has no column named id1, the name resolves to the outer test1.id1, so the subquery becomes correlated and each id1 is compared with itself:
select id1 from test1 where id1 in (select id1 from test2);
Another pitfall appears when test2 contains a NULL value:
insert into test2 (id2) values (NULL);
Querying for rows in test1 that are not in test2 using NOT IN now yields an empty set:
select id1 from test1 where id1 not in (select id2 from test2);
For id1 = 3, the predicate expands to 3 <> 1 AND 3 <> 2 AND 3 <> NULL. The last comparison evaluates to UNKNOWN rather than TRUE, so the whole predicate is UNKNOWN and the row with id1 = 3 is filtered out, which is usually not the intended outcome.
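Both pitfalls can be reproduced in a few lines. The sketch below is an illustration rather than part of the original project: it assumes Python's built-in sqlite3 module and an in-memory SQLite database, and the helper name ids is hypothetical. Other engines are expected to behave the same way for these queries, since the results follow from standard name resolution and three-valued logic, but that has not been checked here.

```python
import sqlite3

# In-memory SQLite database, used here only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table test1 (id1 int);
    create table test2 (id2 int);
    insert into test1 (id1) values (1),(2),(3);
    insert into test2 (id2) values (1),(2);
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Intended query: ids that appear in both tables.
in_ok = ids("select id1 from test1 where id1 in (select id2 from test2)")

# Typo (id1 instead of id2): id1 resolves to test1.id1, so every row matches.
in_typo = ids("select id1 from test1 where id1 in (select id1 from test2)")

# Without a NULL in test2, NOT IN finds the missing id.
not_in_before = ids("select id1 from test1 where id1 not in (select id2 from test2)")

# After a NULL is inserted, the same NOT IN query returns no rows.
conn.execute("insert into test2 (id2) values (NULL)")
not_in_after = ids("select id1 from test1 where id1 not in (select id2 from test2)")

print(in_ok, in_typo, not_in_before, not_in_after)
# prints: [1, 2] [1, 2, 3] [3] []
```

Qualifying the column in the subquery (select test2.id1 from test2) would turn the typo into an error instead of a silently wrong result.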
How to avoid these problems?
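1. Use EXISTS or NOT EXISTS instead of IN / NOT IN:
select * from test1 where exists (select * from test2 where test2.id2 = test1.id1);
select * from test1 where not exists (select * from test2 where test2.id2 = test1.id1);
2. Use JOIN (inner or left) as an alternative:
select id1 from test1 inner join test2 on test2.id2 = test1.id1;
select id1 from test1 left join test2 on test2.id2 = test1.id1 where test2.id2 is null;
Note that the inner join returns a test1 row once per matching test2 row, so it matches the IN query only when test2.id2 has no duplicates. These approaches handle NULL correctly, can use an index on the join column, and are often much faster, although the actual gain depends on the database and its optimizer.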
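To check how the alternatives behave on awkward data, the following sketch runs all four queries against a test2 that contains both a duplicate and a NULL. It assumes Python's built-in sqlite3 module with an in-memory database; the extra rows in test2 and the helper name ids are additions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table test1 (id1 int);
    create table test2 (id2 int);
    insert into test1 (id1) values (1),(2),(3);
    -- test2 holds a duplicate (2) and a NULL to exercise the edge cases
    insert into test2 (id2) values (1),(2),(2),(NULL);
""")

def ids(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

exists_rows = ids(
    "select id1 from test1 "
    "where exists (select * from test2 where test2.id2 = test1.id1)")

# The NULL in test2 never satisfies the equality, so id1 = 3 is still found.
not_exists_rows = ids(
    "select id1 from test1 "
    "where not exists (select * from test2 where test2.id2 = test1.id1)")

# The inner join repeats id1 = 2 because test2 contains it twice.
inner_rows = ids(
    "select id1 from test1 inner join test2 on test2.id2 = test1.id1")

# Anti-join: keep only test1 rows that found no partner in test2.
anti_rows = ids(
    "select id1 from test1 left join test2 on test2.id2 = test1.id1 "
    "where test2.id2 is null")

print(exists_rows, not_exists_rows, inner_rows, anti_rows)
# prints: [1, 2] [3] [1, 2, 2] [3]
```

The duplicated 2 in the inner join result is the one place where a JOIN rewrite changes the output compared with IN, which is why EXISTS is usually the closer drop-in replacement.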
Note: IN can still be used when the set is small and constant, e.g., IN (0,1,2).
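A short sketch of that safe case, again assuming Python's sqlite3 module and the test1 table from above: with a small literal list that contains no NULL, both IN and NOT IN return the expected rows. The variable names wanted and placeholders are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table test1 (id1 int);
    insert into test1 (id1) values (1),(2),(3);
""")

# A small, fixed set of values that is known to contain no NULL.
wanted = (0, 1, 2)
placeholders = ",".join("?" for _ in wanted)  # "?,?,?"

in_rows = sorted(r[0] for r in conn.execute(
    f"select id1 from test1 where id1 in ({placeholders})", wanted))
not_in_rows = sorted(r[0] for r in conn.execute(
    f"select id1 from test1 where id1 not in ({placeholders})", wanted))

print(in_rows, not_in_rows)
# prints: [1, 2] [3]
```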
Architect's Tech Stack
Java backend, microservices, distributed systems, containerized programming, and more.
