Why You Should Avoid IN and NOT IN in SQL Queries: Performance Issues and Common Pitfalls
This article explains why the SQL keywords IN and NOT IN often lead to poor performance and unexpected results, demonstrates the problems with large tables and NULL values, and recommends using EXISTS, NOT EXISTS, or JOIN alternatives for more reliable and efficient queries.
In many projects developers use the keywords IN and NOT IN for sub‑queries, but these constructs can cause serious performance degradation and logical errors.
1. Low efficiency – When two tables each contain about 1.5 million rows, a query such as select * from t1 where phone not in (select phone from t2) may run for many minutes because NOT IN cannot use the index on phone . Rewriting the same logic with NOT EXISTS reduces the execution time to around 20 seconds.
2. Easy to make mistakes – Using IN with a wrong column name (e.g., select id1 from test1 where id1 in (select id1 from test2) instead of id2 ) does not raise an error and returns incorrect results. Similarly, if the sub‑query contains a NULL value, a NOT IN condition will silently exclude rows that should be returned because NULL is not equal to any non‑null value.
To avoid these issues, the article suggests two main alternatives:
Use EXISTS / NOT EXISTS
select * from test1 where EXISTS (select * from test2 where id2 = id1) select * from test1 where NOT EXISTS (select * from test2 where id2 = id1)Use JOINs
select id1 from test1 INNER JOIN test2 ON id2 = id1 select id1 from test1 LEFT JOIN test2 ON id2 = id1 where id2 IS NULLThese approaches are index‑friendly and produce correct results even when NULL values are present.
The article also notes that IN and NOT IN are still acceptable when the set is small and known in advance, such as IN (0, 1, 2) .
At the end, a promotional note offers a free download of a programmer’s book collection by replying with a keyword, but the technical discussion remains the core content.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.