Why IN/NOT IN Slow Down Queries and How EXISTS, NOT EXISTS, or JOIN Can Fix Them
The article explains why using IN and NOT IN in SQL queries can lead to poor performance and incorrect results, especially with large tables or NULL values, and demonstrates how replacing them with EXISTS, NOT EXISTS, or appropriate JOINs yields faster, reliable outcomes.
Problem: IN/NOT IN Are Inefficient and Error‑Prone
In many projects, developers write queries like SELECT * FROM t1 WHERE phone NOT IN (SELECT phone FROM t2). Even when both t1 and t2 have indexes and contain around 1.5 million rows, the query can take minutes because NOT IN cannot use the index.
Replacing the same logic with NOT EXISTS reduces execution time dramatically (e.g., from >10 minutes to ~20 seconds).
Example 1: Wrong Results with IN
Consider two small tables:
CREATE TABLE test1 (id1 INT);
CREATE TABLE test2 (id2 INT);
INSERT INTO test1 (id1) VALUES (1),(2),(3);
INSERT INTO test2 (id2) VALUES (1),(2);Querying IDs that exist in test2 using IN works as expected:
SELECT id1 FROM test1 WHERE id1 IN (SELECT id2 FROM test2);But a typo—using id1 instead of id2 in the subquery—produces the wrong result set without any error, returning 1 2 3 even though the column does not exist in test2.
Example 2: NOT IN With NULL Returns Unexpected Blank Set
Adding a NULL value to test2:
INSERT INTO test2 (id2) VALUES (NULL);Now the query
SELECT id1 FROM test1 WHERE id1 NOT IN (SELECT id2 FROM test2)returns an empty set, because NULL is not equal to any value, so the condition fails for every row.
Tip: Avoid allowing NULLs in columns used for set comparisons.
Solution 1: Use EXISTS / NOT EXISTS
Replace IN with EXISTS and NOT IN with NOT EXISTS to let the optimizer use indexes efficiently:
SELECT * FROM test1 WHERE EXISTS (SELECT * FROM test2 WHERE test2.id2 = test1.id1);
SELECT * FROM test1 WHERE NOT EXISTS (SELECT * FROM test2 WHERE test2.id2 = test1.id1);Solution 2: Use JOINs
For the same logic you can use inner or left joins:
-- Inner join (equivalent to EXISTS)
SELECT t1.id1 FROM test1 t1 INNER JOIN test2 t2 ON t1.id1 = t2.id2;
-- Left join to find rows not present in test2
SELECT t1.id1 FROM test1 t1 LEFT JOIN test2 t2 ON t1.id1 = t2.id2 WHERE t2.id2 IS NULL;These alternatives avoid the pitfalls of IN / NOT IN and provide predictable performance.
Note: IN is still acceptable when the set is small and constant, e.g., WHERE col IN (0,1,2).
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
