Databases 6 min read

Why IN and NOT IN Should Be Avoided in SQL: Performance Pitfalls and Safer Alternatives

This article explains why the SQL keywords IN and NOT IN often lead to poor performance and subtle bugs, demonstrates real‑world examples with large tables and NULL values, and recommends using EXISTS, NOT EXISTS, or JOINs as more reliable alternatives.

Architecture Digest

Mar 15, 2024

Why IN and NOT IN Should Be Avoided in SQL: Performance Pitfalls and Safer Alternatives

The article discusses why the commonly used SQL keywords IN and NOT IN should be avoided whenever possible because they can cause severe performance degradation and produce incorrect results.

1. Low efficiency : In a scenario where tables t1 and t2 each contain about 1.5 million rows, the query

select * from t1 where phone not in (select phone from t2)

runs for many minutes because NOT IN cannot use the index on phone. Rewriting the same logic with

NOT EXISTS

select * from t1 where not EXISTS (select phone from t2 where t1.phone = t2.phone)

reduces the execution time to around 20 seconds.

2. Easy to make mistakes or get wrong results : Using IN with a typo can silently produce wrong output. For example, after creating two tables:

create table test1 (id1 int);
create table test2 (id2 int);
insert into test1 (id1) values (1),(2),(3);
insert into test2 (id2) values (1),(2);

The correct query

select id1 from test1 where id1 in (select id2 from test2);

returns the expected rows 1 and 2. However, a typo such as

select id1 from test1where id1 in (select id1 from test2);

does not raise an error; it mistakenly compares id1 to itself and returns 1, 2, 3, which is misleading.

Another pitfall appears with NOT IN when the sub‑query contains NULL values. After inserting a NULL into test2: insert into test2 (id2) values (NULL); the query

select id1 from test1 where id1 not in (select id2 from test2);

returns an empty set because any comparison with NULL yields UNKNOWN, so the row with id1 = 3 is filtered out.

How to avoid these issues :

1. Replace IN / NOT IN with EXISTS or NOT EXISTS:

select * from test1 where EXISTS (select * from test2 where id2 = id1);
select * from test1 where NOT EXISTS (select * from test2 where id2 = id1);

2. Use JOIN operations instead:

select id1 from test1 INNER JOIN test2 ON id2 = id1;
select id1 from test1 LEFT JOIN test2 ON id2 = id1 where id2 IS NULL;

These approaches are safe and usually more performant. The article notes that IN can still be used for small, fixed constant sets such as IN (0,1,2).

In summary, avoiding IN and NOT IN in favor of EXISTS, NOT EXISTS, or appropriate JOIN s leads to clearer, faster, and more reliable SQL queries.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance SQL database Query Optimization NOT IN IN

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.