Understanding IN vs EXISTS and NOT IN vs NOT EXISTS in SQL
This article explains the performance differences between IN, EXISTS, NOT IN, and NOT EXISTS in SQL queries, provides guidance on when to use each construct based on table sizes and indexes, and demonstrates the concepts with practical code examples.
1. IN and EXISTS
IN creates a hash join between the outer and inner tables, while EXISTS performs a loop over the outer table and queries the inner table for each iteration; the common belief that EXISTS is always faster than IN is inaccurate.
If the two tables are of comparable size, the performance difference between IN and EXISTS is minimal; if one table is much larger, use EXISTS when the subquery table is large and IN when the subquery table is small.
Example with Table A (small) and Table B (large):
select * from A where cc in (select cc from B) -- slower, uses index on A.cc
select * from A where exists (select 1 from B where B.cc = A.cc) -- faster, uses index on B.ccConversely, when Table B is the outer table:
select * from B where cc in (select cc from A) -- faster, uses index on B.cc
select * from B where exists (select 1 from A where A.cc = B.cc) -- slower, uses index on A.cc2. NOT IN and NOT EXISTS
NOT IN is not logically equivalent to NOT EXISTS; misuse of NOT IN can introduce serious bugs. The following example demonstrates the issue:
create table #t1(c1 int,c2 int);
create table #t2(c1 int,c2 int);
insert into #t1 values(1,2);
insert into #t1 values(1,3);
insert into #t2 values(1,2);
insert into #t2 values(1,null);
select * from #t1 where c2 not in (select c2 from #t2); -- returns no rows
select * from #t1 where not exists (select 1 from #t2 where #t2.c2 = #t1.c2); -- returns rows 1 3The NOT IN query yields an unexpected empty result set because the subquery returns a NULL value, causing the whole predicate to evaluate to UNKNOWN. NOT EXISTS, on the other hand, can still use indexes and typically performs better.
If the subquery returns any NULL, NOT IN will return no rows; in such cases, NOT EXISTS is preferable. When the subquery column is defined as NOT NULL, NOT IN can be used safely, but EXISTS generally provides better index utilization.
3. Difference between IN and =
select name from student where name in ('zhang','wang','zhao');
select name from student where name='zhang' or name='wang' or name='zhao';Both queries produce the same result set.
Selected Java Interview Questions
A professional Java tech channel sharing common knowledge to help developers fill gaps. Follow us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.