Understanding Why OceanBase Cannot Use HASH ANTI JOIN and Optimizing NOT IN Queries
This article analyzes a slow NOT IN subquery that OceanBase rewrites into a NESTED‑LOOP ANTI JOIN, explains why HASH ANTI JOIN is not chosen due to NULL‑sensitive semantics and missing NOT NULL constraints, compares the behavior with Oracle's Null‑Aware Anti Join, and provides practical rewrite and hint‑based optimization recommendations.
Recently a slow SQL was encountered on OceanBase 3.2.3.3 where a NOT IN subquery was rewritten by the optimizer into a NESTED‑LOOP ANTI JOIN , causing the driver table to be scanned fully and the query to take 16 seconds.
SELECT AGENT_ID, MAX(REL_AGENT_ID) FROM T_LDIM_AGENT_UPREL WHERE AGENT_ID NOT IN (SELECT AGENT_ID FROM T_LDIM_AGENT_UPREL WHERE valid_flg = '1') GROUP BY AGENT_ID;
The simplified execution plan shows a MERGE GROUP BY followed by a NESTED‑LOOP ANTI JOIN and a full table scan on the driver side.
Analysis of the table structure reveals that the join column AGENT_ID is indexed, but the optimizer still chooses a full scan because the column lacks a NOT NULL constraint, making the NOT IN predicate NULL‑sensitive.
Two direct reasons for the poor performance are identified: (1) the indexed column is used on the driver side without a NOT NULL guarantee, preventing index usage; (2) the optimizer prefers a NESTED‑LOOP ANTI JOIN over a HASH ANTI JOIN because it cannot guarantee correct results with NULL values.
Attempts to force a hash join with hints such as /*+ use_hash(A B) */ , /*+ USE_HASH(@"SEL$1" ("VIEW1"@"SEL$1" )) */ , and /*+ NO_USE_NL_AGGREGATION */ all failed; the plan shows an empty Used Hint section, indicating the hints were ignored.
Comparing with Oracle, the same query on Oracle produces a HASH RIGHT ANTI NA (Null‑Aware Anti Join) plan, which correctly handles NULLs. Oracle’s optimizer automatically adds the hidden parameter _OPTIMIZER_NULL_AWARE_ANTIJOIN (default true) to achieve this behavior.
To illustrate NULL handling, a small example shows that any NULL in the NOT IN list makes the whole predicate evaluate to NULL, while NOT EXISTS is not affected by NULLs. This explains why NOT IN is considered NULL‑sensitive.
Consequently, OceanBase 3.x cannot use a hash anti join because it lacks a true Null‑Aware implementation. The upcoming 4.x version will introduce the _OPTIMIZER_NULL_AWARE_ANTIJOIN parameter to align with Oracle.
Optimization suggestions are provided:
Add a NOT NULL constraint to AGENT_ID , allowing the optimizer to choose a hash anti join.
Rewrite the query using NOT EXISTS or a LEFT JOIN … IS NULL anti‑join pattern, which is not NULL‑sensitive.
Explicitly filter out NULLs in both the outer and inner queries with AGENT_ID IS NOT NULL conditions.
Rewritten example using NOT EXISTS :
SELECT AGENT_ID, MAX(REL_AGENT_ID) FROM T_LDIM_AGENT_UPREL t1 WHERE NOT EXISTS (SELECT 1 FROM T_LDIM_AGENT_UPREL t2 WHERE t1.AGENT_ID = t2.AGENT_ID AND valid_flg = '1') GROUP BY AGENT_ID;This version produces a HASH RIGHT ANTI JOIN plan on OceanBase and reduces execution time to about 50 ms.
Another rewrite adds explicit IS NOT NULL checks:
SELECT AGENT_ID, MAX(REL_AGENT_ID) FROM T_LDIM_AGENT_UPREL WHERE AGENT_ID NOT IN (SELECT AGENT_ID FROM T_LDIM_AGENT_UPREL WHERE valid_flg = '1' AND AGENT_ID IS NOT NULL) AND AGENT_ID IS NOT NULL GROUP BY AGENT_ID;The plan now shows a HASH RIGHT ANTI JOIN with similar performance.
In the Q&A section, three common questions are answered:
Why can HASH JOIN be used for NOT IN but not for arbitrary inequality conditions? Because NOT IN is semantically an equality check with a negated result, allowing a hash‑based implementation.
How does OceanBase handle NULLs in a NESTED‑LOOP ANTI JOIN ? It adds OR conditions checking for NULL on both sides, which yields correct results but still forces a full scan.
Why does adding /*+ no_rewrite */ improve performance only modestly? The hint forces a sub‑plan filter, but the underlying != ALL condition still prevents index usage, so the gain is limited.
The article concludes that understanding NULL‑sensitivity of anti‑join predicates and the availability of Null‑Aware Anti Join in the target database are key to selecting the right execution strategy.
References and further reading links are listed at the end of the original article.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.