
Why a Regex IN Subquery Breaks Oracle Index Scan – Lessons from a Failed Optimization

The article recounts a real‑world Oracle SQL case where a REGEXP_SUBSTR‑based IN subquery caused the optimizer to abandon an index range scan, explores why predicate pushdown fails with CONNECT BY recursion, documents attempted hints, rewrites, and the ultimate lessons for SQL performance tuning.

dbaplus Community

Background

The author, a SQL specialist with a decade of experience, was invited by the community to troubleshoot a question titled “Oracle regex as condition changes execution plan”. The problematic query filters rows using an IN subquery that expands a comma‑separated string into rows via REGEXP_SUBSTR and CONNECT BY.
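
As a rough illustration of the string‑to‑row idiom described above, the split can be run on its own against DUAL. The three‑value list is an assumed sample input; the source only shows the single value '10001'.

```sql
-- Split a comma-separated constant into one row per element.
-- LEVEL counts 1..n, where n is the number of non-comma tokens in CONT.
SELECT REGEXP_SUBSTR(CONT, '[^,]+', 1, LEVEL) AS ITEM
FROM (SELECT '10001,10002,10003' CONT FROM DUAL)   -- sample list (assumption)
CONNECT BY LEVEL <= REGEXP_COUNT(CONT, '[^,]+');
```

With this input the query returns three rows: 10001, 10002 and 10003. The optimizer sees a hierarchical (CONNECT BY) row source rather than a list of literals, which is the root of the behaviour discussed in this article.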

Problem Statement

When the filter value is a literal constant (e.g., AND TA.CACSS_C IN ('10001')), the execution plan shows TAB_T2 accessed by INDEX RANGE SCAN. Replacing the constant with the regex‑based subquery makes the plan switch to a FULL TABLE SCAN, and the optimizer no longer performs predicate pushdown (PUSH_PRED).

Investigation

Displayed the original SQL and its execution plan.

Noted that REGEXP_SUBSTR operates on a constant string, not on a table column, so the optimizer should theoretically still be able to push predicates.

Observed that the index INDEX_N1 is on the join column HDID_C, not on the filter column CACSS_C.

Collected 10046 and 10053 trace events; compared plans with and without predicate pushdown.

Discovered that the CONNECT BY recursive query inside the IN subquery prevents predicate pushdown.
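
The investigation steps above can be sketched roughly as follows. The statement under EXPLAIN PLAN is a simplified stand‑in (the full original query is not reproduced in this summary), and the trace file identifier is a hypothetical name chosen for illustration.

```sql
-- 1. Show the optimizer's plan for the regex-based filter (simplified statement).
EXPLAIN PLAN FOR
SELECT TA.*
FROM TAB_T1 TA
WHERE TA.CACSS_C IN (
      SELECT REGEXP_SUBSTR(CONT, '[^,]+', 1, LEVEL)
      FROM (SELECT '10001' CONT FROM DUAL)
      CONNECT BY LEVEL <= REGEXP_COUNT(CONT, '[^,]+'));

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- 2. Check which columns each index on TAB_T2 actually covers.
SELECT INDEX_NAME, COLUMN_NAME, COLUMN_POSITION
FROM ALL_IND_COLUMNS
WHERE TABLE_OWNER = 'ESF'
  AND TABLE_NAME  = 'TAB_T2'
ORDER BY INDEX_NAME, COLUMN_POSITION;

-- 3. Collect 10046 (SQL trace with waits and binds) and 10053 (optimizer) traces.
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'regex_in_case';  -- hypothetical tag
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- ... run the statement under investigation here ...
ALTER SESSION SET EVENTS '10046 trace name context off';
ALTER SESSION SET EVENTS '10053 trace name context off';
```

A 10053 trace is only written when the statement is hard parsed, so a statement that is already in the shared pool may need a small textual change (for example an extra comment) before it produces trace output.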

Attempts to Fix

Various hints were tried, including /*+ push_pred(r) */, /*+ index(l, index_n1) */, and /*+ use_nl(ta,l) */, but none altered the plan.
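
When hints appear to be ignored, it can help to look at the query block names and aliases the optimizer actually uses, since hints must reference those aliases. The following is a minimal sketch on a simplified join of the two tables, not the author's original statement; the GATHER_PLAN_STATISTICS hint and the DBMS_XPLAN format string are additions for illustration.

```sql
-- Simplified stand-in query: hints reference the aliases ta and l.
SELECT /*+ GATHER_PLAN_STATISTICS USE_NL(ta l) INDEX(l index_n1) */ ta.*
FROM TAB_T1 ta
JOIN ESF.TAB_T2 l ON l.HDID_C = ta.APLTNN_C
WHERE ta.ATPY_C = 1
  AND ta.CACSS_C IN ('10001');

-- Show the plan of the last statement executed in this session, including
-- query block / object aliases and the outline of hints the optimizer applied.
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST +ALIAS +OUTLINE'));
```

If the transformation a hint asks for is not legal for the query shape, as reported here for predicate pushdown into a subquery containing CONNECT BY, the hint has no effect regardless of how it is spelled.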

Rewriting the query so that TAB_T2 is probed through a correlated EXISTS clause restored index usage. Note that the regex IN subquery itself is still present in the rewritten statement:

SELECT TA.*
FROM TAB_T1 TA
WHERE TA.ATPY_C = 1
  AND TA.CACSS_C IN (
      SELECT REGEXP_SUBSTR(CONT, '[^,]+', 1, LEVEL)
      FROM (SELECT '10001' CONT FROM DUAL)
      CONNECT BY LEVEL <= REGEXP_COUNT(CONT, '[^,]+'))
  AND EXISTS (SELECT 1 FROM ESF.TAB_T2 L WHERE L.HDID_C = TA.APLTNN_C);

Why the Rewrite Still Fell Short

Further analysis showed that, in the original query, the ROW_NUMBER analytic function processes the large TAB_T2 table. After removing the subquery, the analytic function instead works on the join result of TAB_T1 and TAB_T2, which can generate a huge intermediate row set if the join is many‑to‑many. Moreover, the column APLTNN_C lacks a unique constraint, making the rewrite unsafe for certain data distributions.
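
Before relying on such a rewrite, the join cardinality can be checked directly. The two queries below are a minimal sketch using only the table and column names that appear in this article; the alias DUP_ROWS is introduced here for illustration.

```sql
-- Values of APLTNN_C that occur more than once in TAB_T1.
SELECT APLTNN_C, COUNT(*) AS DUP_ROWS
FROM TAB_T1
GROUP BY APLTNN_C
HAVING COUNT(*) > 1
ORDER BY DUP_ROWS DESC;

-- Values of HDID_C that occur more than once in TAB_T2.
SELECT HDID_C, COUNT(*) AS DUP_ROWS
FROM ESF.TAB_T2
GROUP BY HDID_C
HAVING COUNT(*) > 1
ORDER BY DUP_ROWS DESC;
```

If both queries return rows for the same key values, the join is many‑to‑many for those keys, and an analytic function evaluated over the join result will see multiplied rows.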

Final Takeaways

Predicate pushdown can be blocked by CONNECT BY / START WITH constructs.

Using a regex‑based IN subquery for string‑to‑row conversion is convenient but may cause the optimizer to forgo index scans.

SQL hints are not guaranteed to work; sometimes a logical rewrite or moving the string split to the application layer is required.

Understanding data volume, index coverage, and column uniqueness is essential before applying rewrites.
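
As one possible shape for the "logical rewrite or application layer" option in the takeaways, the value list can be handed to the database already split. This is a sketch under assumptions, not something the source tested: the second value '10002' is made up, and whether either form restores the index range scan in the original query would have to be verified against the real plan.

```sql
-- Option A: a collection constructor instead of REGEXP_SUBSTR + CONNECT BY.
-- SYS.ODCIVARCHAR2LIST is a built-in collection type; COLUMN_VALUE is the
-- pseudo-column exposed by TABLE() over a collection of scalars.
SELECT TA.*
FROM TAB_T1 TA
WHERE TA.ATPY_C = 1
  AND TA.CACSS_C IN (
      SELECT COLUMN_VALUE
      FROM TABLE(SYS.ODCIVARCHAR2LIST('10001', '10002')));

-- Option B: the application splits the string and binds each value,
-- so the optimizer sees a plain IN-list.
SELECT TA.*
FROM TAB_T1 TA
WHERE TA.ATPY_C = 1
  AND TA.CACSS_C IN (:v1, :v2);
```

Option B keeps the statement closest to the literal‑constant form that produced the INDEX RANGE SCAN, at the cost of a different SQL text for each list length.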

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
