Mastering Subquery Unnesting: Theory, Apply Operator, and Practical Rules
This article explains why correlated subqueries are a performance bottleneck, introduces the Apply (Correlated Join) operator, and presents a comprehensive set of transformation rules—including basic unnesting, Project/Filter push‑down, Aggregate handling, and Set‑operation rewrites—to convert subqueries into efficient join plans for modern SQL engines.
Introduction
Correlated subqueries are a classic pain point in SQL query optimization because they are executed like a nested‑loop, causing severe performance degradation on large data sets. The solution is to de‑correlate (or unnest) them into more efficient operators such as Semi‑Join.
Subquery Types
Subqueries can appear almost anywhere in a SQL statement. They are broadly classified into:
Correlated Subquery – depends on values from the outer query.
Non‑correlated Subquery – can be evaluated independently (not covered in this article).
Correlated subqueries are further divided by their result shape:
Scalar‑valued – returns a single row and column (e.g.,
SELECT c_custkey FROM CUSTOMER WHERE 1000000 < (SELECT SUM(o_totalprice) FROM ORDERS WHERE o_custkey = c_custkey)).
Existential Test – uses EXISTS and yields a Boolean (e.g.,
WHERE EXISTS (SELECT * FROM ORDERS WHERE o_custkey = c_custkey)).
Quantified Comparison – uses IN, SOME, ANY, etc. (e.g., WHERE c_nationkey <> ALL (SELECT s_nationkey FROM SUPPLIER)).
Original Execution Plan
Using a TPC‑H example (Query 1), the original plan shows the subquery attached to a Filter node. During execution the engine repeatedly invokes an Evaluator for each outer row, leading to an inefficient Executor‑Evaluator‑Executor cycle.
Apply Operator (Correlated Join)
The Apply operator receives two relation trees: the outer (left) input and an inner input that may reference parameters from the outer row. Its semantics are to evaluate the inner tree for each outer row and concatenate the results (Bag‑semantics UNION ALL).
Four concrete forms exist:
Cross Apply (A×A×) – basic form.
Left Outer Apply (ALOJ) – preserves outer rows with NULLs when the inner side is empty.
Semi Apply (A∃) – keeps the outer row only if the inner side produces at least one row.
Anti‑Semi Apply (A∄) – keeps the outer row only if the inner side is empty.
Basic Unnesting Rules
Rule (1) and Rule (2) state that if the right side of an Apply does not reference any outer parameters, the Apply can be replaced by a regular Join.
Project and Filter Unnesting
The second rule group pushes Apply deeper into the plan and lifts operators (Project, Filter) above it. This turns expressions that contain correlated parameters into ordinary columns, enabling further simplifications.
When Apply reaches the bottom of the tree, Rule (1) can turn it into a Join, completing the de‑correlation.
Aggregate Unnesting
Aggregates (GROUP BY) are handled by pushing Apply down and converting scalar aggregates into group aggregates. The transformation must ensure that the grouping key is the outer relation’s primary key.
Special care is required for functions where F(∅) ≠ F({NULL}). For example, COUNT(*) must be replaced by COUNT(non_null_column) or a Max1Row wrapper to preserve semantics.
Set‑Operation Unnesting
The final rule group deals with subqueries that contain UNION ALL, EXCEPT ALL, or inner joins. The guiding principle remains: push Apply down as far as possible and lift inner operators.
FAQ
Can any correlated subquery be un‑correlated? Yes, with a modest set of restrictions; theoretical proofs appear in references [1]–[3].
Differences between HyPer and SQL Server? HyPer defines more transformation rules (e.g., handling of distinct‑project “magic set” optimizations) and covers a broader set of cases.
Should these rules be applied in the rule‑based optimizer (RBO) or the cost‑based optimizer (CBO)? Generally they belong in a CBO, which can estimate the cost of the transformed plan and decide whether de‑correlation is beneficial.
References
Parameterized Queries and Nesting Equivalencies – C. Galindo‑Legaria
Orthogonal Optimization of Subqueries and Aggregation – C. Galindo‑Legaria, M. Joshi
Unnesting Arbitrary Queries – T. Neumann, A. Kemper
The Complete Story of Joins (in HyPer) – T. Neumann, V. Leis, A. Kemper
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
