Databases 18 min read

Mastering Subquery Unnesting: Theory, Apply Operator, and Practical Rules

This article explains why correlated subqueries are a performance bottleneck, introduces the Apply (Correlated Join) operator, and presents a comprehensive set of transformation rules—including basic unnesting, Project/Filter push‑down, Aggregate handling, and Set‑operation rewrites—to convert subqueries into efficient join plans for modern SQL engines.

ITPUB
ITPUB
ITPUB
Mastering Subquery Unnesting: Theory, Apply Operator, and Practical Rules

Introduction

Correlated subqueries are a classic pain point in SQL query optimization because they are executed like a nested‑loop, causing severe performance degradation on large data sets. The solution is to de‑correlate (or unnest) them into more efficient operators such as Semi‑Join.

Subquery Types

Subqueries can appear almost anywhere in a SQL statement. They are broadly classified into:

Correlated Subquery – depends on values from the outer query.

Non‑correlated Subquery – can be evaluated independently (not covered in this article).

Correlated subqueries are further divided by their result shape:

Scalar‑valued – returns a single row and column (e.g.,

SELECT c_custkey FROM CUSTOMER WHERE 1000000 < (SELECT SUM(o_totalprice) FROM ORDERS WHERE o_custkey = c_custkey)

).

Existential Test – uses EXISTS and yields a Boolean (e.g.,

WHERE EXISTS (SELECT * FROM ORDERS WHERE o_custkey = c_custkey)

).

Quantified Comparison – uses IN, SOME, ANY, etc. (e.g., WHERE c_nationkey <> ALL (SELECT s_nationkey FROM SUPPLIER)).

Original Execution Plan

Using a TPC‑H example (Query 1), the original plan shows the subquery attached to a Filter node. During execution the engine repeatedly invokes an Evaluator for each outer row, leading to an inefficient Executor‑Evaluator‑Executor cycle.

Original execution plan with subquery under Filter
Original execution plan with subquery under Filter

Apply Operator (Correlated Join)

The Apply operator receives two relation trees: the outer (left) input and an inner input that may reference parameters from the outer row. Its semantics are to evaluate the inner tree for each outer row and concatenate the results (Bag‑semantics UNION ALL).

Apply operator diagram
Apply operator diagram

Four concrete forms exist:

Cross Apply (A×A×) – basic form.

Left Outer Apply (ALOJ) – preserves outer rows with NULLs when the inner side is empty.

Semi Apply (A∃) – keeps the outer row only if the inner side produces at least one row.

Anti‑Semi Apply (A∄) – keeps the outer row only if the inner side is empty.

Basic Unnesting Rules

Rule (1) and Rule (2) state that if the right side of an Apply does not reference any outer parameters, the Apply can be replaced by a regular Join.

Basic Apply‑to‑Join rule
Basic Apply‑to‑Join rule

Project and Filter Unnesting

The second rule group pushes Apply deeper into the plan and lifts operators (Project, Filter) above it. This turns expressions that contain correlated parameters into ordinary columns, enabling further simplifications.

Project/Filter push‑down example
Project/Filter push‑down example

When Apply reaches the bottom of the tree, Rule (1) can turn it into a Join, completing the de‑correlation.

Aggregate Unnesting

Aggregates (GROUP BY) are handled by pushing Apply down and converting scalar aggregates into group aggregates. The transformation must ensure that the grouping key is the outer relation’s primary key.

Aggregate unnesting diagram
Aggregate unnesting diagram

Special care is required for functions where F(∅) ≠ F({NULL}). For example, COUNT(*) must be replaced by COUNT(non_null_column) or a Max1Row wrapper to preserve semantics.

Set‑Operation Unnesting

The final rule group deals with subqueries that contain UNION ALL, EXCEPT ALL, or inner joins. The guiding principle remains: push Apply down as far as possible and lift inner operators.

Set‑operation unnesting example
Set‑operation unnesting example

FAQ

Can any correlated subquery be un‑correlated? Yes, with a modest set of restrictions; theoretical proofs appear in references [1]–[3].

Differences between HyPer and SQL Server? HyPer defines more transformation rules (e.g., handling of distinct‑project “magic set” optimizations) and covers a broader set of cases.

Should these rules be applied in the rule‑based optimizer (RBO) or the cost‑based optimizer (CBO)? Generally they belong in a CBO, which can estimate the cost of the transformed plan and decide whether de‑correlation is beneficial.

References

Parameterized Queries and Nesting Equivalencies – C. Galindo‑Legaria

Orthogonal Optimization of Subqueries and Aggregation – C. Galindo‑Legaria, M. Joshi

Unnesting Arbitrary Queries – T. Neumann, A. Kemper

The Complete Story of Joins (in HyPer) – T. Neumann, V. Leis, A. Kemper

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLquery optimizationSubqueryApply OperatorUnnesting
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.