Databases 17 min read

How to Unnest Correlated Subqueries for Faster SQL Execution

This article explains why correlated subqueries are costly, introduces the Apply (correlated join) operator, and presents a series of systematic transformation rules that convert scalar, existential, and aggregate subqueries into efficient join‑based plans while preserving SQL semantics.

Liangxu Linux

Apr 25, 2023

How to Unnest Correlated Subqueries for Faster SQL Execution

Subquery Overview

Subqueries are a standard SQL construct that can appear in SELECT, FROM, WHERE and other clauses. They are classified into correlated subqueries (which reference outer‑query columns) and non‑correlated subqueries . The article focuses on correlated subqueries and further distinguishes scalar‑valued subqueries, existential tests (EXISTS), and quantified comparisons (IN, SOME, ANY, NOT IN) with concrete examples:

SELECT c_custkey FROM CUSTOMER WHERE 1000000 < (
    SELECT SUM(o_totalprice) FROM ORDERS WHERE o_custkey = c_custkey
);

SELECT o_orderkey, (
    SELECT c_name FROM CUSTOMER WHERE c_custkey = o_custkey
) AS c_name FROM ORDERS;

SELECT c_custkey FROM CUSTOMER WHERE c_nationkey = 86 AND EXISTS(
    SELECT * FROM ORDERS WHERE o_custkey = c_custkey
);

Original Execution Plan

In a naïve plan, a correlated subquery is evaluated inside a Filter expression. The executor calls an evaluator for each outer row, which in turn invokes another executor to compute the scalar subquery. This executor‑evaluator‑executor alternation leads to a nested‑loop‑like behavior that is extremely inefficient for large outer tables.

Apply Operator

The article introduces the Apply operator (also called Correlated Join) to make the correlation explicit. Apply takes two relation trees: the left (outer) input and the right (inner) input, which may depend on parameters from the outer side. Four variants are defined:

Cross Apply (A×A×)

Left Outer Apply (ALOJ)

Semi Apply (A∃)

Anti‑Semi Apply (A∄)

Diagram illustrating Apply:

Basic Decorrelation Rules

If the right side of an Apply does not reference any outer parameters, the Apply is equivalent to a regular join and can be replaced directly. This yields the first two obvious transformation rules.

Example applying rule (2) to a Semi‑Join:

Project and Filter Decorrelation

The second group of rules pushes Apply downwards and lifts the operators that sit above it (Project, Filter). By moving the correlated expressions out of the filter predicate, they become ordinary columns that can be processed by the Apply operator. This enables the subsequent conversion of Apply into a plain join.

Illustration of pushing Apply below a Filter:

Aggregate Decorrelation

When a correlated subquery contains aggregation, the scalar aggregation (ScalarAgg) is transformed into a group aggregation (GroupAgg) after pushing Apply down. A Max1Row operator may be added to guarantee that the inner side returns at most one row, preserving semantics for cases where the inner query could produce zero or multiple rows.

Max1Row(E) = {
  Null, if |E| = 0
  E,    if |E| = 1
  error, otherwise
}

Example of converting a scalar aggregation on COUNT(*) into a group aggregation on the outer key: Special care is required for aggregates where F(∅) ≠ F({NULL}) (e.g., COUNT(*) ). The article suggests replacing COUNT(*) with COUNT(non_null_column) or splitting expressions such as MIN(IF_NULL(...)) into a projection followed by an aggregation.

Replace COUNT(*) with COUNT(o_orderkey) when the inner table has a primary key.

For MIN(IF_NULL(...)), compute the IF_NULL in a Project step (variable X) and then apply MIN(X).

Set‑Operation Decorrelation

The final group of rules handles subqueries that involve UNION ALL or EXCEPT ALL . The same principle—push Apply down and lift operators—applies, but the transformation may duplicate the outer relation, leading to a DAG‑style plan. FAQ Can any correlated subquery be decorrelated? In theory, yes, after adding a few mild restrictions; the article cites proofs in the referenced papers. Differences between HyPer and SQL Server? HyPer defines more transformation rules (e.g., for outer joins) and uses the term “Correlated Join”. SQL Server’s earlier work introduced the Apply operator. Should these rules be applied in a cost‑based optimizer (CBO) or a rule‑based optimizer (RBO)? Not always; the best plan depends on data size, index availability, and cost estimates, so integrating the rules into a CBO is recommended. References Parameterized Queries and Nesting Equivalencies – C. Galindo‑Legaria Orthogonal Optimization of Subqueries and Aggregation – C. Galindo‑Legaria, M. Joshi Unnesting Arbitrary Queries – T. Neumann, A. Kemper The Complete Story of Joins (in HyPer) – T. Neumann, V. Leis, A. Kemper

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL Database Query Optimization Subquery Apply Operator Decorrelation

Written by

Liangxu Linux

Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.