Mastering Subquery Unnesting: Turn Correlated Queries into Fast Joins
This article explains why correlated subqueries are a performance bottleneck, introduces the Apply (Correlated Join) operator and a systematic set of transformation rules that push Apply down and replace subqueries with efficient joins, covering scalar, existential, and quantified subqueries, aggregation handling, and set operations, all illustrated with TPC‑H examples.
Subquery (Subquery) optimization has long been a difficult point in SQL query optimization. Correlated subqueries are executed like a Nested‑Loop, which becomes intolerably slow as data grows, so they must be decorrelated (de‑correlation or Unnesting) into more efficient operators such as Semi‑Join.
The discussion is based on classic papers from SQL Server and HyPer and uses TPC‑H tables for examples.
Subquery Overview
A subquery is a syntax defined in the SQL standard that can appear in almost any clause (SELECT, FROM, WHERE, etc.). Subqueries are divided into:
Correlated Subquery – depends on parameters from the outer query.
Non‑correlated Subquery – can be executed independently and materialised before the outer query.
Only correlated subqueries are considered in this article.
Correlated subqueries are further classified by their result shape:
Scalar‑valued – returns a single row and column; if empty, returns NULL. Example:
SELECT c_custkey
FROM CUSTOMER
WHERE 1000000 < (
SELECT SUM(o_totalprice)
FROM ORDERS
WHERE o_custkey = c_custkey
);Existential Test – EXISTS subquery returning a Boolean value (a Semi‑Join). Example:
SELECT c_custkey
FROM CUSTOMER
WHERE c_nationkey = 86 AND EXISTS(
SELECT * FROM ORDERS
WHERE o_custkey = c_custkey
);Quantified Comparison – IN, SOME, ANY, ALL forms returning a Boolean value. Example:
SELECT c_name
FROM CUSTOMER
WHERE c_nationkey <> ALL (SELECT s_nationkey FROM SUPPLIER);Original Execution Plan
Using Query 1 as an example, the original plan shows the subquery attached under a Filter expression, causing the executor to invoke the evaluator for each outer row, which is extremely inefficient.
Apply Operator
The Apply operator (also called Correlated Join) receives two relation trees; the inner tree may contain parameters from the outer row. Four forms exist:
Cross Apply (A×A×)
Left Outer Apply (ALOJ)
Semi Apply (A∃)
Anti‑Semi Apply (A∄)
Basic Decorrelation Rules
The first group of rules removes Apply when its inner side does not reference outer parameters, turning it into a regular Join. Applying these rules to Query 1 shows why decorrelation is essential.
Project and Filter Decorrelation
The second rule set pushes Apply down and lifts operators such as Project and Filter above it, turning correlated expressions into ordinary variables that can be joined efficiently.
Aggregate Decorrelation
When a correlated subquery contains aggregation, the goal is still to push Apply down. ScalarAgg can be transformed into GroupAgg by grouping on the outer key. However, care must be taken with NULL handling because F(∅) ≠ F({NULL}) for some aggregates. The article suggests replacing COUNT(*) with COUNT(non‑null‑key) or splitting expressions that use IF_NULL before aggregation.
Example of a problematic transformation:
SELECT c_custkey,
(SELECT COUNT(*) FROM ORDERS WHERE o_custkey = c_custkey) AS count_orders
FROM CUSTOMER;After transformation without proper NULL handling, a customer with no orders would incorrectly receive a count of 1.
Set‑Operation Decorrelation
The final rule group handles subqueries that appear under UNION ALL, EXCEPT ALL, or Inner Join, again pushing Apply down as far as possible.
FAQ
Can any correlated subquery be decorrelated? Yes, with a few mild restrictions, theory shows that any correlated subquery can be transformed.
Differences between HyPer and SQL Server? HyPer covers more cases and provides explicit transformation rules for many operators, while SQL Server’s papers prove that such cases can be reduced to a manageable form.
Should these rules be applied in RBO or CBO? Not always; the optimizer should estimate costs and choose the best plan, sometimes keeping Apply when the outer side is tiny or the inner side has a suitable index.
References
Parameterized Queries and Nesting Equivalencies – C. Galindo‑Legaria
Orthogonal Optimization of Subqueries and Aggregation – C. Galindo‑Legaria, M. Joshi
Unnesting Arbitrary Queries – T. Neumann, A. Kemper
The Complete Story of Joins (in HyPer) – T. Neumann, V. Leis, A. Kemper
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
