Databases 15 min read

Decorrelation of Correlated Subqueries: Theory, Operators, and Optimization Rules

This article explains why correlated subqueries are a performance bottleneck, introduces the Apply (Correlated Join) operator, classifies subquery types, shows how to transform them into efficient joins, and discusses rule‑based push‑down techniques for Project, Filter, Aggregate, and Set operations.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
Decorrelation of Correlated Subqueries: Theory, Operators, and Optimization Rules

Subquery optimization has long been a difficult problem in SQL query optimization because correlated subqueries are executed like nested loops, leading to unacceptable performance on large data sets.

The article first introduces the concept of decoorelation (or unnesting) and explains that both SQL Server and HyPer use similar theoretical frameworks to rewrite any query into a more efficient form.

It then classifies subqueries into scalar‑valued, existential (EXISTS), and quantified (IN, SOME, ANY) types, providing concrete SQL examples for each. Sample queries are shown using SELECT c_custkey, COUNT(*) AS custdist FROM (SELECT c_custkey, COUNT(o_orderkey) AS c_count FROM CUSTOMER LEFT OUTER JOIN ORDERS ON c_custkey = o_custkey AND o_comment NOT LIKE '%pending%deposits%' GROUP BY c_custkey) c_orders GROUP BY c_count ORDER BY custdist DESC, c_count DESC; and other snippets.

The original execution plan for a scalar subquery (Query 1) is illustrated, highlighting how the executor repeatedly invokes the evaluator for each row, which is highly inefficient.

To address this, the article introduces the Apply operator (also called Correlated Join). Apply takes two relation trees as input; the inner tree may depend on parameters from the outer tree. Four variants are defined: Cross Apply, Left Outer Apply, Semi Apply, and Anti‑Semi Apply.

Using Apply, the earlier scalar subquery can be rewritten as a join, eliminating the repeated executor‑evaluator calls. The transformation may require a Max1Row operator to preserve semantics when the inner query could return more than one row.

A systematic method is presented: extract subqueries into Apply nodes, then apply rule‑based rewrites to push Apply down and pull up surrounding operators. The rules cover Project and Filter push‑down, Aggregate push‑down (converting ScalarAgg to GroupAgg), and handling of Set operators (UNION ALL, EXCEPT ALL) with distinct‑project optimizations.

Special attention is given to edge cases where aggregate functions behave differently on empty sets versus sets containing NULL, and the article shows how to adjust aggregates (e.g., using COUNT(o_orderkey) instead of COUNT(*) ) to obtain correct results after transformation.

The final sections provide FAQs comparing HyPer and SQL Server approaches, discussing when decoorelation is beneficial, and noting that the rules are best applied within a cost‑based optimizer rather than as a blanket transformation.

References to seminal papers on subquery unnesting and join optimization are listed at the end.

SQLDatabaseQuery OptimizationsubqueryApply OperatorDecorrelation
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.