Mastering Correlated Subqueries: Decorrelation, Push‑Down Rules, and Optimization Techniques
This article explains what correlated subqueries are, why they challenge database engines, and how they can be transformed into ordinary SQL using decorrelation, push‑down rules, result reuse, and window‑function tricks, illustrated with TPCH examples and code snippets.
Background
This article introduces correlated subqueries, explains their semantics, benefits for writing concise SQL, and the execution challenges they pose to database systems.
Correlated Subquery Definition
A correlated subquery references columns from the outer query, e.g., using
NOT EXISTS (SELECT * FROM orders WHERE o_custkey = c_custkey)to find customers who have never placed an order.
-- All customers who have never placed an order
SELECT c_custkey
FROM customer
WHERE NOT EXISTS (
SELECT *
FROM orders
WHERE o_custkey = c_custkey
);Without correlation, the same result requires a LEFT JOIN and additional filtering.
SELECT c_custkey
FROM customer
LEFT JOIN (
SELECT DISTINCT o_custkey
FROM orders
) AS o ON o.o_custkey = c_custkey
WHERE o.o_custkey IS NULL;Correlated Join Operator
The optimizer represents a correlated subquery as a CorrelatedJoin (also called lateral or dependent join). Its left child is the outer query, the right child is the subquery, and the correlated columns are passed from left to right. Execution is iterative: for each outer row the subquery is evaluated with the current correlated values.
Decorrelation Techniques
1. Push‑Down Rules
CorrelatedJoin can be pushed past operators (FILTER, PROJECT, AGGREGATION) when possible, converting it into ordinary joins. For example, if neither side contains correlated columns, the operator becomes a left join without join predicates.
-- Example of pushing CorrelatedJoin through a filter
SELECT ... FROM outer
WHERE outer.col < (SELECT MAX(inner.val) FROM inner WHERE inner.key = outer.key);After pushing, the subquery can be rewritten as a regular join.
2. Simple Correlation Conversion
When the subquery’s result is independent of the outer row, the CorrelatedJoin is replaced by a cross‑join or a constant‑value join.
3. Push‑Through Project and Aggregation
When pushing through a PROJECT, the outer columns used by the subquery must be added to the projection. When pushing through an AGGREGATION, the grouping keys are extended with the outer keys, and a enforceSingleRow node is added to preserve semantics.
Result Reuse
Because the subquery result for the same correlated value is identical, the optimizer can compute the subquery once per distinct key and reuse the result, e.g., by materializing a distinct set of keys and joining back.
Window‑Function Based Decorrelation
CorrelatedJoin behaves like a window operator: it adds a column without changing row count. Some subqueries can be rewritten using window aggregates, reducing table scans. Example:
SELECT c_custkey,
SUM(o_totalprice) OVER (PARTITION BY c_custkey) AS total_spent
FROM customer LEFT JOIN orders ON o_custkey = c_custkey;This approach saves scans but may hinder further push‑down optimizations.
Practical Considerations and Pitfalls
Key issues include handling NULLs correctly (NULL in empty set vs. NULL in non‑empty set), avoiding the “count bug” where decorrelation changes row filtering, and ensuring deterministic subquery results for distributed execution. The article also discusses left‑mark‑join semantics in distributed environments.
Conclusion
Correlated subqueries provide expressive power but require careful optimizer transformations—push‑down, decorrelation, result reuse, and window‑based rewrites—to achieve efficient execution in modern distributed SQL engines.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
