Mastering Join Optimization in StarRocks: Techniques, Algorithms, and Distributed Planning
This article provides a comprehensive, step‑by‑step guide to StarRocks join optimization, covering join types, logical rewrite rules, predicate push‑down, join reorder algorithms, cost modeling, distributed join strategies, and runtime filters, while offering practical tips for achieving high‑performance query execution.
Join Background
StarRocks supports Cross Join, Full/Left/Right Outer Join, Anti Join, Semi Join and Inner Join. Different join algorithms (e.g., Sort‑Merge vs. Hash Join) have distinct performance characteristics, and the optimal algorithm depends on data distribution and query patterns.
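As a point of reference for the discussion below, the core of an in-memory hash equi-join can be sketched in a few lines of Python. This is a generic illustration, not StarRocks code; the row/key names are invented:

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join: build a hash table on the (ideally smaller) build side,
    then probe it with each row of the other side."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)            # build phase
    result = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):  # probe phase
            result.append({**match, **row})
    return result

# toy example: only the v1 = 2 rows match
t2 = [{"v1": 1, "b": "x"}, {"v1": 2, "b": "y"}]
t1 = [{"v1": 2, "a": 10}, {"v1": 3, "a": 20}]
print(hash_join(t2, t1, "v1", "v1"))
```

Build cost scales with the build side and probe cost with the probe side, which is why the rest of this article keeps returning to "put the smaller input on the build side."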
Challenges in Join Optimization
Multiple join implementations make optimal selection difficult.
Choosing an execution order for multi‑table joins is combinatorial.
Accurate cost estimation before execution is hard.
A plan optimal on a single node may be sub‑optimal in a distributed environment.
Join Optimization Principles
StarRocks follows five principles: use the most efficient join type, build the hash table on the smaller side, prioritize high‑selectivity joins, minimize data fed into joins, and reduce network traffic in distributed joins.
Logical Optimization Rules
1. Join Type Conversion
Convert low‑performance joins to more efficient ones when possible:
Cross Join → Inner Join if a join predicate exists.
Outer Join → Inner Join when a strict (non‑null‑producing) predicate on the nullable side can filter rows.
Full Outer Join → Left/Right Outer Join when a strict predicate binds to one side.
-- Before conversion
SELECT * FROM t1, t2 WHERE t1.v1 = t2.v1;
-- After conversion
SELECT * FROM t1 INNER JOIN t2 ON t1.v1 = t2.v1;

2. Predicate Push‑Down
Push predicates from WHERE clauses down to the earliest possible join inputs, reducing intermediate result sizes. For Inner Joins the rule is identical to WHERE push‑down. For Outer, Semi and Anti Joins only predicates that bind to the non‑preserved side and are strict can be pushed.
-- Original query
SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.v1 = t2.v1 WHERE t2.v1 > 0;
-- After outer‑to‑inner conversion and push‑down (the strict predicate on t2 rejects NULL-extended rows)
SELECT * FROM t1 INNER JOIN t2 ON t1.v1 = t2.v1 WHERE t2.v1 > 0;

3. Predicate Extraction
From a disjunctive (OR) predicate, StarRocks extracts implied conjunctive predicates that can be pushed down in addition to the original condition. Example:
-- Original predicate
WHERE (t2.v1 = 2 AND t1.v2 = 3) OR (t2.v1 > 5 AND t1.v2 = 4)
-- Extracted conjunctive predicates
WHERE t2.v1 >= 2 AND t1.v2 IN (3,4);

4. Equivalence Derivation
Using join equality conditions, StarRocks derives equivalent predicates on the opposite side of a join, enabling additional filtering. This works for Inner Joins and, with one‑way restrictions, for Outer and Semi Joins.
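The mechanics are simple: a single-column predicate on one member of an equality class is copied onto the other member. A minimal sketch, with an invented triple representation for predicates:

```python
def derive_equivalent_predicates(equalities, predicates):
    """Given join equalities (pairs of column names) and single-column
    predicates (column, operator, value), derive the same predicate on
    the column's equivalent partner so both scans can filter early."""
    derived = []
    for left, right in equalities:
        for col, op, value in predicates:
            if col == left:
                derived.append((right, op, value))
            elif col == right:
                derived.append((left, op, value))
    return derived

# t1.v1 = t2.v1 together with t1.v1 > 100 lets the optimizer
# also filter t2 before the join
print(derive_equivalent_predicates(
    [("t1.v1", "t2.v1")],
    [("t1.v1", ">", 100)],
))
```

For Outer and Semi Joins the derivation is one-way, as noted above: a derived predicate must not filter the preserved side.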
Join Reorder
StarRocks treats a consecutive series of Inner or Cross Joins as a Multi Join Node and applies one of four algorithms:
Exhaustive : explores all permutations using commutativity and associativity (used for ≤4 tables).
Greedy : multi‑sequence greedy search keeping the top 10 partial plans at each step (used for 4‑10 tables).
Left‑Deep : builds a left‑deep tree; fast but not always optimal (fallback when statistics are missing).
DP‑sub : dynamic programming on subsets (used for 4‑10 tables).
Algorithm selection is based on table count: exhaustive for ≤4 tables, a mix of left‑deep, greedy and DP‑sub for 4‑10 tables, and greedy/left‑deep for >10 tables.
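To make the DP‑sub idea concrete, here is a deliberately simplified subset‑DP sketch. It assumes every pair of tables is joinable with selectivity 1 and costs a plan by the sum of intermediate result sizes; a real optimizer would use estimated selectivities and prune disconnected subsets:

```python
from itertools import combinations

def dp_sub_join_order(cards):
    """DP over subsets: best[S] = (cost, rows, plan) for joining the
    tables in subset S, where cost is the total size of intermediate
    results produced along the way."""
    tables = list(cards)
    best = {frozenset([t]): (0, cards[t], t) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            s = frozenset(subset)
            for k in range(1, size):
                for left in combinations(subset, k):
                    l, r = frozenset(left), s - frozenset(left)
                    cl, rl, pl = best[l]
                    cr, rr, pr = best[r]
                    rows = rl * rr                 # selectivity assumed 1
                    cost = cl + cr + rows
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, rows, (pl, pr))
    return best[frozenset(tables)]

# joining the two small tables first minimizes intermediate size
print(dp_sub_join_order({"a": 10, "b": 100, "c": 1000}))
```

The exponential number of subsets is exactly why this approach is capped at around 10 tables, after which greedy or left‑deep search takes over.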
Cost Model
Join Cost = CPU * (Row(L) + Row(R)) + Memory * Row(R)

Row(L) and Row(R) are the estimated output row counts of the left and right join inputs. The model accounts for CPU work on both inputs plus the memory needed to build the hash table on the right side.
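Plugging numbers into the formula makes the asymmetry visible; the CPU and Memory weights below are made-up constants for illustration only:

```python
def join_cost(rows_left, rows_right, cpu_weight=1.0, mem_weight=2.0):
    """Join Cost = CPU * (Row(L) + Row(R)) + Memory * Row(R).
    The right side pays twice: it is both scanned and materialized
    into the hash table, so the smaller input belongs on the right."""
    return cpu_weight * (rows_left + rows_right) + mem_weight * rows_right

# swapping sides changes the cost: build on the smaller input
print(join_cost(1_000_000, 1_000))   # small build side
print(join_cost(1_000, 1_000_000))   # large build side, far more expensive
```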
Distributed Join Planning
StarRocks runs on an MPP architecture. For a simple join, data from each table is read on its owning nodes, shuffled according to join keys, and then joined locally. Five basic distributed join strategies are generated:
Shuffle Join : both sides are shuffled to the same set of nodes.
Broadcast Join : the smaller table is sent in full to every node that holds data of the larger table.
Bucket Shuffle Join : the smaller table is bucket‑shuffled to match the distribution of the larger table.
Colocate Join : tables pre‑partitioned into the same colocate group require no data movement.
Replicate Join (experimental): a full copy of the right table exists on every node.
The optimizer propagates required distribution properties from the join node down to its children. If a scan cannot satisfy the property, an Exchange (shuffle) node is inserted.
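A toy decision function conveys the order of preference among these strategies. The threshold and flags are invented for illustration; the real optimizer makes this choice through the cost model, not fixed rules:

```python
def pick_distribution(left_rows, right_rows, colocated=False,
                      left_bucketed_on_key=False,
                      broadcast_limit=100_000):
    """Simplified strategy selection, roughly mirroring the preference
    order described above."""
    if colocated:
        return "colocate"        # no data movement at all
    if right_rows <= broadcast_limit:
        return "broadcast"       # ship the small side everywhere
    if left_bucketed_on_key:
        return "bucket_shuffle"  # move only the right side
    return "shuffle"             # repartition both sides on join keys

print(pick_distribution(1_000_000, 500))        # broadcast
print(pick_distribution(1_000_000, 1_000_000))  # shuffle
```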
Global Runtime Filter
During hash join execution, after building the hash table on the right side, StarRocks creates a runtime filter (Min/Max, IN predicate, or Bloom filter) and pushes it to the left‑side scan. This filter prunes rows before they are fetched, reducing the join input size.
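A minimal model of this build-then-filter sequence in Python. The IN-versus-range fallback threshold is invented, and a real system would typically fall back to a Bloom filter rather than a plain min/max range:

```python
def build_runtime_filter(build_keys, in_limit=1024):
    """After the hash-table build, summarize its keys as a predicate.
    Small builds get an exact IN filter; larger ones fall back to a
    min/max range check."""
    keys = set(build_keys)
    if len(keys) <= in_limit:
        return lambda v: v in keys
    lo, hi = min(keys), max(keys)
    return lambda v: lo <= v <= hi

# prune probe-side rows before they ever reach the join
rf = build_runtime_filter([2, 5, 9])
probe = [1, 2, 3, 5, 8, 9, 10]
print([v for v in probe if rf(v)])
```

Because the filter is pushed into the probe-side scan, rows that cannot possibly match are dropped at read time instead of flowing through the exchange and join operators.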
-- 1. Build the hash table on the right (build) side
-- 2. Create a runtime filter from the build-side keys
-- 3. Push the filter to the left-side scan, which applies it before rows reach the join

Practical Takeaways
Prefer high‑performance join types (Semi/Anti > Inner > Outer > Full > Cross).
Build hash tables on the smaller input.
Execute high‑selectivity joins first.
Minimize the amount of data entering joins.
Reduce network traffic in distributed joins.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
