
Mastering Join Optimization in StarRocks: Techniques, Algorithms, and Distributed Planning

This article provides a comprehensive, step‑by‑step guide to StarRocks join optimization, covering join types, logical rewrite rules, predicate push‑down, join reorder algorithms, cost modeling, distributed join strategies, and runtime filters, while offering practical tips for achieving high‑performance query execution.


Join Background

StarRocks supports Cross Join, Full/Left/Right Outer Join, Anti Join, Semi Join, and Inner Join. Different join algorithms (e.g., Sort‑Merge vs. Hash Join) have distinct performance characteristics, and the optimal choice depends on data distribution and query patterns.

Challenges in Join Optimization

Multiple join implementations make optimal selection difficult.

Choosing an execution order for multi‑table joins is combinatorial.

Accurate cost estimation before execution is hard.

A plan optimal on a single node may be sub‑optimal in a distributed environment.

Join Optimization Principles

StarRocks follows five principles: use the most efficient join type, build the hash table on the smaller side, prioritize high‑selectivity joins, minimize data fed into joins, and reduce network traffic in distributed joins.

Logical Optimization Rules

1. Join Type Conversion

Convert low‑performance joins to more efficient ones when possible:

Cross Join → Inner Join if a join predicate exists.

Outer Join → Inner Join when a strict (non‑null‑producing) predicate on the nullable side can filter rows.

Full Outer Join → Left/Right Outer Join when a strict predicate binds to one side.

-- Before conversion
SELECT * FROM t1, t2 WHERE t1.v1 = t2.v1;
-- After conversion
SELECT * FROM t1 INNER JOIN t2 ON t1.v1 = t2.v1;

2. Predicate Push‑Down

Push predicates from WHERE clauses down to the earliest possible join inputs, reducing intermediate result sizes. For Inner Joins the rule is identical to WHERE push‑down. For Outer, Semi and Anti Joins only predicates that bind to the non‑preserved side and are strict can be pushed.

SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.v1 = t2.v1 WHERE t2.v1 > 0;
-- After push-down: the strict predicate on t2 rejects NULL-extended rows,
-- so the outer join first degenerates to an inner join
SELECT * FROM t1 INNER JOIN t2 ON t1.v1 = t2.v1 WHERE t2.v1 > 0;

3. Predicate Extraction

From a disjunctive (OR) predicate, StarRocks extracts weaker conjunctive predicates that are implied by every branch; these can then be pushed down, while the original OR predicate is still evaluated afterwards. Example:

-- Original predicate
WHERE (t2.v1 = 2 AND t1.v2 = 3) OR (t2.v1 > 5 AND t1.v2 = 4)
-- Additional extracted conjunctive predicates (the original OR is retained)
WHERE t2.v1 >= 2 AND t1.v2 IN (3, 4);

4. Equivalence Derivation

Using join equality conditions, StarRocks derives equivalent predicates on the opposite side of a join, enabling additional filtering. This works for Inner Joins and, with one‑way restrictions, for Outer and Semi Joins.
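The core idea can be sketched in a few lines of Python (illustrative only, with hypothetical helper names, not StarRocks internals): a predicate on one side of an equality join condition implies the same predicate on the other side, so the optimizer can apply it to both scans.

```python
# Sketch of equivalence derivation (illustrative, not StarRocks code).
# A predicate on one side of an equality join condition implies the same
# predicate on the equivalent column of the other side.

def derive_equivalent_predicates(join_conditions, predicates):
    """join_conditions: list of (left_col, right_col) equality pairs.
    predicates: dict mapping a column name to a predicate string on it."""
    derived = {}
    for left_col, right_col in join_conditions:
        if left_col in predicates:
            derived[right_col] = predicates[left_col].replace(left_col, right_col)
        if right_col in predicates:
            derived[left_col] = predicates[right_col].replace(right_col, left_col)
    return derived

# t1.v1 = t2.v1 combined with WHERE t1.v1 > 100 lets the optimizer
# also filter the scan of t2 with t2.v1 > 100.
print(derive_equivalent_predicates([("t1.v1", "t2.v1")],
                                   {"t1.v1": "t1.v1 > 100"}))
# → {'t2.v1': 't2.v1 > 100'}
```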

Join Reorder

StarRocks treats a consecutive series of Inner or Cross Joins as a Multi Join Node and applies one of four algorithms:

Exhaustive: explores all permutations using commutativity and associativity (used for ≤4 tables).

Greedy: multi‑sequence greedy search keeping the top 10 partial plans at each step (used for 4‑10 tables).

Left‑Deep: builds a left‑deep tree; fast but not always optimal (fallback when statistics are missing).

DP‑sub: dynamic programming on subsets (used for 4‑10 tables).

Algorithm selection is based on table count: exhaustive for ≤4 tables, a mix of left‑deep, greedy and DP‑sub for 4‑10 tables, and greedy/left‑deep for >10 tables.
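The DP‑sub idea can be sketched with a toy dynamic program over table subsets. This is illustrative only: the real algorithm also checks join‑graph connectivity and uses the full cost model, whereas here joining two subtrees simply costs the sum of their cardinalities, and a subtree's output is crudely estimated as its smallest member table.

```python
from itertools import combinations

# Toy DP-sub join reorder sketch (illustrative; not StarRocks internals).
# rows[i] is the estimated cardinality of table i.

def dp_sub_cost(rows):
    n = len(rows)
    est = {}   # bitmask -> estimated output rows of that subtree (toy model)
    best = {}  # bitmask -> cheapest cost found to join that subset of tables
    for i in range(n):
        est[1 << i] = rows[i]
        best[1 << i] = 0
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            mask = sum(1 << i for i in subset)
            est[mask] = min(rows[i] for i in subset)
            # Enumerate every split of `mask` into two non-empty halves.
            sub = (mask - 1) & mask
            while sub:
                other = mask ^ sub
                if sub < other:  # consider each unordered split once
                    cost = best[sub] + best[other] + est[sub] + est[other]
                    if mask not in best or cost < best[mask]:
                        best[mask] = cost
                sub = (sub - 1) & mask
    return best[(1 << n) - 1]

print(dp_sub_cost([10, 1000, 100]))  # → 1120
```

Because every subset is memoized, the optimum over all bushy trees is found, at the price of exponential state (2^n subsets), which is why the algorithm is only used for moderate table counts.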

Cost Model

Join Cost = CPU * (Row(L) + Row(R)) + Memory * Row(R)

Row(L) and Row(R) are the estimated output row counts of the left and right join inputs. The model accounts for CPU work and the memory needed to build the hash table on the right side.
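Under this model, swapping the two inputs changes only the memory term, which is why the optimizer prefers building the hash table on the smaller side. A minimal sketch (the CPU and Memory weights here are made‑up constants, not StarRocks' actual values):

```python
# Sketch of the hash-join cost formula above (weights are hypothetical).
CPU_WEIGHT = 1.0
MEMORY_WEIGHT = 2.0

def join_cost(left_rows, right_rows):
    # CPU cost scales with both inputs; memory cost only with the
    # build (right) side, where the hash table is constructed.
    return CPU_WEIGHT * (left_rows + right_rows) + MEMORY_WEIGHT * right_rows

# Probing with the big table and building on the small one is cheaper:
print(join_cost(1_000_000, 10_000))   # small table on the build side
print(join_cost(10_000, 1_000_000))   # big table on the build side
```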

Distributed Join Planning

StarRocks runs on an MPP architecture. For a simple join, data from each table is read on its owning nodes, shuffled according to join keys, and then joined locally. Five basic distributed join strategies are generated:

Shuffle Join: both sides are shuffled by join key to the same set of nodes.

Broadcast Join: the smaller table is copied in full to every node that holds data of the larger table.

Bucket Shuffle Join: the smaller table is shuffled to match the bucketing distribution of the larger table.

Colocate Join: tables pre‑partitioned into the same colocate group require no data movement.

Replicate Join (experimental): a full copy of the right table already exists on every node.

The optimizer propagates required distribution properties from the join node down to its children. If a scan cannot satisfy the property, an Exchange (shuffle) node is inserted.
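As a rough illustration of the trade‑off the planner faces (the real decision also weighs bucketing, colocation, memory limits, and statistics quality), compare the network cost of shuffling both sides against broadcasting the smaller one:

```python
# Rough network-cost comparison between shuffle and broadcast joins
# (illustrative sketch, not StarRocks' actual planner logic).

def shuffle_cost(left_rows, right_rows):
    # Both sides are redistributed by join key:
    # each row crosses the network once.
    return left_rows + right_rows

def broadcast_cost(right_rows, num_nodes):
    # The (smaller) right table is copied in full to every node.
    return right_rows * num_nodes

def pick_strategy(left_rows, right_rows, num_nodes):
    if broadcast_cost(right_rows, num_nodes) < shuffle_cost(left_rows, right_rows):
        return "broadcast"
    return "shuffle"

print(pick_strategy(100_000_000, 1_000, 16))       # tiny right table
print(pick_strategy(100_000_000, 50_000_000, 16))  # two large tables
```

The intuition matches the principles above: broadcast wins when the right table is tiny relative to the left, while shuffle wins once both sides are large, since copying a big table to every node dominates the cost.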

Global Runtime Filter

During hash join execution, after building the hash table on the right side, StarRocks creates a runtime filter (Min/Max, IN predicate, or Bloom filter) and pushes it to the left‑side scan. This filter prunes rows before they are fetched, reducing the join input size.

-- Example: the hash table is built from t2 (the right side); a runtime
-- filter on the observed values of t2.v1 is then pushed down to the scan
-- of t1, so t1 rows that cannot match are pruned before reaching the join.
SELECT * FROM t1 JOIN t2 ON t1.v1 = t2.v1 WHERE t2.v2 = 100;
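The effect can be sketched with a min/max filter, one of the filter kinds mentioned above (the code is an illustration, not StarRocks internals):

```python
# Sketch of a min/max runtime filter (illustrative). After building the hash
# table from the right side's join keys, the probe-side scan can skip any
# row whose key falls outside the observed [min, max] range.

def build_minmax_filter(build_keys):
    return (min(build_keys), max(build_keys))

def probe_with_filter(probe_rows, key, rf):
    lo, hi = rf
    # Rows outside the filter range cannot match any build row,
    # so they are pruned before ever reaching the join operator.
    return [row for row in probe_rows if lo <= row[key] <= hi]

right_keys = [42, 57, 60]             # join keys seen on the build side
rf = build_minmax_filter(right_keys)  # runtime filter: (42, 60)
left_rows = [{"v1": 1}, {"v1": 50}, {"v1": 99}, {"v1": 60}]
print(probe_with_filter(left_rows, "v1", rf))
# → [{'v1': 50}, {'v1': 60}]
```

IN-predicate and Bloom filters trade exactness for size the same way: they shrink the probe input at the cost of building and shipping the filter after the build phase finishes.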

Practical Takeaways

Prefer high‑performance join types (Semi/Anti > Inner > Outer > Full > Cross).

Build hash tables on the smaller input.

Execute high‑selectivity joins first.

Minimize the amount of data entering joins.

Reduce network traffic in distributed joins.

[Figure: Join types illustration]
[Figure: Join optimization challenges]
[Figure: Distributed join cost]
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: StarRocks, Query Planning, Distributed SQL, JOIN optimization, Cost Model
Written by StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.