How StarRocks Transforms a SQL Query into Distributed Execution: A Deep Dive
This article explains how StarRocks converts a SQL statement into an optimal distributed physical execution plan, schedules the plan across compute nodes, and runs it using MPP, pipeline parallelism, and vectorized execution to achieve near‑linear performance scaling.
Overview
StarRocks processes a SQL query in three stages: (1) transform the SQL text into an optimal distributed physical execution plan, (2) schedule the plan to compute nodes, and (3) execute the plan on those nodes.
Frontend (FE) and Backend (BE) Roles
FE parses, optimizes, schedules queries and manages metadata. BE executes the plan and stores data.
From SQL Text to Execution Plan
The conversion consists of five steps:
SQL Parse : the SQL string is parsed by ANTLR4 into an abstract syntax tree (AST) represented by a QueryStmt containing SelectList, FromClause, WherePredicate, GroupByClause, HavingPredicate, OrderByElement, LimitElement, etc.
SQL Analyze : the AST is bound to catalog metadata (databases, tables, columns), legality checks are performed (e.g., no GROUPING in WHERE, HLL/Bitmap cannot be summed), aliases are resolved, function argument types are validated, and necessary type casts are inserted. The result is a hierarchical Relation tree.
SQL Logical Plan : the Relation tree is converted into a logical plan tree where each relational operation becomes a logical node.
SQL Optimize : a Cascades/ORCA‑style cost‑based optimizer rewrites the logical plan and selects the lowest‑cost distributed physical plan. Major rewrite rules include:
Expression simplification
Column pruning
Predicate push‑down
Limit merge / push‑down
Aggregation merge
Constant propagation and folding
Outer‑join to inner‑join conversion
Common sub‑expression reuse
Subquery rewrite
Lateral‑join simplification
Partition/bucket pruning
Empty‑node and empty‑set pruning (UNION/INTERSECT/EXCEPT)
Intersect reorder
Count‑distinct aggregation rewrite
Plan Fragment Generation : the chosen physical plan is mapped one‑to‑one into BE‑executable plan fragments.
CBO Transform Details
After logical rewrite, the optimizer applies additional transformations:
Multi‑stage aggregation (e.g., count, sum → two‑stage; count‑distinct → three‑ or four‑stage)
Join side adjustment: the smaller table is built as the hash side; the optimizer may swap join sides or convert LEFT JOIN to RIGHT JOIN based on cost.
Join reordering: for ≤5 tables, exhaustive exchange/associativity search; for >5 tables, greedy + dynamic programming.
Distributed join strategy selection (broadcast, shuffle, single‑side shuffle, colocate, replicated) using cost estimation and property enforcement.
Push‑down aggregate to join.
Materialized view selection and rewrite.
Statistics and Cost Estimation
Accurate table‑level and column‑level statistics (collected automatically or manually, full or sampled) feed the cost model. Cost estimation considers CPU, memory, network, and I/O with different weights per operator.
Execution Scheduling
FE creates execution instances for each fragment, assigns fragments to BE nodes, selects healthy tablet replicas for scans, and sends fragment parameters to BE via Thrift. Scheduling follows an “All‑At‑Once” top‑down traversal of the fragment tree.
MPP Distributed Execution
StarRocks splits a query into many fragment instances that run on separate BE nodes, achieving near‑linear scalability as the cluster grows.
Pipeline Parallelism
Within a fragment, pipelines are created. Each pipeline consists of a source operator, zero or more transform operators, and a sink operator. Multiple pipeline drivers run in parallel, and a user‑space coroutine scheduler minimizes thread overhead.
Vectorized Execution
Operators and expressions are rewritten to process data in columnar batches, reducing virtual‑function calls and improving CPU cache usage. The vectorized engine delivers 5‑10× performance gains over row‑wise execution.
Summary
StarRocks transforms a SQL query into an optimal distributed physical plan using a powerful CBO optimizer, schedules plan fragments to appropriate BE nodes, and executes them with MPP, pipeline, and vectorized engines, achieving high performance that scales with both cluster size and CPU cores.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
