
Inside StarRocks: How the Pipeline Execution Engine Boosts Query Performance

This article explains the core concepts, architecture, and code logic of StarRocks' Pipeline execution framework, covering ExecPlan, PlanFragment, Fragment Instance, ExecNode, SourceOperator, SinkOperator, PipelineDriver scheduling, asynchronous handling of blocking operations, and the roles of FE and BE in MPP scheduling.


Background

StarRocks is an open‑source MPP database that separates query planning (handled by the Frontend, FE) from execution (handled by Backend nodes, BE). The Pipeline execution framework is a single‑node, multi‑core scheduler that differs from traditional MPP scheduling, which distributes work across multiple machines.

Key Objectives of Pipeline Scheduling

Reduce the cost of task scheduling on compute nodes.

Increase CPU utilization.

Fully exploit multi‑core capabilities to improve query performance, and set the degree of parallelism automatically instead of relying on inaccurate manual settings.

Fundamental Concepts

ExecPlan

The physical execution plan generated by FE, composed of physical operators (ExecNode) such as OlapScanNode or HashJoinNode.

PlanFragment

A sub‑tree of the ExecPlan that can be executed in parallel across machines. Each fragment contains physical operators and a DataSink; upstream fragments send data to downstream fragments via Exchange operators.

Fragment Instance

An execution instance of a PlanFragment. FE determines the number of instances based on data partitioning, then dispatches them to BE. Within a BE, each Fragment Instance is further split into one or more Pipelines.

ExecNode (Physical Operator)

The building blocks of a PlanFragment, e.g., OlapScanNode, HashJoinNode, etc.

SourceOperator and SinkOperator

SourceOperator starts a Pipeline by producing data (reading local files, receiving data from upstream fragments, or from upstream Pipelines). SinkOperator ends a Pipeline by consuming results (writing to disk, sending to downstream fragments, or feeding a downstream Pipeline).
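The pull/push contract between operators can be sketched in a few lines. This is a hypothetical minimal model, not the actual StarRocks interfaces: the names Chunk, Operator, pull_chunk, and push_chunk mirror the article's terms, but the real classes (in the BE's pipeline code) carry far more state.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified chunk: one integer column standing in for a columnar batch.
struct Chunk { std::vector<int> rows; };
using ChunkPtr = std::shared_ptr<Chunk>;

// Every operator reports whether it can produce or absorb data without
// blocking, so the scheduler never parks a worker thread inside it.
struct Operator {
    virtual ~Operator() = default;
    virtual bool has_output() const { return false; }   // source side
    virtual bool need_input() const { return false; }   // sink side
    virtual ChunkPtr pull_chunk() { return nullptr; }   // produce a chunk
    virtual void push_chunk(ChunkPtr) {}                // consume a chunk
};

// A SourceOperator that "scans" an in-memory table chunk by chunk.
struct VectorSource : Operator {
    std::vector<ChunkPtr> chunks;
    size_t next = 0;
    bool has_output() const override { return next < chunks.size(); }
    ChunkPtr pull_chunk() override { return chunks[next++]; }
};

// A SinkOperator that collects everything pushed into it.
struct CollectSink : Operator {
    std::vector<int> out;
    bool need_input() const override { return true; }
    void push_chunk(ChunkPtr c) override {
        out.insert(out.end(), c->rows.begin(), c->rows.end());
    }
};

// One scheduling step of a driver: pull from the source, push to the
// sink. Returns false when no progress is possible.
inline bool step(Operator& src, Operator& sink) {
    if (!src.has_output() || !sink.need_input()) return false;
    sink.push_chunk(src.pull_chunk());
    return true;
}
```

A driver loop is then just repeated calls to step until the source is exhausted or an operator reports it cannot proceed.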

Pipeline Construction

FE creates the ExecPlan and splits it into PlanFragments. BE's PipelineBuilder then converts each PlanFragment into one or more Pipelines. Operators are transformed accordingly; for example, HashJoinNode becomes HashJoinBuildOperator and HashJoinProbeOperator.
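The split of a HashJoinNode into build and probe operators can be illustrated with a toy builder. This is a sketch under simplifying assumptions (a binary tree where the join's second child is the build side); the real PipelineBuilder handles many more operator kinds and edge cases.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal plan-tree node: a name and child nodes.
struct PlanNode {
    std::string name;                  // e.g. "OlapScan(B)", "HashJoin"
    std::vector<PlanNode> children;    // children[0] = probe, [1] = build
};

using Pipeline = std::vector<std::string>;  // ordered operator names

// Walk the tree. A HashJoin closes off a separate build pipeline ending
// in HashJoinBuild, while the probe side continues the current pipeline
// through HashJoinProbe.
void build(const PlanNode& n, Pipeline& cur, std::vector<Pipeline>& out) {
    if (n.name == "HashJoin") {
        Pipeline b;
        build(n.children[1], b, out);   // build side: its own pipeline
        b.push_back("HashJoinBuild");
        out.push_back(b);
        build(n.children[0], cur, out); // probe side: stays in `cur`
        cur.push_back("HashJoinProbe");
    } else {
        for (const auto& c : n.children) build(c, cur, out);
        cur.push_back(n.name);
    }
}

std::vector<Pipeline> build_pipelines(const PlanNode& root) {
    std::vector<Pipeline> out;
    Pipeline cur;
    build(root, cur, out);
    out.push_back(cur);                 // the final (probe-side) pipeline
    return out;
}
```

For a join of two scans, this yields one pipeline ending in HashJoinBuild and a second running the scan through HashJoinProbe, matching the split described above.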

PipelineDriver

A Pipeline instance is represented by a PipelineDriver. Multiple drivers can be created based on the degree of parallelism (dop). Each driver runs as a coroutine with three states:

Running: actively executing, repeatedly calling pull_chunk on the upstream operator and push_chunk on the downstream operator.

Blocked: waiting for an event (e.g., I/O, network, tablet read). The driver yields the CPU and is placed in a blocked queue monitored by a poller thread.

Ready: either after a time slice expires or when a previously blocked event becomes ready; the driver returns to the ready queue for execution.

This coroutine‑based scheduling reduces context‑switch overhead compared to traditional OS thread scheduling, improving CPU utilization in high‑concurrency scenarios.
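The three-state lifecycle can be captured as a tiny state machine. This is a toy model, not StarRocks code: the real driver yields after a time slice or whenever an operator reports it cannot make progress, and the transitions are driven by the executor and poller threads described below.

```cpp
#include <cassert>

enum class DriverState { Ready, Running, Blocked };

// Hypothetical driver with one pending-I/O flag standing in for all the
// conditions (network, tablet read, ...) that can block a real driver.
struct PipelineDriver {
    DriverState state = DriverState::Ready;
    bool io_pending = false;   // e.g. a tablet read still in flight

    // One scheduling quantum: run until blocked or the slice ends.
    void run_one_slice() {
        state = DriverState::Running;
        if (io_pending) {
            state = DriverState::Blocked;  // parked in the blocked queue
        } else {
            state = DriverState::Ready;    // slice expired, requeued
        }
    }

    // Called when the awaited event completes (the poller's job).
    void on_event_ready() {
        io_pending = false;
        if (state == DriverState::Blocked) state = DriverState::Ready;
    }
};
```

The key property is that a Blocked driver holds no thread: it simply sits in a queue until its event fires.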

Asynchronous Handling of Blocking Operations

To prevent PipelineDrivers from blocking execution threads, operations that would normally block are made asynchronous:

Scanning tablets (disk I/O) in ScanOperator.

Data exchange between fragments via ExchangeSinkOperator and ExchangeSourceOperator.

Hash table building in HashJoinBuildOperator before probing in HashJoinProbeOperator.

Full‑materialization operators are split into a sink and a source; the source waits for the sink to finish (e.g., AggregateBlockingSinkOperator and AggregateBlockingSourceOperator).
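The sink/source split can be sketched with a shared state object: the source reports no output until the sink declares itself finished, so the source's driver stays in the blocked queue instead of occupying a thread. The names mirror AggregateBlockingSinkOperator/AggregateBlockingSourceOperator, but this is an illustrative simplification (a grouped sum), not the actual implementation.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// State shared by the sink (writer) and source (reader) halves.
struct AggState {
    std::map<std::string, long> sums;  // group-by key -> running sum
    bool finished = false;             // set once the sink saw all input
};

struct AggBlockingSink {
    std::shared_ptr<AggState> state;
    void push(const std::string& key, long v) { state->sums[key] += v; }
    void set_finishing() { state->finished = true; }  // all input consumed
};

struct AggBlockingSource {
    std::shared_ptr<AggState> state;
    // The source's driver only becomes Ready once this returns true.
    bool has_output() const { return state->finished; }
    long result(const std::string& key) const { return state->sums.at(key); }
};
```

Until set_finishing is called, has_output stays false and the downstream pipeline simply does not run, which is exactly the asynchronous behavior the split exists to provide.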

Thread Model on BE

BE uses two types of threads and two queues:

PipelineDriverExecutor: worker threads that continuously fetch Ready drivers from the ready queue, execute them, and place blocked drivers into the blocked queue.

PipelineDriverPoller: a poller thread that scans the blocked queue and moves drivers whose blocking condition has been resolved back to the ready queue.

All drivers share a global pool of executor threads equal to the number of physical cores, enabling multiplexing across many queries.

Illustrative Example

Consider the simple SQL query: select A.c0, B.c1 from A, B where A.c0 = B.c0. FE generates a physical plan, splits it into three PlanFragments, creates multiple Fragment Instances, and finally builds the Pipelines shown in the following diagrams.

These diagrams illustrate how FE‑generated fragments are turned into Pipelines, how drivers are scheduled, and how asynchronous I/O keeps execution threads from blocking.

Conclusion

The Pipeline execution engine in StarRocks solves the problem of high‑cost task scheduling and low CPU utilization in MPP databases by using a coroutine‑based scheduler, fine‑grained parallelism, and asynchronous handling of blocking operations. Understanding its components—ExecPlan, PlanFragment, Fragment Instance, ExecNode, Source/Sink operators, and PipelineDriver—provides a solid foundation for extending or optimizing the system.

Tags: StarRocks, scheduling, Pipeline, Execution Engine, MPP
Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
