Inside StarRocks I/O: How Tablets, Fragments, Pipelines, and Morsels Power Parallel Scanning
This article explains the core I/O components of StarRocks—Tablet, Fragment, Pipeline, Morsel, ScanOperator, ChunkSource, ScanTask, and ChunkBuffer—showing how they work together to achieve high‑performance, low‑latency query execution in a compute‑storage separated architecture.
Introduction
StarRocks is a high‑performance massively parallel processing (MPP) database designed for high‑concurrency, low‑latency analytical workloads. In a compute‑storage separated architecture, query performance increasingly depends on the efficiency of the I/O layer. This article introduces the core concepts and mechanisms that shape the I/O execution path of a query.
Tablet
A Tablet is the basic physical storage unit in StarRocks, representing a horizontal shard of a table. The number of Tablets is determined by the bucket count specified in the DISTRIBUTED BY clause when the table is created.
CREATE TABLE orders (
order_id BIGINT,
user_id BIGINT,
amount DECIMAL(10,2)
) DISTRIBUTED BY HASH(user_id) BUCKETS 10;
This statement creates 10 Tablets, each holding a subset of the data based on the hash of user_id. Tablets enable:
Parallel scanning – multiple Tablets can be scanned concurrently, fully utilizing CPU cores and cluster resources.
Data‑management unit – write, compaction, and load‑balancing operations are performed at the Tablet level.
Horizontal expansion – Tablets can be redistributed across new nodes when the cluster scales out.
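The routing idea behind hash distribution can be sketched in a few lines. This is an illustrative model only: the hash function and bucket assignment below are stand-ins, not StarRocks' actual internal hashing.

```python
# Illustrative sketch: how DISTRIBUTED BY HASH(user_id) BUCKETS 10 routes rows.
# Python's built-in hash() is a stand-in for StarRocks' internal hash function.

BUCKETS = 10

def tablet_for(user_id: int, buckets: int = BUCKETS) -> int:
    """Map a distribution-key value to a Tablet (bucket) index."""
    return hash(user_id) % buckets

# Rows with the same user_id always land in the same Tablet,
# so the 10 Tablets hold disjoint data and can be scanned in parallel.
rows = [(1001, 9.99), (1002, 25.00), (1001, 4.50)]
placement = {uid: tablet_for(uid) for uid, _ in rows}
```

Because placement depends only on the key, every node can compute it independently, which is what makes Tablet-level parallel scanning and redistribution straightforward.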
Fragment
The Frontend (FE) generates a physical execution plan that is split into multiple Plan Fragments. Each Fragment contains a set of physical operators and a DataSink to pass results downstream. In the Pipeline engine, Fragments are further divided into Pipelines.
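The Fragment-to-Pipeline split can be pictured as cutting an operator chain at blocking ("pipeline breaker") operators. The operator names and the cut rule below are a simplified illustration, not the actual planner logic.

```python
# Hypothetical sketch: dividing a Fragment's operator chain into Pipelines.
# A boundary is placed after each blocking operator (e.g. a hash-agg build),
# since downstream operators cannot start until it has consumed all input.

def split_into_pipelines(operators, breakers):
    """Cut an operator chain into pipelines at each blocking operator."""
    pipelines, current = [], []
    for op in operators:
        current.append(op)
        if op in breakers:          # blocking operator ends this pipeline
            pipelines.append(current)
            current = []
    if current:
        pipelines.append(current)
    return pipelines

fragment = ["Scan", "Filter", "AggBuild", "AggOutput", "Sink"]
pipelines = split_into_pipelines(fragment, breakers={"AggBuild"})
# → [["Scan", "Filter", "AggBuild"], ["AggOutput", "Sink"]]
```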
Pipeline
A Pipeline is a chain of operators: Source → intermediate operators → Sink. Data flows as Chunk objects between operators.
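The chunk-at-a-time flow can be sketched with generators: each operator consumes and produces small batches rather than whole tables. The Chunk representation (a plain list of rows) and operator names here are simplifications for illustration.

```python
# Minimal sketch of a Pipeline: Source → Filter → Sink, passing Chunks
# (here, plain lists of rows) without materializing the full dataset.

def source():
    yield [1, 2, 3]          # Chunk 1
    yield [4, 5, 6]          # Chunk 2

def filter_op(chunks, predicate):
    """Intermediate operator: transforms each Chunk as it flows through."""
    for chunk in chunks:
        yield [row for row in chunk if predicate(row)]

def sink(chunks):
    """Terminal operator: consumes the Chunk stream."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out

result = sink(filter_op(source(), lambda r: r % 2 == 0))
# → [2, 4, 6]
```

Because each operator only ever holds one Chunk, memory stays bounded and downstream work starts as soon as the first Chunk arrives.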
ScanOperator
The ScanOperator is the entry point of data into the Pipeline. It reads raw data from storage and emits it as Chunks for downstream operators. Multiple ScanOperator instances can run in parallel, each handling different Tablets or Morsels.
Morsel
A Morsel is the smallest scheduling unit for a ScanOperator. Tablets are split into multiple Morsels, allowing finer‑grained parallelism and better load balancing. For example, a Tablet with 5 million rows can be divided into ~20 Morsels (≈256 K rows each).
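Splitting a Tablet's row range into fixed-size Morsels is straightforward to sketch. The 256K-row size mirrors the example above; in StarRocks the real Morsel boundaries also depend on physical segment layout, which this sketch ignores.

```python
# Sketch: splitting a Tablet's rows into fixed-size Morsel ranges.
# 256K rows per Morsel matches the article's example figure.

MORSEL_ROWS = 256 * 1024

def split_into_morsels(total_rows: int, morsel_rows: int = MORSEL_ROWS):
    """Return (start, end) row ranges, each at most morsel_rows long."""
    return [(start, min(start + morsel_rows, total_rows))
            for start in range(0, total_rows, morsel_rows)]

morsels = split_into_morsels(5_000_000)
# len(morsels) → 20 Morsels for a 5M-row Tablet
```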
CREATE TABLE user_logs (
log_id BIGINT,
user_id BIGINT,
action VARCHAR(50),
log_time DATETIME
) DISTRIBUTED BY HASH(user_id) BUCKETS 4;
Assuming pipeline_dop = 8, the query SELECT COUNT(*) FROM user_logs WHERE action='click' generates Morsels from these 4 Tablets. When Tablet‑level parallelism is disabled, only 4 large Morsels are created, capping effective parallelism at 4. Enabling Tablet‑level parallelism splits the same data into ~80 small Morsels, allowing all 8 ScanOperators to stay busy and roughly halving the scan time.
Without Tablet‑level parallelism: 4 Morsels → effective parallelism 4 → ~40 s scan. With Tablet‑level parallelism: ~80 Morsels → effective parallelism 8 → ~20 s scan.
Data skew amplifies this effect: if one Tablet is far larger than the rest, coarse-grained scheduling leaves most ScanOperators idle while a single operator grinds through the oversized Tablet (~88 s total in the example). Splitting that large Tablet into many Morsels lets multiple ScanOperators share its load, bringing the scan back down to ~20 s.
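A toy greedy-scheduling simulation shows why fine-grained Morsels help under skew. Scan cost is modeled as proportional to rows, and idle operators pull the next Morsel; the specific costs and the 8-way parallelism are illustrative and do not reproduce the article's exact timings.

```python
# Toy simulation: finish time ("makespan") when idle ScanOperators
# greedily pull the next Morsel from a shared list. Costs are in seconds
# and purely illustrative.
import heapq

def makespan(morsel_costs, workers):
    """Finish time when `workers` operators greedily pull morsels."""
    finish = [0.0] * workers
    heapq.heapify(finish)
    for cost in sorted(morsel_costs, reverse=True):
        # The least-loaded operator takes the next (largest) morsel.
        heapq.heappush(finish, heapq.heappop(finish) + cost)
    return max(finish)

# Coarse: one skewed 80 s Tablet plus three 8 s Tablets, 8 operators.
coarse = makespan([80, 8, 8, 8], workers=8)        # → 80 (skew dominates)

# Fine: the same work split into 1 s Morsels (80 + 3×8 = 104 of them).
fine = makespan([1] * 80 + [1] * 24, workers=8)    # → 13 (load shared)
```

With coarse Morsels the runtime equals the single largest Tablet; with fine Morsels it approaches total work divided by the number of operators.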
ChunkSource
ChunkSource is the concrete data‑reading component bound to a single Morsel. It fetches data from storage, assembles it into Chunks, and hands them to the ScanOperator. Multiple ChunkSources can run concurrently, each processing a different Morsel.
ChunkSource submits its read work as a ScanTask to a global I/O thread pool, enabling asynchronous, non‑blocking execution. The number of ChunkSources per ScanOperator is configurable, controlling the maximum concurrent Morsel processing.
ScanTask
ScanTask encapsulates the actual I/O operation for a Morsel. It is dispatched to the I/O thread pool, decoupling I/O latency from the Pipeline execution threads. This design allows high concurrency when accessing remote object stores (e.g., S3, HDFS) where each request may incur ~100 ms latency.
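A quick back-of-the-envelope (Little's law: concurrency = throughput × latency) shows why many in-flight ScanTasks are needed against remote storage. The target request rate below is an illustrative figure, not a StarRocks default.

```python
# Little's law sketch: with ~100 ms per S3/HDFS request, one blocked
# thread issues at most ~10 requests/s; hiding that latency requires
# many ScanTasks in flight at once. Figures are illustrative.

latency_s = 0.100                 # ~100 ms per remote read
target_rps = 500                  # hypothetical desired request rate

sequential_rps = 1 / latency_s            # → 10.0 requests/s, one thread
tasks_in_flight = target_rps * latency_s  # → 50.0 concurrent ScanTasks
```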
ChunkBuffer
ChunkBuffer bridges the asynchronous I/O side (ChunkSource) and the Pipeline side (ScanOperator). ChunkSource writes produced Chunks into the buffer, while the ScanOperator reads them, enabling a producer‑consumer pattern that keeps both sides running efficiently.
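The producer-consumer handoff can be sketched with a bounded queue: a bounded buffer also provides back-pressure when I/O outruns the pipeline. The capacity of 4 and the sentinel-based end-of-stream signal are illustrative choices, not the real ChunkBuffer design.

```python
# Producer-consumer sketch of the ChunkBuffer: an I/O thread (ChunkSource)
# pushes Chunks in while the pipeline thread (ScanOperator) pulls them out.
import queue
import threading

chunk_buffer = queue.Queue(maxsize=4)   # bounded: back-pressures the producer
SENTINEL = None                         # signals end-of-stream

def chunk_source():
    for i in range(10):
        chunk_buffer.put([i])           # blocks when the buffer is full
    chunk_buffer.put(SENTINEL)

def scan_operator():
    rows = []
    while (chunk := chunk_buffer.get()) is not SENTINEL:
        rows.extend(chunk)              # consume Chunks as they arrive
    return rows

producer = threading.Thread(target=chunk_source)
producer.start()
rows = scan_operator()                  # runs on the "pipeline" thread
producer.join()
# rows == [0, 1, ..., 9]
```

The bounded capacity is the key design point: it caps memory while letting the slow side pace the fast side, so I/O threads and pipeline threads stay busy without unbounded buffering.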
Summary
StarRocks’ I/O execution path combines Morsel‑level task splitting, ScanOperator orchestration, asynchronous ChunkSource reads, and a ChunkBuffer decoupling mechanism to achieve high concurrency and low latency. By breaking data into fine‑grained Morsels and dynamically scheduling them across ScanOperators, the system maximizes CPU and I/O utilization, mitigates data skew, and sustains sub‑second query response times even under a compute‑storage separated architecture.
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.