Inside StarRocks I/O: How Tablets, Fragments, Pipelines, and Morsels Power Parallel Scanning
This article explains the core I/O components of StarRocks—Tablet, Fragment, Pipeline, Morsel, ScanOperator, ChunkSource, ScanTask, and ChunkBuffer—showing how they work together to achieve high‑performance, low‑latency query execution in a compute‑storage separated architecture.
Introduction
StarRocks is a high‑performance massively parallel processing (MPP) database designed for high‑concurrency, low‑latency analytical workloads. In a compute‑storage separated architecture, query performance increasingly depends on the efficiency of the I/O layer. This article introduces the core concepts and mechanisms that shape the I/O execution path of a query.
Tablet
A Tablet is the basic physical storage unit in StarRocks, representing a horizontal shard of a table. The number of Tablets is determined by the bucket count specified in the DISTRIBUTED BY clause when the table is created.
CREATE TABLE orders (
order_id BIGINT,
user_id BIGINT,
amount DECIMAL(10,2)
) DISTRIBUTED BY HASH(user_id) BUCKETS 10;
This statement creates 10 Tablets, each holding a subset of the data based on the hash of user_id. Tablets enable:
Parallel scanning – multiple Tablets can be scanned concurrently, fully utilizing CPU cores and cluster resources.
Data‑management unit – write, compaction, and load‑balancing operations are performed at the Tablet level.
Horizontal expansion – Tablets can be redistributed across new nodes when the cluster scales out.
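The routing idea behind hash distribution can be sketched in a few lines. This is an illustrative model only: the hash function and bucket assignment below are stand-ins, not StarRocks' actual internal hashing.

```python
# Illustrative sketch: how DISTRIBUTED BY HASH(user_id) BUCKETS 10 routes rows.
# Python's built-in hash() is a stand-in for StarRocks' internal hash function.

BUCKETS = 10

def tablet_for(user_id: int, buckets: int = BUCKETS) -> int:
    """Map a distribution-key value to a Tablet (bucket) index."""
    return hash(user_id) % buckets

# Rows with the same user_id always land in the same Tablet,
# so the 10 Tablets hold disjoint data and can be scanned in parallel.
rows = [(1001, 9.99), (1002, 25.00), (1001, 4.50)]
placement = {uid: tablet_for(uid) for uid, _ in rows}
```

Because placement depends only on the key, every node can compute it independently, which is what makes Tablet-level parallel scanning and redistribution straightforward.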
Fragment
The Frontend (FE) generates a physical execution plan that is split into multiple Plan Fragments. Each Fragment contains a set of physical operators and a DataSink to pass results downstream. In the Pipeline engine, Fragments are further divided into Pipelines.
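The Fragment-to-Pipeline split can be pictured as cutting an operator chain at blocking ("pipeline breaker") operators. The operator names and the cut rule below are a simplified illustration, not the actual planner logic.

```python
# Hypothetical sketch: dividing a Fragment's operator chain into Pipelines.
# A boundary is placed after each blocking operator (e.g. a hash-agg build),
# since downstream operators cannot start until it has consumed all input.

def split_into_pipelines(operators, breakers):
    """Cut an operator chain into pipelines at each blocking operator."""
    pipelines, current = [], []
    for op in operators:
        current.append(op)
        if op in breakers:          # blocking operator ends this pipeline
            pipelines.append(current)
            current = []
    if current:
        pipelines.append(current)
    return pipelines

fragment = ["Scan", "Filter", "AggBuild", "AggOutput", "Sink"]
pipelines = split_into_pipelines(fragment, breakers={"AggBuild"})
# → [["Scan", "Filter", "AggBuild"], ["AggOutput", "Sink"]]
```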
Pipeline
A Pipeline is a chain of operators: Source → intermediate operators → Sink. Data flows as Chunk objects between operators.
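The chunk-at-a-time flow can be sketched with generators: each operator consumes and produces small batches rather than whole tables. The Chunk representation (a plain list of rows) and operator names here are simplifications for illustration.

```python
# Minimal sketch of a Pipeline: Source → Filter → Sink, passing Chunks
# (here, plain lists of rows) without materializing the full dataset.

def source():
    yield [1, 2, 3]          # Chunk 1
    yield [4, 5, 6]          # Chunk 2

def filter_op(chunks, predicate):
    """Intermediate operator: transforms each Chunk as it flows through."""
    for chunk in chunks:
        yield [row for row in chunk if predicate(row)]

def sink(chunks):
    """Terminal operator: consumes the Chunk stream."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out

result = sink(filter_op(source(), lambda r: r % 2 == 0))
# → [2, 4, 6]
```

Because each operator only ever holds one Chunk, memory stays bounded and downstream work starts as soon as the first Chunk arrives.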
ScanOperator
The ScanOperator is the entry point of data into the Pipeline. It reads raw data from storage and emits it as Chunks for downstream operators. Multiple ScanOperator instances can run in parallel, each handling different Tablets or Morsels.
Morsel
A Morsel is the smallest scheduling unit for a ScanOperator. Tablets are split into multiple Morsels, allowing finer‑grained parallelism and better load balancing. For example, a Tablet with 5 million rows can be divided into ~20 Morsels (≈256 K rows each).
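Splitting a Tablet's row range into fixed-size Morsels is straightforward to sketch. The 256K-row size mirrors the example above; in StarRocks the real Morsel boundaries also depend on physical segment layout, which this sketch ignores.

```python
# Sketch: splitting a Tablet's rows into fixed-size Morsel ranges.
# 256K rows per Morsel matches the article's example figure.

MORSEL_ROWS = 256 * 1024

def split_into_morsels(total_rows: int, morsel_rows: int = MORSEL_ROWS):
    """Return (start, end) row ranges, each at most morsel_rows long."""
    return [(start, min(start + morsel_rows, total_rows))
            for start in range(0, total_rows, morsel_rows)]

morsels = split_into_morsels(5_000_000)
# len(morsels) → 20 Morsels for a 5M-row Tablet
```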
CREATE TABLE user_logs (
log_id BIGINT,
user_id BIGINT,
action VARCHAR(50),
log_time DATETIME
) DISTRIBUTED BY HASH(user_id) BUCKETS 4;
Assuming pipeline_dop = 8, the query SELECT COUNT(*) FROM user_logs WHERE action='click' generates Morsels from these 4 Tablets. When Tablet‑level parallelism is disabled, only 4 large Morsels are created, capping effective parallelism at 4. Enabling Tablet‑level parallelism splits the same data into ~80 small Morsels, allowing all 8 ScanOperators to stay busy and roughly halving the scan time.
Without Tablet‑level parallelism: 4 Morsels → effective parallelism 4 → ~40 s scan. With Tablet‑level parallelism: ~80 Morsels → effective parallelism 8 → ~20 s scan.
Data skew amplifies this effect: if one Tablet is far larger than the rest, coarse-grained scheduling leaves most ScanOperators idle while a single operator grinds through the oversized Tablet (~88 s total in the example). Splitting that large Tablet into many Morsels lets multiple ScanOperators share its load, bringing the scan back down to ~20 s.
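A toy greedy-scheduling simulation shows why fine-grained Morsels help under skew. Scan cost is modeled as proportional to rows, and idle operators pull the next Morsel; the specific costs and the 8-way parallelism are illustrative and do not reproduce the article's exact timings.

```python
# Toy simulation: finish time ("makespan") when idle ScanOperators
# greedily pull the next Morsel from a shared list. Costs are in seconds
# and purely illustrative.
import heapq

def makespan(morsel_costs, workers):
    """Finish time when `workers` operators greedily pull morsels."""
    finish = [0.0] * workers
    heapq.heapify(finish)
    for cost in sorted(morsel_costs, reverse=True):
        # The least-loaded operator takes the next (largest) morsel.
        heapq.heappush(finish, heapq.heappop(finish) + cost)
    return max(finish)

# Coarse: one skewed 80 s Tablet plus three 8 s Tablets, 8 operators.
coarse = makespan([80, 8, 8, 8], workers=8)        # → 80 (skew dominates)

# Fine: the same work split into 1 s Morsels (80 + 3×8 = 104 of them).
fine = makespan([1] * 80 + [1] * 24, workers=8)    # → 13 (load shared)
```

With coarse Morsels the runtime equals the single largest Tablet; with fine Morsels it approaches total work divided by the number of operators.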
ChunkSource
ChunkSource is the concrete data‑reading component bound to a single Morsel. It fetches data from storage, assembles it into Chunks, and hands them to the ScanOperator. Multiple ChunkSources can run concurrently, each processing a different Morsel.
ChunkSource submits its read work as a ScanTask to a global I/O thread pool, enabling asynchronous, non‑blocking execution. The number of ChunkSources per ScanOperator is configurable, controlling the maximum concurrent Morsel processing.
ScanTask
ScanTask encapsulates the actual I/O operation for a Morsel. It is dispatched to the I/O thread pool, decoupling I/O latency from the Pipeline execution threads. This design allows high concurrency when accessing remote object stores (e.g., S3, HDFS) where each request may incur ~100 ms latency.
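A quick back-of-the-envelope (Little's law: concurrency = throughput × latency) shows why many in-flight ScanTasks are needed against remote storage. The target request rate below is an illustrative figure, not a StarRocks default.

```python
# Little's law sketch: with ~100 ms per S3/HDFS request, one blocked
# thread issues at most ~10 requests/s; hiding that latency requires
# many ScanTasks in flight at once. Figures are illustrative.

latency_s = 0.100                 # ~100 ms per remote read
target_rps = 500                  # hypothetical desired request rate

sequential_rps = 1 / latency_s            # → 10.0 requests/s, one thread
tasks_in_flight = target_rps * latency_s  # → 50.0 concurrent ScanTasks
```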
ChunkBuffer
ChunkBuffer bridges the asynchronous I/O side (ChunkSource) and the Pipeline side (ScanOperator). ChunkSource writes produced Chunks into the buffer, while the ScanOperator reads them, enabling a producer‑consumer pattern that keeps both sides running efficiently.
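The producer-consumer handoff can be sketched with a bounded queue: a bounded buffer also provides back-pressure when I/O outruns the pipeline. The capacity of 4 and the sentinel-based end-of-stream signal are illustrative choices, not the real ChunkBuffer design.

```python
# Producer-consumer sketch of the ChunkBuffer: an I/O thread (ChunkSource)
# pushes Chunks in while the pipeline thread (ScanOperator) pulls them out.
import queue
import threading

chunk_buffer = queue.Queue(maxsize=4)   # bounded: back-pressures the producer
SENTINEL = None                         # signals end-of-stream

def chunk_source():
    for i in range(10):
        chunk_buffer.put([i])           # blocks when the buffer is full
    chunk_buffer.put(SENTINEL)

def scan_operator():
    rows = []
    while (chunk := chunk_buffer.get()) is not SENTINEL:
        rows.extend(chunk)              # consume Chunks as they arrive
    return rows

producer = threading.Thread(target=chunk_source)
producer.start()
rows = scan_operator()                  # runs on the "pipeline" thread
producer.join()
# rows == [0, 1, ..., 9]
```

The bounded capacity is the key design point: it caps memory while letting the slow side pace the fast side, so I/O threads and pipeline threads stay busy without unbounded buffering.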
Summary
StarRocks’ I/O execution path combines Morsel‑level task splitting, ScanOperator orchestration, asynchronous ChunkSource reads, and a ChunkBuffer decoupling mechanism to achieve high concurrency and low latency. By breaking data into fine‑grained Morsels and dynamically scheduling them across ScanOperators, the system maximizes CPU and I/O utilization, mitigates data skew, and sustains sub‑second query response times even under a compute‑storage separated architecture.
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.