Databases 14 min read

How StarRocks 2.5 Improves Materialized Views for Real‑Time and Offline Queries

This article analyzes the requirements, design choices, and implementation details of materialized views in StarRocks, covering demand analysis, synchronous and asynchronous refresh solutions, partition binding, task scheduling, partition‑refresh maintenance, Insert‑Overwrite mechanics, view invalidation handling, and the upcoming features planned for version 2.5.

StarRocks
StarRocks
StarRocks
How StarRocks 2.5 Improves Materialized Views for Real‑Time and Offline Queries

Demand Analysis of Materialized Views

To understand materialized views, first recall that a view is a virtual table defined by a query. A materialized view (MV) materializes this virtual table as a physical table, enabling query acceleration through automatic QueryRewrite when similar SQL statements are issued.

Typical scenarios include:

Real‑time joins of two streaming tables – single‑table sync MVs cannot handle multi‑table joins.

Offline multi‑table modeling where fact and dimension tables are joined – users expect both query acceleration and modeling simplification.

Cases where the MV result set is small but the default partition/bucket layout would be large, causing performance degradation.

These gaps motivate the design of more flexible MV capabilities.

Design of Materialized Views

Two complementary solutions are introduced:

Synchronous solution – optimal for low‑complexity SQL with strong consistency, but struggles with complex queries.

Asynchronous refresh solution – accepts limited refresh latency to support complex SQL. Users can configure periodic refresh (e.g., every minute) for high‑frequency streaming data, or manual refresh for fine‑grained control.

When creating an MV, users can bind it to a specific partition of a base table using PARTITION BY. Buckets are mandatory but can be set independently of the base table, allowing reduction of partition/bucket count when the MV result is small.

Refresh options include: REFRESH ASYNC – automatically determines which partitions need refresh when base tables change. REFRESH MANUAL – lets users trigger refresh explicitly.

These features are already available in StarRocks 2.4.

Implementation Principles

StarRocks 2.4 implements a multi‑table asynchronous MV framework that relies on three core technologies:

Task scheduling framework

Partition‑refresh maintenance

Insert‑Overwrite mechanism

Task Scheduling Framework

A Task acts as a reusable template; a TaskRun is the actual execution unit derived from the template, analogous to a Java class and its object. The framework supports three refresh modes: manual, trigger‑based, and periodic.

Execution flow:

TaskRun objects are created from a Task and placed into a Pending queue.

The scheduler pulls TaskRuns from Pending, runs them, and moves them to a Running queue.

After completion, TaskRuns are recorded in a TaskRun history collection.

Default parameters: queue length 500, concurrency 20, history retention 3 days.

Partition Refresh Maintenance

When an MV is created, it can be bound to a specific partition of a base table. The refresh framework adds or removes partitions before each refresh to ensure the MV’s partition set covers the bound base partitions. Partition mapping can be 1:1, 1:n, n:1, or n:n.

Refresh decisions are made by comparing version numbers of base‑table partitions and MV partitions. If a base partition’s version is newer than the corresponding MV partition, the MV partition is refreshed using the Insert‑Overwrite technique.

Insert Overwrite

The Insert‑Overwrite process consists of three steps:

Create a temporary partition.

Write the refreshed data into the temporary partition.

Atomically replace the target partition with the temporary one.

This mechanism guarantees that readers either see the old data or the fully refreshed new data, never a partially updated state.

View Invalidation

If the schema of a base table changes (e.g., a column is dropped), the associated MV becomes invalid for refresh. The MV can still be queried, but it must be rebuilt to resume refresh capability.

Roadmap for StarRocks 2.5

Version 2.5 will add several critical features that were missing in 2.4:

QueryRewrite support for automatic MV substitution.

Ability to create MVs from external tables and to create MVs on top of existing MVs.

MV TTL (time‑to‑live) management.

Optimized refresh efficiency, including partial‑refresh syntax and configuration to handle complex refresh scenarios.

These enhancements aim to broaden MV applicability and improve performance in both real‑time and offline workloads.

task schedulingStarRocksmaterialized viewQuery RewriteAsync RefreshInsert Overwrite
StarRocks
Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.