StarRocks 4.1 Enables Faster Iceberg Queries While Preserving Data Freshness
StarRocks 4.1 introduces incremental materialized views for Apache Iceberg that tie refresh cost to the volume of changed data rather than to table size. This shortens refresh time and keeps query results fresh as tables grow to terabytes or petabytes, with a fallback to partition refresh when incremental refresh is not applicable.
Iceberg provides a unified open table format for large‑scale analytics, allowing multiple compute engines to share data.
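As data volume grows, refreshing materialized views becomes a bottleneck. A full refresh rescans all historical data, and partition‑based refresh recomputes entire partitions even when only a small part of their data has changed. Refresh frequency then cannot keep up with incoming data, and view freshness degrades.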
From Partition‑Based to Change‑Based Refresh
StarRocks 4.1 introduces incremental materialized views for Iceberg. Instead of refreshing by partition, the refresh cost is tied to the amount of changed data. The system records the snapshot version used by the previous refresh, identifies new data from the Iceberg snapshots created since then, and generates a minimal incremental plan that recomputes only the affected operators (including joins). The refresh executes within the same transaction as the base‑table update, guaranteeing atomic visibility.
If incremental refresh is not applicable, the AUTO mode automatically falls back to partition‑based refresh without manual intervention.
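As a rough sketch, a view that prefers incremental refresh but may fall back to partition refresh could be declared as follows. This assumes AUTO is selected through the same refresh_mode property that the Quick Start uses for INCREMENTAL; the article names the mode but does not show the exact property value, so it should be checked against the StarRocks 4.1 documentation. The view name test_mv_auto and the aggregate column are hypothetical.

```sql
-- Hypothetical view name. The "AUTO" value is an assumption based on the
-- AUTO mode described in this article, not a confirmed property value.
CREATE MATERIALIZED VIEW test_mv_auto
PARTITION BY dt
REFRESH DEFERRED MANUAL
PROPERTIES (
    "refresh_mode" = "AUTO"
)
AS
SELECT dt, col1, COUNT(*) AS row_cnt
FROM iceberg_catalog.iceberg_test_db.t1
GROUP BY dt, col1;
```

Keeping the PARTITION BY clause matters in this sketch, since the fallback path refreshes by partition.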
Performance Evaluation
A benchmark on a 100 GB Iceberg SSB dataset compared incremental MV with traditional partition refresh across three scenarios: single‑table aggregation, multi‑table join, and join‑plus‑aggregation. Each scenario ran three refresh cycles: an initial full build followed by two incremental runs.
The results show that initial build costs are similar, but in subsequent refreshes the incremental approach processes only the changed data, reducing refresh time by an order of magnitude or more. The multi‑table join case gains the most because only the affected join paths are recomputed. Refresh latency remains stable even as the underlying tables grow, so query acceleration persists.
Note: Incremental refresh currently supports only Iceberg append‑only tables. Tables that undergo UPDATE, MERGE, or OVERWRITE cannot be refreshed with refresh_mode = INCREMENTAL and should use the traditional partition refresh.
Quick Start
To enable incremental refresh when creating a materialized view, set the refresh_mode property to INCREMENTAL:
CREATE MATERIALIZED VIEW test_mv1
PARTITION BY dt
REFRESH DEFERRED MANUAL
PROPERTIES (
"refresh_mode" = "INCREMENTAL"
)
AS SELECT ...
FROM iceberg_catalog.iceberg_test_db.t1
JOIN iceberg_catalog.iceberg_test_db.t2 ON t1.dt = t2.dt
... GROUP BY t1.dt, t1.col1, t2.col1, ...;
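Because the view is declared with REFRESH DEFERRED MANUAL, it is not populated at creation time and nothing refreshes it automatically. A minimal sketch of triggering the refresh cycles by hand with the StarRocks REFRESH MATERIALIZED VIEW statement:

```sql
-- Initial build: populates the view for the first time.
REFRESH MATERIALIZED VIEW test_mv1;

-- After new rows are appended to t1 or t2, run the refresh again.
-- With refresh_mode = INCREMENTAL, the article indicates that only the
-- newly appended data is processed. WITH SYNC MODE makes the statement
-- wait until the refresh task finishes instead of returning immediately.
REFRESH MATERIALIZED VIEW test_mv1 WITH SYNC MODE;
```

Synchronous mode is convenient when timing refreshes by hand; the default asynchronous form only submits the refresh task.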
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
