StarRocks 3.1 Highlights: Faster Lakehouse Analytics and Advanced Materialized Views

StarRocks 3.1 introduces a cloud-native, lakehouse-oriented architecture. The release strengthens storage-compute separation, reports data-lake queries 3-6× faster than Trino/Presto, expands Iceberg and Paimon support, and enriches materialized views. It also adds random bucketing, expression partitioning, generated columns, and spill-to-disk for better stability, backed by performance optimizations and open-source contributions.

StarRocks

Aug 9, 2023

Overview

StarRocks 3.1 is a major release that introduces cloud-native lakehouse capabilities. It enhances storage-compute separation, runs queries on Iceberg-based data lakes 3-6× faster than Trino/Presto, adds advanced materialized view features, expands support for semi-structured data types, and improves stability with spill-to-disk for blocking operators.

Storage‑Compute Separation

The storage-compute separated architecture now fully supports primary-key tables, auto-increment columns, and expression-based partitioning. Query and data-loading performance in the separated mode is comparable to the integrated (coupled storage-compute) mode, while storage costs are lower.
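As a rough illustration of the first two features, the sketch below defines a primary-key table with an auto-increment column. The table and column names are hypothetical and are not taken from the release notes.

```sql
-- Hypothetical table: names are illustrative only.
CREATE TABLE user_events (
    user_id  BIGINT NOT NULL,
    event_id BIGINT NOT NULL AUTO_INCREMENT,  -- filled in by StarRocks when omitted
    action   VARCHAR(32)
)
PRIMARY KEY (user_id)
DISTRIBUTED BY HASH(user_id);

-- event_id is left out of the column list, so a value is generated for each row.
INSERT INTO user_events (user_id, action) VALUES (1, 'login'), (2, 'click');
```

Because the table uses the primary-key model, a later insert with an existing user_id replaces that row rather than adding a duplicate.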

Lakehouse Data‑Lake Query Performance

StarRocks 3.1 delivers 3‑6× faster query performance on Apache Iceberg data lakes compared with Trino/Presto, while maintaining full Trino syntax compatibility. It adds read/write support for Iceberg and native analysis of Apache Paimon streaming lakes.
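A minimal sketch of querying an Iceberg table through an external catalog follows. It assumes a Hive metastore is used as the Iceberg catalog service; the catalog name, metastore address, and table names are placeholders, and storage credentials (for example for S3) are omitted.

```sql
-- Assumed setup: Iceberg metadata is tracked in a Hive metastore.
-- iceberg_lake, <metastore_host>, sales_db, and orders are placeholders.
CREATE EXTERNAL CATALOG iceberg_lake
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://<metastore_host>:9083"
);

-- Optional: parse the statements that follow with Trino-style syntax.
SET sql_dialect = 'trino';

-- Query the lake table through its catalog.database.table name.
SELECT order_date, COUNT(*) AS orders
FROM iceberg_lake.sales_db.orders
GROUP BY order_date;
```

No data is copied into StarRocks here; the catalog only tells the engine where to find the table metadata.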

Materialized Views

Asynchronous Materialized Views

Support for ORDER BY and colocate_group to leverage native storage optimizations.

Configurable storage_medium and storage_cooldown_time for lifecycle management.

Random bucketing is the default, simplifying view creation.

Session‑level variables can be set per view to control timeout, parallelism, memory limits, etc.

Views can be created from existing views; SWAP enables atomic schema changes.

Views invalidated by a base-table rebuild can be reactivated manually.
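Several of the points above can be sketched in one asynchronous materialized view definition. This is an illustration under assumptions: the base table orders(order_id, order_date, amount) and both view names are hypothetical, and the timeout value is arbitrary.

```sql
-- No DISTRIBUTED BY clause: the view relies on the default random bucketing.
CREATE MATERIALIZED VIEW daily_sales_mv
REFRESH ASYNC EVERY (INTERVAL 1 DAY)
ORDER BY (order_date)                     -- sort key for the view's storage
PROPERTIES (
    "session.query_timeout" = "3600"      -- session variable applied to this view's refresh
)
AS
SELECT order_date, SUM(amount) AS total_amount
FROM orders
GROUP BY order_date;

-- Reactivate the view after its base table was dropped and re-created.
ALTER MATERIALIZED VIEW daily_sales_mv ACTIVE;

-- Atomically exchange two views (daily_sales_mv_v2 is assumed to exist already).
ALTER MATERIALIZED VIEW daily_sales_mv SWAP WITH daily_sales_mv_v2;
```

Properties such as storage_medium would go in the same PROPERTIES clause; they are left out here to keep the sketch short.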

Synchronous Materialized Views

All aggregate functions, CASE‑WHEN, CAST, and arithmetic expressions are supported. Multiple aggregate columns per view and HINT‑based direct queries are now possible.

CREATE MATERIALIZED VIEW v1 AS
SELECT b,
       SUM(a + 1) AS sum_a1,
       MIN(CAST(a AS BIGINT)) AS min_a
FROM base_table
GROUP BY b;
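To read the view's stored results directly instead of relying on transparent rewrite of queries against base_table, the hint form looks roughly like this. The hint name is given as recalled from the StarRocks documentation and should be checked against the version in use.

```sql
-- Query the synchronous materialized view v1 directly via a hint.
SELECT * FROM v1 [_SYNC_MV_];
```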

Query Performance Enhancements

Generated columns automatically compute expression results during import, accelerating queries on JSON, ARRAY, MAP, and STRUCT data.

Column‑mode updates for primary‑key tables improve UPDATE performance by up to tenfold when only a few columns change.

Cardinality‑preserving join pruning can speed up star‑schema and snowflake queries by >10×.

Spill-to-disk for aggregation, sorting, and join operators improves stability on large datasets; the TPC-H benchmark at the 10 TB scale runs successfully on a 3-node cluster (16 cores, 20 GB RAM per node).
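The statements below sketch how three of these features might be used. The table names are hypothetical; the orders table in the UPDATE is assumed to be an existing primary-key table with a status column.

```sql
-- Generated columns: expressions are evaluated once at load time and stored.
CREATE TABLE events (
    id        INT NOT NULL,
    scores    ARRAY<INT> NOT NULL,
    payload   JSON NOT NULL,
    avg_score DOUBLE AS array_avg(scores),
    user_name STRING AS json_string(json_query(payload, '$.user'))
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id);

-- Queries can filter on the precomputed value instead of re-parsing JSON.
SELECT id FROM events WHERE user_name = 'ann';

-- Column-mode partial updates for this session (suited to few-column changes).
SET partial_update_mode = 'column';
UPDATE orders SET status = 'shipped' WHERE order_id = 1001;

-- Allow blocking operators to spill intermediate results to disk.
SET enable_spill = true;
SET spill_mode = 'auto';
```

All three SET statements are session-scoped, so they affect only the current connection.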

Table Creation & Data Import Improvements

Random bucketing (default) removes the need to specify bucket keys.

Expression partitioning and LIST partitioning provide flexible partition definitions.

The FILES() table function loads Parquet/ORC files from S3/HDFS in a single statement, with automatic schema inference.

CREATE TABLE site_access(
    event_day DATE,
    site_id INT DEFAULT '10',
    ...
) DUPLICATE KEY(event_day, site_id)
PARTITION BY date_trunc('day', event_day)
DISTRIBUTED BY HASH(event_day, site_id); -- BUCKETS omitted, so the bucket count is determined automatically

CREATE TABLE insert_wiki_edit AS
SELECT * FROM FILES(
    'path' = 's3://inserttest/parquet/insert_wiki_edit_append.parquet',
    'format' = 'parquet');
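For comparison with the hash-bucketed table above, here is a sketch of random bucketing and LIST partitioning. Table names, columns, and partition values are hypothetical.

```sql
-- Random bucketing: no bucket key is named.
CREATE TABLE access_log (
    event_day DATE,
    site_id   INT,
    pv        BIGINT
)
DUPLICATE KEY(event_day, site_id)
DISTRIBUTED BY RANDOM;

-- LIST partitioning: each partition holds an explicit set of values.
CREATE TABLE recharge_detail (
    id    BIGINT,
    city  VARCHAR(20) NOT NULL,
    money DECIMAL(32, 2)
)
DUPLICATE KEY(id)
PARTITION BY LIST (city) (
    PARTITION p_west VALUES IN ("Los Angeles", "San Francisco"),
    PARTITION p_east VALUES IN ("New York")
)
DISTRIBUTED BY HASH(id);
```

Random bucketing trades away bucket pruning on a key for simpler table definitions, so it tends to fit tables that are mostly scanned rather than looked up by key.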

Semi‑Structured Data Support

Version 3.1 adds native MAP and STRUCT types, along with a growing set of scalar, aggregate, and higher‑order functions. Array types now support Fast Decimal and can contain nested MAP, STRUCT, or ARRAY elements. Generated columns can pre‑compute complex expressions on these types, further boosting query speed.
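A small sketch of the MAP and STRUCT types alongside a higher-order function follows. The table and its contents are made up for illustration.

```sql
-- Hypothetical table mixing MAP, STRUCT, and ARRAY columns.
CREATE TABLE user_profiles (
    id      INT,
    attrs   MAP<STRING, INT>,
    contact STRUCT<name STRING, age INT>,
    scores  ARRAY<INT>
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id);

INSERT INTO user_profiles VALUES
    (1, map{'visits': 3, 'orders': 1}, row('Ann', 30), [80, 90]);

SELECT attrs['visits']               AS visits,      -- map lookup by key
       contact.name                  AS name,        -- struct field access
       map_keys(attrs)               AS attr_names,  -- scalar function on a map
       array_map(x -> x + 1, scores) AS bumped       -- higher-order function with a lambda
FROM user_profiles;
```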

Repository and Documentation

Source code: https://github.com/StarRocks/starrocks

Release notes and detailed documentation (including auto‑increment, expression partitioning, data cache, Iceberg catalog, Elasticsearch catalog, Paimon catalog, generated columns, UPDATE syntax, random bucketing, FILES table function, etc.) are available at the StarRocks documentation site: https://docs.starrocks.io/zh-cn/3.1/


Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
