
What’s New in StarRocks 3.3? Deep Dive into Lakehouse‑Optimized Performance and Features

StarRocks 3.3 introduces a comprehensive set of enhancements—including maturity levels, ARM‑optimized performance, advanced caching, materialized‑view rewrites, storage optimizations, and expanded lakehouse ecosystem support—that together boost stability, query speed, and usability for large‑scale analytics workloads.

StarRocks

Jul 2, 2024

StarRocks 3.3 marks a major step forward for the Lakehouse architecture, delivering stability, performance, and ecosystem improvements across the board.

Maturity Levels and Feature Classification

New features are grouped into three maturity tiers: Experimental (interfaces may change), Preview (generally stable but still evolving), and GA (production‑ready). This helps users decide which capabilities are safe for production.

Stability and Large‑Query Optimizations

Spill‑to‑Disk operator (GA) reduces memory pressure for complex queries, preventing OOM failures.

Colocate Group Execution lowers memory usage for Join and Agg operators by executing queries in stages.
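One way to picture staged execution: if data is already bucketed on the grouping key (as colocated tables are), each bucket can be aggregated independently, so only one bucket's hash table needs to be in memory at a time. The sketch below is a toy Python illustration under that assumption; `aggregate_in_stages` and the bucket layout are hypothetical and do not describe StarRocks internals.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, int]  # (group key, value); hypothetical row shape


def aggregate_in_stages(buckets: Iterable[List[Row]]) -> Iterator[Tuple[str, int]]:
    """Sum values per key, one bucket at a time.

    Assumes every key lives in exactly one bucket, so a bucket's
    result is final and its hash table can be dropped afterwards.
    """
    for bucket in buckets:
        table: Dict[str, int] = {}  # hash table for this bucket only
        for key, value in bucket:
            table[key] = table.get(key, 0) + value
        yield from table.items()  # emit results, then release this bucket's state


buckets = [
    [("a", 1), ("a", 2), ("c", 5)],  # bucket 0
    [("b", 10), ("b", 1)],           # bucket 1
]
result = dict(aggregate_in_stages(buckets))
```

Peak memory in this model is bounded by the largest bucket rather than by the whole input, which is the intuition behind the lower memory usage.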

Performance Gains on ARM Architecture

StarRocks 3.3 reports up to 20% lower cost and a 20% query speed improvement on AWS Graviton compared to x86. Per-benchmark speedups over x86:

SSB 100G: +11% speed

ClickBench: +39% speed

TPCH 100G: +13% speed

TPC‑DS 100G: +35% speed
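The percentages above are easier to compare once converted to run time. Assuming "+N% speed" means throughput is N% higher (the source does not define the metric), the same workload takes 1 / (1 + N/100) of the x86 run time. A small Python sketch of that arithmetic; the function name is illustrative.

```python
def relative_runtime(speedup_percent: float) -> float:
    """Run time relative to the baseline, given a throughput gain in percent."""
    return 1.0 / (1.0 + speedup_percent / 100.0)


# Speedups reported in the list above.
reported = {"SSB 100G": 11, "ClickBench": 39, "TPCH 100G": 13, "TPC-DS 100G": 35}

for name, gain in reported.items():
    saved = (1.0 - relative_runtime(gain)) * 100.0
    print(f"{name}: +{gain}% speed is about {saved:.1f}% less run time")
```

Under this reading, a speed gain is always larger than the corresponding run-time saving: +39% speed works out to roughly 28% less run time, not 39%.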

Lakehouse‑Specific Optimizations

Improved Scan performance via Page Index optimization, reducing data read volume.

Metadata handling enhancements for faster overall processing.

Enhanced inverted and n‑gram indexes for fuzzy search.

FlatJson acceleration for semi‑structured data, achieving near‑structured query speed.

Bitmap function improvements, including Hive export capability.

CodeGen and vectorized regex improvements lower CPU cost of complex expressions.

Histogram statistics in external table stats mitigate data skew and improve shuffle join plans.

Global dictionary with dictionary_get() enables fast dimension lookups without joins.
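The Page Index optimization at the top of this list can be pictured as follows: Parquet files can carry per-page min/max statistics, and a reader can skip any page whose value range cannot satisfy the filter. The Python sketch below illustrates the idea only; `PageStats` and `pages_to_read` are hypothetical names, not StarRocks or Parquet APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageStats:
    page_id: int
    min_value: int
    max_value: int


def pages_to_read(pages: List[PageStats], lo: int, hi: int) -> List[int]:
    """Return ids of pages whose [min, max] range overlaps the filter [lo, hi]."""
    return [p.page_id for p in pages if p.max_value >= lo and p.min_value <= hi]


pages = [
    PageStats(0, 1, 100),
    PageStats(1, 101, 200),
    PageStats(2, 201, 300),
]
# A filter such as "value BETWEEN 150 AND 180" only needs page 1.
selected = pages_to_read(pages, lo=150, hi=180)
```

Pruning of this kind works best when the filtered column is sorted or clustered, because page ranges then overlap less.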

Cache Design – The Final Piece of the Lakehouse Puzzle

StarRocks 3.3 adds native cache features that require no complex configuration:

Cache warmup command pre‑loads hot data.

Cache priority (3.3.1) lets users prioritize critical data.

Memory‑optimized cache and observability improvements simplify management.

Even when cache misses occur, parallel tablet scanning and automatic small‑I/O merging keep query performance high.
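Small-I/O merging can be sketched as range coalescing: reads that are adjacent or separated by only a small gap are combined into one larger request, trading a few wasted bytes for fewer round trips to remote storage. The Python below is a minimal illustration; `merge_small_reads` and the `max_gap` threshold are assumptions for this example, not StarRocks settings.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def merge_small_reads(reads: List[Range], max_gap: int) -> List[Range]:
    """Coalesce reads whose gap to the previous read is at most max_gap bytes."""
    merged: List[Range] = []
    for offset, length in sorted(reads):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            if offset - prev_end <= max_gap:
                # Extend the previous request to cover this read as well.
                new_end = max(prev_end, offset + length)
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


reads = [(0, 100), (120, 50), (4096, 200), (170, 30)]
merged = merge_small_reads(reads, max_gap=64)
```

Here four requests collapse into two: one covering bytes 0 to 200 and one at offset 4096. A larger `max_gap` merges more aggressively at the cost of reading more unused bytes.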

Materialized View Enhancements

Iceberg MV now supports partition‑level incremental refresh and hidden‑partition tables.

Paimon MV gains rewrite capabilities and incremental refresh.

Transparent MV rewrite mode (transparent_mv_rewrite_mode) automatically unions refreshed partitions with base data.

New enable_query_rewrite flag and MV plan cache improve rewrite efficiency and scheduling.

Multi‑fact‑table partition refresh reduces refresh overhead in complex joins.
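The union behavior described for transparent rewrite can be sketched as: partitions the MV has already refreshed are served from the MV, the remaining partitions are computed from the base table, and the two are combined. The Python below is a conceptual model only; `transparent_read`, the partition keys, and the `compute` callback are hypothetical and do not reflect how StarRocks implements the rewrite.

```python
from typing import Callable, Dict, List, Set

Rows = List[int]


def transparent_read(
    base: Dict[str, Rows],
    mv: Dict[str, int],
    refreshed: Set[str],
    compute: Callable[[Rows], int],
) -> Dict[str, int]:
    """Answer a per-partition aggregate from the MV where fresh, else from base data."""
    result: Dict[str, int] = {}
    for partition, rows in base.items():
        if partition in refreshed:
            result[partition] = mv[partition]   # precomputed by the MV
        else:
            result[partition] = compute(rows)   # fall back to the base table
    return result


base = {"2024-06-30": [1, 2, 3], "2024-07-01": [4, 5], "2024-07-02": [6]}
mv = {"2024-06-30": 6, "2024-07-01": 9}  # per-partition SUM kept by the MV
daily_sum = transparent_read(
    base, mv, refreshed={"2024-06-30", "2024-07-01"}, compute=sum
)
```

The newest partition is not yet in the MV, so it is computed from base rows, while the caller still sees a single consistent result.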

Storage Optimizations and Usability

FE observability and lock manager improvements raise concurrent import/query throughput (35% reduction in import time under 100‑thread load).

ORDER BY syntax and column rename support (3.3.1) simplify DDL.

Non‑string scalar storage reduced by 12%, cutting storage cost and speeding reads.

The Primary Key (PK) index now supports remote storage and size‑tiered compaction, lowering I/O and memory use during compaction.

Finer‑grained page reads and Bloom filter enhancements improve PK index read performance.
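Size-tiered compaction is a general strategy also used by other LSM-style systems: files of similar size are grouped into tiers, and a tier is compacted once it holds enough files, so small files are not repeatedly rewritten together with large ones. A minimal Python sketch of tier selection follows; the function name, `tier_ratio`, and `min_files` are illustrative assumptions, not StarRocks parameters.

```python
from typing import List


def pick_size_tiered_candidates(
    file_sizes: List[int], tier_ratio: float = 2.0, min_files: int = 4
) -> List[int]:
    """Return the sizes of the files in the fullest tier that is ready to compact.

    A file joins the current tier if it is at most tier_ratio times the
    smallest file in that tier; otherwise it starts a new tier.
    """
    tiers: List[List[int]] = []
    for size in sorted(file_sizes):
        if tiers and size <= tiers[-1][0] * tier_ratio:
            tiers[-1].append(size)
        else:
            tiers.append([size])
    ready = [tier for tier in tiers if len(tier) >= min_files]
    return max(ready, key=len) if ready else []


# Four small files form a full tier; the larger files are left alone.
candidates = pick_size_tiered_candidates([1, 2, 100, 1, 5000, 2, 120])
```

Because only files of comparable size are merged together, each compaction rewrites a bounded amount of data, which is consistent with the lower I/O and memory use the release notes describe.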

Ecosystem Integration

StarRocks 3.3 expands lakehouse support:

Hive: write ORC and Text files; single‑sink write performance twice that of Trino.

Iceberg: re‑engineered metadata module, manifest caching, Avro parsing boost, equality‑delete for V2 tables, and view query support.

Paimon: full support including deletion vectors, system‑table integration, and scan‑range scheduling.

ClickHouse and Kudu catalog integration plus migration tools for smoother data transfer.

Conclusion

StarRocks 3.3 delivers a mature Lakehouse solution with significant performance, stability, and usability gains, positioning it as a robust, open‑source MPP database for modern analytics workloads.


Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
