What’s New in StarRocks 3.3? Deep Dive into Lakehouse‑Optimized Performance and Features
StarRocks 3.3 introduces a comprehensive set of enhancements, including feature maturity levels, ARM‑optimized performance, advanced caching, materialized‑view rewrites, storage optimizations, and expanded lakehouse ecosystem support. Together they improve stability, query speed, and usability for large‑scale analytics workloads.
StarRocks 3.3 marks a major step forward for the Lakehouse architecture, delivering stability, performance, and ecosystem improvements across the board.
Maturity Levels and Feature Classification
New features are grouped into three maturity tiers: Experimental (interfaces may change), Preview (generally stable but still evolving), and GA (production‑ready). This helps users decide which capabilities are safe for production.
Stability and Large‑Query Optimizations
Spill‑to‑Disk operator (GA) reduces memory pressure for complex queries, preventing OOM failures.
Colocate Group Execution lowers memory usage for Join and Agg operators by executing queries in stages.
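To make the spill idea concrete, here is a minimal Python sketch of an aggregation that writes partial results to disk once it holds too many groups in memory. It is a conceptual illustration only, not StarRocks code: the function name spilling_sum and the max_groups_in_memory limit are hypothetical, and a real engine would stream spilled runs rather than load them back whole.

```python
import heapq
import os
import pickle
import tempfile
from collections import defaultdict


def spilling_sum(rows, max_groups_in_memory):
    """Sum values per key, spilling partial aggregates to disk under memory pressure.

    Hypothetical sketch: the group-count limit stands in for a real memory budget.
    Returns (totals, number_of_spills).
    """
    partial = defaultdict(int)
    spill_paths = []

    def spill():
        # Write the current partial aggregates, sorted by key, to a temp file.
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(sorted(partial.items()), f)
        spill_paths.append(path)
        partial.clear()

    for key, value in rows:
        partial[key] += value
        if len(partial) > max_groups_in_memory:
            spill()

    # Merge the in-memory remainder with every spilled run.
    # (Runs are loaded whole here for brevity; a real engine would stream them.)
    runs = [sorted(partial.items())]
    for path in spill_paths:
        with open(path, "rb") as f:
            runs.append(pickle.load(f))
        os.remove(path)

    totals = {}
    for key, value in heapq.merge(*runs):
        totals[key] = totals.get(key, 0) + value
    return totals, len(spill_paths)
```

The trade-off is the usual one: the query finishes instead of failing with OOM, at the cost of extra disk I/O for the spilled runs.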
Performance Gains on ARM Architecture
StarRocks 3.3 achieves up to 20% cost reduction and 20% query speed improvement on AWS Graviton compared to x86. Benchmark results include:
SSB 100G: +11% speed
ClickBench: +39% speed
TPCH 100G: +13% speed
TPC‑DS 100G: +35% speed
Lakehouse‑Specific Optimizations
Improved Scan performance via Page Index optimization, reducing data read volume.
Metadata handling enhancements for faster overall processing.
Enhanced inverted and n‑gram indexes for fuzzy search.
FlatJson acceleration for semi‑structured data, achieving near‑structured query speed.
Bitmap function improvements, including Hive export capability.
CodeGen and vectorized regex improvements lower CPU cost of complex expressions.
Histogram statistics in external table stats mitigate data skew and improve shuffle join plans.
Global dictionary with dictionary_get() enables fast dimension lookups without joins.
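As an illustration of how a page index can reduce the volume of data read, the Python sketch below keeps a minimum and maximum value per page and skips every page whose range cannot match a range predicate. The PageStats class and pages_to_read function are hypothetical names for this example; the actual StarRocks reader is not described in this article.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Per-page min/max statistics (hypothetical structure for illustration)."""
    page_id: int
    min_value: int
    max_value: int


def pages_to_read(pages, lower, upper):
    """Return ids of pages whose [min, max] range overlaps lower <= x <= upper."""
    return [
        p.page_id
        for p in pages
        if p.max_value >= lower and p.min_value <= upper
    ]
```

With sorted or clustered data most pages fall entirely outside the predicate, so only a small fraction has to be fetched and decoded.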
Cache Design – The Final Piece of the Lakehouse Puzzle
StarRocks 3.3 adds native cache features that require no complex configuration:
Cache warmup command pre‑loads hot data.
Cache priority (3.3.1) lets users prioritize critical data.
Memory‑optimised cache and observability improvements simplify management.
Even when cache misses occur, parallel tablet scanning and automatic small‑I/O merging keep query performance high.
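Small-I/O merging can be pictured as coalescing nearby byte ranges into one larger read. The sketch below is a generic Python version of that technique, assuming reads are given as (offset, length) pairs; the max_gap threshold and the function name are hypothetical and not taken from StarRocks.

```python
def merge_io_ranges(ranges, max_gap):
    """Coalesce (offset, length) reads whose gap is at most max_gap bytes.

    Hypothetical sketch: returns a sorted list of merged (offset, length) reads.
    """
    merged = []  # list of [start, end) pairs, kept in offset order
    for offset, length in sorted(ranges):
        end = offset + length
        if merged and offset - merged[-1][1] <= max_gap:
            # Close enough to the previous read: extend it instead of issuing a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([offset, end])
    return [(start, end - start) for start, end in merged]
```

Reading a few unneeded bytes between ranges is usually cheaper on object storage than paying the latency of an extra request, which is why a gap threshold is used rather than merging only strictly adjacent ranges.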
Materialized View Enhancements
Iceberg MV now supports partition‑level incremental refresh and hidden‑partition tables.
Paimon MV gains rewrite capabilities and incremental refresh.
Transparent MV rewrite mode (transparent_mv_rewrite_mode) automatically unions refreshed partitions with base data.
New enable_query_rewrite flag and MV plan cache improve rewrite efficiency and scheduling.
Multi‑fact‑table partition refresh reduces refresh overhead in complex joins.
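The transparent rewrite behaviour described above can be sketched as a per-partition choice: take the precomputed result when the partition has been refreshed, otherwise compute it from base data, and return the union. This Python model is only a conceptual illustration with hypothetical names (transparent_totals, mv_totals, refreshed) and assumes the materialized view stores a per-partition sum.

```python
def transparent_totals(base_partitions, mv_totals, refreshed):
    """Combine refreshed MV partitions with results computed from base data.

    base_partitions: partition name -> list of raw values (the base table)
    mv_totals:       partition name -> precomputed sum (the materialized view)
    refreshed:       set of partition names whose MV data is up to date
    """
    totals = {}
    for name, rows in base_partitions.items():
        if name in refreshed and name in mv_totals:
            totals[name] = mv_totals[name]  # precomputed and current
        else:
            totals[name] = sum(rows)        # stale or missing: fall back to base data
    return totals
```

The caller always sees consistent results, and the amount of work saved grows with the share of partitions that are already refreshed.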
Storage Optimizations and Usability
FE observability and lock manager improvements raise concurrent import/query throughput (35% reduction in import time under 100‑thread load).
ORDER BY syntax and column rename support (3.3.1) simplify DDL.
Non‑string scalar storage reduced by 12%, cutting storage cost and speeding reads.
PK index now supports remote storage and size‑tiered compaction, lowering I/O and memory during compaction.
Finer‑grained page reads and Bloom filter enhancements improve PK index read performance.
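For readers unfamiliar with size-tiered compaction, the general technique groups files of similar size into tiers and merges a tier only once it holds enough files, so large files are not rewritten every time a small one arrives. The Python sketch below shows that selection step under assumed parameters (tier_ratio, min_files); it does not reflect the actual policy StarRocks uses for PK indexes.

```python
def pick_compaction_candidates(file_sizes, tier_ratio=2.0, min_files=4):
    """Group files into size tiers and return the first tier large enough to merge.

    Hypothetical sketch: a file joins the current tier while its size is within
    tier_ratio of that tier's smallest file. Returns [] if no tier qualifies.
    """
    tiers = []
    for size in sorted(file_sizes):
        if tiers and size <= tiers[-1][0] * tier_ratio:
            tiers[-1].append(size)
        else:
            tiers.append([size])
    for tier in tiers:
        if len(tier) >= min_files:
            return tier
    return []
```

Because each merge only touches files of comparable size, the I/O and memory spent per compaction stay proportional to the tier being merged rather than to the whole index.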
Ecosystem Integration
StarRocks 3.3 expands lakehouse support:
Hive: write ORC and Text files; single‑sink write performance twice that of Trino.
Iceberg: re‑engineered metadata module, manifest caching, Avro parsing boost, equality‑delete for V2 tables, and view query support.
Paimon: full support including deletion vectors, system‑table integration, and scan‑range scheduling.
ClickHouse and Kudu catalog integration plus migration tools for smoother data transfer.
Conclusion
StarRocks 3.3 delivers a mature Lakehouse solution with significant performance, stability, and usability gains, positioning it as a robust, open‑source MPP database for modern analytics workloads.
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
