Why StarRocks Is Redefining Fast Unified OLAP Analytics

StarRocks combines vectorized execution, a new cost‑based optimizer, materialized views, a real‑time storage engine, pipeline execution, and distributed joins to deliver a unified, high‑performance OLAP solution that supports both traditional and lakehouse analytics while reducing operational complexity.

Alibaba Cloud Big Data AI Platform

Jul 22, 2024

StarRocks Background Introduction

Over the past decade, demand for real‑time OLAP analysis has grown, producing engines such as ClickHouse, Druid, Kylin, Presto/Trino, Impala, and Kudu. Having to choose among them, and often to run several side by side, increases operational cost. StarRocks was created to cover these data‑analysis needs with a single product, lowering selection and maintenance costs.

StarRocks Application Scenarios

StarRocks supports a wide range of OLAP scenarios, including user‑facing reports, business reports, user profiling, data warehousing, order analysis, and ad‑hoc queries.

Community Rapid Iteration

From version 1.x to 3.x, the community has added a vectorized engine, core CBO capabilities, storage‑compute separation, and support for large‑scale analysis, moving toward an integrated lakehouse architecture.

OLAP Analysis Core Challenges

Typical OLAP workloads suffer from insufficient performance, low data freshness, concurrency bottlenecks, and inflexible modeling.

Performance: Full Vectorization

StarRocks is fully vectorized: operators process data in columnar batches, which reduces virtual function calls, improves CPU cache utilization, and lets the engine use SIMD instructions. The result is a several‑fold speedup for filters, aggregations, and joins.
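
As a rough illustration of the difference in shape between row‑at‑a‑time and column‑at‑a‑time processing, here is a minimal Python sketch. The function names and sample data are hypothetical, and pure Python cannot show SIMD or cache effects; the point is only that each columnar operator handles a whole batch of one column per call.

```python
from array import array

# Row-at-a-time: every row passes through the filter and the aggregate one by one.
def sum_filtered_rows(rows, threshold):
    total = 0
    for price, quantity in rows:
        if price > threshold:
            total += price * quantity
    return total

# Column-at-a-time: each operator works on a whole batch of a single column.
def filter_gt(column, threshold):
    """Return a selection vector: positions whose value exceeds the threshold."""
    return [i for i, v in enumerate(column) if v > threshold]

def sum_product(col_a, col_b, selection):
    """Aggregate col_a * col_b over the selected positions only."""
    return sum(col_a[i] * col_b[i] for i in selection)

def sum_filtered_columns(prices, quantities, threshold):
    selection = filter_gt(prices, threshold)
    return sum_product(prices, quantities, selection)

# Made-up sample data stored column by column.
prices = array("q", [5, 20, 35, 8])
quantities = array("q", [2, 1, 3, 10])
rows = list(zip(prices, quantities))
```

Both paths compute the same answer; in a native engine the columnar loops are the ones a compiler can turn into tight, SIMD‑friendly code.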

Performance: New Cost‑Based Optimizer (CBO)

The optimizer processes SQL through Parser, Analyzer, Transformer, Rewriter, and Optimizer stages. Transformer and Rewriter apply rule‑based optimizations, while the Optimizer uses a memo‑based cost model to select the most efficient execution plan, minimizing network and CPU overhead.
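
The idea of costing each group of equivalent plans once and reusing the result can be sketched with a memoized join‑order search. This is a simplified stand‑in, not the StarRocks optimizer: the table sizes, the flat selectivity, and the cost formula (sum of estimated rows produced by each join) are all assumptions made for illustration.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical row counts and a single flat join selectivity.
ROWS = {"orders": 1_000_000, "customers": 50_000, "nations": 25}
SELECTIVITY = 0.0001

def cardinality(tables):
    """Estimated output rows when all the given tables are joined."""
    rows = 1.0
    for name in sorted(tables):
        rows *= ROWS[name]
    return rows * SELECTIVITY ** (len(tables) - 1)

@lru_cache(maxsize=None)  # the memo: each set of tables is costed only once
def best_plan(tables):
    """Return (cost, plan) for a frozenset of table names."""
    if len(tables) == 1:
        (name,) = tables
        return 0.0, name
    best = None
    members = sorted(tables)
    for size in range(1, len(members)):
        for left in combinations(members, size):
            left = frozenset(left)
            right = tables - left
            left_cost, left_plan = best_plan(left)
            right_cost, right_plan = best_plan(right)
            # Cost of this alternative: both inputs plus the rows this join emits.
            cost = left_cost + right_cost + cardinality(tables)
            if best is None or cost < best[0]:
                best = (cost, (left_plan, right_plan))
    return best

cost, plan = best_plan(frozenset(ROWS))
```

With these numbers the search joins the two small tables first and brings in the large table last, because that keeps the intermediate result smallest.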

Performance: Modern Materialized Views

StarRocks builds and refreshes materialized views automatically and transparently rewrites queries to use them, so workloads are accelerated without modifying business SQL.
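
A toy model of transparent rewrite may help: if a query groups by a subset of the columns a view is already grouped by, the answer can be rolled up from the view's pre‑aggregated rows instead of the base table. The class, function names, and data below are hypothetical, and a real rewrite engine matches far more query shapes than this one check.

```python
from collections import defaultdict

# Base table rows: (day, city, amount). All data here is made up.
SALES = [
    ("2024-07-01", "Hangzhou", 10),
    ("2024-07-01", "Beijing", 5),
    ("2024-07-02", "Hangzhou", 7),
    ("2024-07-02", "Hangzhou", 3),
]
COLUMNS = ("day", "city")

def aggregate(rows, key_idx, value_idx):
    """SUM(value) GROUP BY the columns at positions key_idx."""
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in key_idx)] += row[value_idx]
    return dict(out)

class MaterializedView:
    """Pre-aggregated SUM(amount) grouped by (day, city)."""
    group_by = COLUMNS

    def __init__(self, base_rows):
        self.refresh(base_rows)

    def refresh(self, base_rows):
        grouped = aggregate(base_rows, (0, 1), 2)
        self.rows = [key + (total,) for key, total in grouped.items()]

def sum_amount_by(columns, base_rows, mv=None):
    """Answer SUM(amount) GROUP BY columns, using the view when it covers them."""
    idx = tuple(COLUMNS.index(c) for c in columns)
    if mv is not None and set(columns) <= set(mv.group_by):
        # Rewrite: roll up the view's rows instead of scanning the base table.
        return aggregate(mv.rows, idx, 2), "mv"
    return aggregate(base_rows, idx, 2), "base"

mv = MaterializedView(SALES)
by_city, source = sum_amount_by(("city",), SALES, mv)
```

The caller issues the same request either way; only the second return value reveals whether the view was used.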

Real‑Time: Storage Engine

To meet strict data‑freshness requirements, StarRocks adopts a Delete‑and‑Insert approach built on a primary key index and delete bitmaps: an update marks the old row version as deleted and appends the new one. This provides high‑throughput real‑time updates, and the approach is comparable to the one used by SQL Server and Alibaba Cloud's Hologres.
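
The Delete‑and‑Insert idea can be sketched in a few lines. This is a toy in‑memory model under stated assumptions: the class name is hypothetical, the delete bitmap is modeled as a Python set of row positions, and persistence, segments, and concurrency are ignored.

```python
class PrimaryKeyTable:
    """Toy primary key table: append-only rows, a key index, and a delete bitmap."""

    def __init__(self):
        self.rows = []        # append-only storage of (key, value)
        self.index = {}       # primary index: key -> position of the live row
        self.deleted = set()  # delete bitmap, modeled as a set of row positions

    def upsert(self, key, value):
        old = self.index.get(key)
        if old is not None:
            self.deleted.add(old)       # "delete": flag the previous version
        self.rows.append((key, value))  # "insert": append the new version
        self.index[key] = len(self.rows) - 1

    def delete(self, key):
        pos = self.index.pop(key, None)
        if pos is not None:
            self.deleted.add(pos)

    def scan(self):
        """Readers skip flagged positions; no merging of versions at read time."""
        return [row for pos, row in enumerate(self.rows) if pos not in self.deleted]
```

The design trade‑off this illustrates is that the write path does a little extra work (an index lookup and a bitmap update) so that reads stay a plain filtered scan.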

High Concurrency: Pipeline Engine

The pipeline engine schedules execution in user space. It decouples I/O from compute and splits each query into fine‑grained pipeline drivers, which keeps CPU utilization high under concurrent workloads.
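
A minimal sketch of cooperative user‑space scheduling, using Python generators as stand‑ins for pipeline drivers. The names and the use of `None` to simulate input that is not yet ready are assumptions for illustration; the real engine runs drivers on a pool of worker threads.

```python
from collections import deque

def driver(name, chunks, log):
    """A pipeline driver: handles one chunk per time slice, then yields."""
    for chunk in chunks:
        if chunk is None:
            # Input not ready (simulated I/O wait): give the thread back
            # to the scheduler instead of blocking on it.
            yield "blocked"
            continue
        log.append((name, sum(chunk)))
        yield "ready"

def run(drivers):
    """Round-robin scheduler that runs entirely in user space."""
    queue = deque(drivers)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue           # driver finished, drop it
        queue.append(current)  # re-queue so other drivers get a turn

log = []
run([
    driver("q1", [[1, 2], None, [3]], log),
    driver("q2", [[10], [20]], log),
])
```

While `q1` waits on its second chunk, `q2` keeps making progress on the same thread, which is the effect the decoupling of I/O and compute is meant to achieve.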

Flexibility: Distributed Join

Built‑in distributed join complements vectorization and CBO, delivering efficient and flexible analysis for complex business queries.
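
One common distributed join strategy is a shuffle join: both inputs are partitioned by the hash of the join key so that matching rows land on the same node, and each node then joins its partitions locally. The sketch below simulates that in a single process. The source does not say which strategies StarRocks applies in which cases, so treat the function names, the node count, and the sample tables as assumptions.

```python
def shuffle(rows, key_idx, num_nodes):
    """Route each row to a node by hashing its join key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key_idx]) % num_nodes].append(row)
    return parts

def local_hash_join(left, right, left_key, right_key):
    """Build a hash table on the right input, then probe it with the left."""
    table = {}
    for r in right:
        table.setdefault(r[right_key], []).append(r)
    return [l + r for l in left for r in table.get(l[left_key], [])]

def shuffle_join(left, right, left_key, right_key, num_nodes=3):
    left_parts = shuffle(left, left_key, num_nodes)
    right_parts = shuffle(right, right_key, num_nodes)
    out = []
    # Each (left, right) partition pair would run on its own node.
    for lp, rp in zip(left_parts, right_parts):
        out.extend(local_hash_join(lp, rp, left_key, right_key))
    return out

# Made-up tables: orders(order_id, customer_id), customers(customer_id, name).
orders = [(1, 100), (2, 200), (3, 100), (4, 999)]
customers = [(100, "Ann"), (200, "Bo")]
joined = sorted(shuffle_join(orders, customers, left_key=1, right_key=0))
```

Because both sides use the same hash function on the key, no node ever needs rows held by another node to finish its share of the join.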

StarRocks 3.x – New OLAP Paradigm

Version 3.x introduces storage‑compute separation, reducing storage costs by moving raw data to OSS/HDFS while keeping hot data cached on compute nodes. This architecture improves cost efficiency, reliability, and resource isolation through multi‑warehouse support.
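
The hot‑data cache in front of remote storage can be sketched as follows. The eviction policy (least recently used), the class names, and the block identifiers are assumptions chosen for illustration; the source only states that hot data is cached on compute nodes.

```python
from collections import OrderedDict

class RemoteStore:
    """Stand-in for object storage such as OSS or HDFS; counts remote reads."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def get(self, block_id):
        self.reads += 1
        return self.blocks[block_id]

class ComputeNodeCache:
    """LRU cache of hot blocks held on a compute node."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self.cache = OrderedDict()

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        data = self.store.get(block_id)       # miss: fetch from remote storage
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used block
        return data
```

Since the durable copy lives in remote storage, a compute node in this model can be added or dropped without moving data; it only starts with a cold cache.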

Lakehouse Analysis and Trino Compatibility

StarRocks provides a unified catalog mechanism for querying external data sources such as JDBC databases and Paimon tables. It also offers native Parquet/ORC readers, materialized views, and Trino‑compatible syntax that eases migration.

Extreme Lakehouse Performance

StarRocks delivers up to three‑fold query speedup over Trino on OSS, with block cache and materialized view optimizations further accelerating workloads.

EMR Serverless StarRocks Overview

EMR Serverless StarRocks builds on the same core features (vectorization, CBO, materialized views, and storage‑compute separation) while adding elastic scaling, change‑data‑feed capabilities, and tighter integration with Alibaba Cloud's data lake ecosystem.
