
How Paimon + StarRocks Revolutionize Lakehouse Analytics

This article reviews the traditional Lambda and Kappa data‑warehouse architectures, then details the Paimon + StarRocks lakehouse solutions: a data‑lake center, accelerated queries, materialized views, and hot‑cold data separation. It also covers integration through the Paimon external catalog and the JNI connector, and outlines StarRocks’ roadmap for lakehouse analytics.

StarRocks

Sep 6, 2023

Traditional Data Warehouse (Lambda Architecture)

Traditional data‑warehouse analytics often follows the Lambda architecture. Data arrives through a message queue (e.g., Kafka) and the stream is duplicated into separate real‑time and batch layers, whose results are finally merged in a data‑service layer that answers user queries.

While Lambda addresses real‑time needs, it incurs higher deployment and maintenance costs because two parallel systems must be kept in sync.

Kappa Architecture

Kappa replaces the dual‑system design with a single streaming pipeline. It assumes that historical data does not need to be re‑processed unless explicitly required; when a full replay is needed, re‑running the whole stream can waste considerable resources.

Paimon + StarRocks Lakehouse Solutions

2.1 Data‑Lake Center

StarRocks is an MPP database that can query external data‑lake table formats directly. With Paimon serving as the ODS (operational data store) layer, StarRocks and other engines such as Spark read the same Paimon tables, while Paimon provides on‑disk storage, indexing, and Hive compatibility, which improves fault tolerance and query capabilities.

2.2 Accelerated Query

In this variant StarRocks handles the entire analytics stack. Once data lands in Paimon as the ODS layer, StarRocks reads it through external tables, builds a materialized view on top for the DWD (data warehouse detail) layer, and nests a second materialized view over the first for the DWS (data warehouse summary) layer. The approach has two benefits:

Simplified operations: only StarRocks and Paimon are required.

High query speed: the materialized views are stored in StarRocks’ native format, so queries benefit from its indexing, storage layout, and optimizer.
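
The layering described above could be sketched in StarRocks SQL roughly as follows. This is a minimal sketch under stated assumptions, not the setup used by the authors: the catalog paimon_catalog, the database analytics, the table ods.orders, and all column names are hypothetical, and the 10-minute refresh interval is an arbitrary choice.

```sql
-- Assumes a Paimon external catalog named paimon_catalog already exists
-- and that analytics is a database in StarRocks' default catalog.
-- All object and column names here are hypothetical.

-- DWD layer: cleaned detail rows, read from the Paimon ODS table.
CREATE MATERIALIZED VIEW analytics.dwd_orders
DISTRIBUTED BY HASH(order_id)
REFRESH ASYNC EVERY (INTERVAL 10 MINUTE)
AS
SELECT order_id, user_id, amount, dt
FROM paimon_catalog.ods.orders
WHERE amount IS NOT NULL;

-- DWS layer: daily aggregates, nested on top of the DWD view.
CREATE MATERIALIZED VIEW analytics.dws_daily_sales
DISTRIBUTED BY HASH(dt)
REFRESH ASYNC EVERY (INTERVAL 10 MINUTE)
AS
SELECT dt, COUNT(*) AS order_cnt, SUM(amount) AS total_amount
FROM analytics.dwd_orders
GROUP BY dt;
```

Reports would then query analytics.dws_daily_sales directly, so the only moving parts are Paimon and StarRocks.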

2.3 Materialized Views

StarRocks supports asynchronous materialized views defined in SQL. They are easy to maintain, pre‑compute results to reduce query latency, let the optimizer route matching queries to the view automatically, refresh on a schedule or per partition, and can be built over multiple tables, including internal tables, external tables, and existing materialized views.
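
As an illustration of these properties, the sketch below builds a multi‑table view that joins a Paimon table with an internal table, refreshes it by hand, and inspects a query plan. All object names (paimon_catalog, analytics, dim_users, and so on) are hypothetical. Whether the optimizer actually rewrites a given query to use the view depends on the StarRocks version and its settings, so the EXPLAIN step is a way to check rather than a guarantee.

```sql
-- Hypothetical names throughout; analytics.dim_users is an internal table
-- and paimon_catalog is an existing Paimon external catalog.
CREATE MATERIALIZED VIEW analytics.mv_sales_by_region
DISTRIBUTED BY HASH(region)
REFRESH ASYNC EVERY (INTERVAL 1 HOUR)
AS
SELECT u.region, o.dt, SUM(o.amount) AS total_amount
FROM paimon_catalog.ods.orders AS o
JOIN analytics.dim_users AS u ON o.user_id = u.user_id
GROUP BY u.region, o.dt;

-- Trigger a refresh by hand instead of waiting for the schedule.
REFRESH MATERIALIZED VIEW analytics.mv_sales_by_region;

-- Inspect the plan to see whether the optimizer reads the view
-- in place of the base tables.
EXPLAIN
SELECT u.region, SUM(o.amount) AS total_amount
FROM paimon_catalog.ods.orders AS o
JOIN analytics.dim_users AS u ON o.user_id = u.user_id
GROUP BY u.region;
```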

2.4 Hot‑Cold Data Separation

Hot data (frequently queried) is stored in StarRocks for low‑latency access, while cold data stays in Paimon on cheaper storage such as OSS or HDFS. Queries on a combined materialized view automatically pull hot data from StarRocks and cold data from Paimon, merging the results transparently.
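
To make the merge concrete, the query below writes out by hand the kind of combination that the article says happens automatically. It illustrates the logic only and is not the mechanism StarRocks uses internally. The 7‑day cutoff, the internal table analytics.orders_hot, and the Paimon table paimon_catalog.ods.orders are all assumptions made for the example.

```sql
-- Hypothetical layout: the last 7 days live in a StarRocks internal table,
-- older data stays in Paimon. The two branches cover disjoint date ranges,
-- so UNION ALL does not double count.
SELECT dt, SUM(amount) AS total_amount
FROM (
    -- Hot branch: recent rows served from StarRocks' own storage.
    SELECT dt, amount
    FROM analytics.orders_hot
    WHERE dt >= date_sub(current_date(), INTERVAL 7 DAY)
    UNION ALL
    -- Cold branch: historical rows read from Paimon.
    SELECT dt, amount
    FROM paimon_catalog.ods.orders
    WHERE dt < date_sub(current_date(), INTERVAL 7 DAY)
) AS all_orders
GROUP BY dt;
```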

Integration Details

3.1 Paimon External Catalog

StarRocks can create an external catalog for Paimon with a single statement of the form

CREATE EXTERNAL CATALOG ... PROPERTIES("type" = "paimon", "paimon.catalog.type" = "...", "paimon.catalog.warehouse" = "...");

after which Paimon tables can be queried directly with SQL.
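
A fuller sketch of the catalog setup follows. The property names are those used in the StarRocks documentation for Paimon catalogs and should be checked against the version in use. The catalog name, the ods.orders table, and the warehouse placeholder are hypothetical, and any storage credentials that object storage requires are omitted.

```sql
-- Filesystem-backed Paimon catalog. The warehouse path is a placeholder
-- and must be replaced with the real location.
CREATE EXTERNAL CATALOG paimon_catalog
PROPERTIES
(
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "<paimon_warehouse_path>"
);

-- List the databases the catalog exposes.
SHOW DATABASES FROM paimon_catalog;

-- Query a Paimon table by its fully qualified name (hypothetical table).
SELECT * FROM paimon_catalog.ods.orders LIMIT 10;
```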

3.2 JNI Connector

The JNI connector bridges StarRocks’ C++ backend (BE) and the Java SDKs of lake‑storage components such as Paimon and Hudi. It reads data through the Java SDK and writes it into off‑heap memory in a format the BE can consume, so StarRocks can use advanced lake‑storage features that have no native C++ implementation. Its main advantages:

Fast integration of Java data sources.

Simple Java API.

Supports Hudi MOR tables, Paimon tables, and complex types (struct, map, array).

No intrusive changes to the existing C++ code.

Future Roadmap for StarRocks Lakehouse

Support for complex data types.

Column statistics.

Metadata caching.

Time‑travel queries.

Streaming materialized views based on Paimon external tables.

These enhancements aim to broaden analytical capabilities and improve performance for lakehouse workloads.


