
Unlocking Lakehouse Power: Paimon and Doris Integrated Solutions

This article reviews how Paimon and Doris combine to solve unified storage, data visibility, and performance challenges in modern lakehouse architectures, detailing their complementary features, integration capabilities, and real‑world use cases from leading companies.

Big Data Technology & Architecture

Hello everyone. Today we share our study notes on integrating Paimon and Doris in lakehouse solutions.

Note: We focus on the synergy between Paimon and Doris and the core problems each solves; implementation details vary by scenario.

Lakehouse Integration Scenarios Summary

Based on talks shared by dozens of companies, the common problems that Paimon + Doris addresses include:

Unified storage : In most data‑warehouse solutions, streaming and batch storage are separate (Kafka for streaming, HDFS for batch); Paimon unifies them.

Data visibility : Real-time data is often not directly queryable by users; exposing it requires extra export jobs, which add compute cost and duplicate storage.

Others : Issues with Redis dimension tables, inconsistent real‑time DWS layers, etc.

Based on this, companies explore Paimon + Doris solutions:

Paimon serves as the data-lake storage layer. It offers an open format that engines such as Spark, Flink, and Trino can read, scales on object storage and distributed file systems (S3, HDFS), and supports transactions and schema evolution natively. It acts as a unified, low-cost storage base for massive heterogeneous data.

Doris acts as an analytical database with a distributed parallel engine, vectorized execution, and optimized operators for complex aggregations, delivering millisecond‑to‑second low‑latency queries.

Paimon x Doris Capability Integration

According to a talk shared by Xiaomi, Doris adds several features for the Paimon format:

Metadata-driven partition pruning, bucket pruning, and predicate push-down to improve query efficiency.

Support for reading Paimon Deletion Vectors, which speeds up reads of tables that receive updates, using a vectorized C++ engine.

Local file caching for hot data.

Time‑travel, incremental reads, and branch/tag reads for multi‑version management.

Materialized views at partition and snapshot levels with strong consistency.

Support for Paimon REST Catalog (DLF) for unified metadata management.
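To make the first item in the list above more concrete, here is a minimal sketch of how a query planner can skip files using metadata alone. It is a toy model and not Doris or Paimon code: the `FileMeta` class, the `dt` partition column, the `amount` column, and the file paths are all hypothetical names chosen for illustration. Bucket pruning follows the same idea and is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata: partition value plus min/max statistics."""
    path: str
    dt: str          # value of an assumed `dt` partition column
    min_amount: int  # min/max of an assumed `amount` column in this file
    max_amount: int


def plan_scan(files, dt, amount_gt):
    """Select files that may hold rows with dt == `dt` and amount > `amount_gt`."""
    selected = []
    for f in files:
        if f.dt != dt:
            continue  # partition pruning: wrong partition, file is never opened
        if f.max_amount <= amount_gt:
            continue  # predicate push-down: statistics rule the file out
        selected.append(f)
    return selected


files = [
    FileMeta("dt=2024-01-01/file-0", "2024-01-01", 1, 500),
    FileMeta("dt=2024-01-02/file-1", "2024-01-02", 1, 40),
    FileMeta("dt=2024-01-02/file-2", "2024-01-02", 90, 300),
]
to_read = plan_scan(files, dt="2024-01-02", amount_gt=100)
print([f.path for f in to_read])
```

Only one of the three files survives planning here, so the engine reads a fraction of the data before any row is decoded.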

The low cost and compatibility of Paimon storage complement Doris’s high‑performance query engine, enabling hot‑cold data synergy.
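Among the features above, Deletion Vector reads are perhaps the least intuitive, so here is a toy sketch of the idea. We assume a deletion vector can be modeled as a set of row positions that are no longer valid in a data file; real deletion vectors are stored as compact bitmaps, and the function and variable names below are hypothetical.

```python
def read_file(rows, deleted_positions):
    """Yield the rows of one data file, skipping positions marked as deleted."""
    for pos, row in enumerate(rows):
        if pos not in deleted_positions:
            yield row


# An older file holds three keys; key "u2" was later updated.
old_file = [("u1", 10), ("u2", 20), ("u3", 30)]
old_file_deletions = {1}  # position of the stale "u2" row in old_file

# The new version of "u2" lives in a newer file with nothing deleted.
new_file = [("u2", 25)]

result = list(read_file(old_file, old_file_deletions)) + list(read_file(new_file, set()))
print(sorted(result))
```

The reader never compares keys across files: it only filters positions, which is the kind of work a vectorized reader handles well.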

Core Problems Addressed

Paimon Dimension Table Queries

Lookup Join is a key join type in Paimon for streaming queries that associate stream data with dimension tables.

Reference: Lookup Join enhancements in Paimon 1.0.

Paimon dimension tables offer distinct advantages:

No service requests : For Lookup Join, Paimon pulls the dimension data to local storage, eliminating requests to an external service.

Flexible queries : Supports both point lookups and OLAP queries.

QPS support : Handles hundreds of thousands of QPS with low cost.
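The advantages above can be pictured with a small sketch. This is not the Flink or Paimon API: `LocalDimTable`, `lookup_join`, and the `user_id` / `city` columns are hypothetical, and the in-memory dictionary merely stands in for dimension data that has been pulled to local storage.

```python
class LocalDimTable:
    """Toy stand-in for a dimension table kept in local storage."""

    def __init__(self, rows):
        self._by_key = {row["user_id"]: row for row in rows}

    def refresh(self, changed_rows):
        """Apply incremental changes instead of reloading everything."""
        for row in changed_rows:
            self._by_key[row["user_id"]] = row

    def lookup(self, user_id):
        """Point lookup served locally, with no request to an external service."""
        return self._by_key.get(user_id)


def lookup_join(events, dim):
    """Enrich each stream event with the dimension row for its key (left join)."""
    for event in events:
        match = dim.lookup(event["user_id"])
        yield {**event, "city": match["city"] if match else None}


dim = LocalDimTable([{"user_id": 1, "city": "Beijing"}])
events = [{"user_id": 1, "action": "click"}, {"user_id": 2, "action": "view"}]

first_pass = list(lookup_join(events, dim))
dim.refresh([{"user_id": 2, "city": "Shanghai"}])
second_pass = list(lookup_join(events, dim))
print(first_pass)
print(second_pass)
```

Because every lookup is a local read, throughput is bounded by the job itself and not by a remote store such as Redis.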

Paimon Long‑Term Aggregation

The lakehouse architecture leverages Flink’s state management and Paimon’s incremental updates to achieve efficient UV deduplication and accumulation, ensuring accurate calculations with minute‑level timeliness.

Simply create a primary-key aggregation table in the DWS layer. Flink jobs convert the boolean deduplication state into an integer 0/1 and write it to the aggregation table, which accumulates the values into the metric.

Finally, write the computed metrics to external KV storage for external queries.
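The flow described in this section can be sketched in a few lines. This is a simulation under stated assumptions, not Flink or Paimon code: the `seen` set stands in for Flink keyed state, `AggTable` stands in for a primary-key table whose value column is merged by summation, and the event layout `(day, page, user)` is hypothetical.

```python
from collections import defaultdict


class AggTable:
    """Toy primary-key table whose value column is merged with sum."""

    def __init__(self):
        self._rows = defaultdict(int)

    def write(self, key, value):
        self._rows[key] += value  # rows with the same key are accumulated

    def get(self, key):
        return self._rows.get(key, 0)


def uv_job(events, table):
    """Write 1 for the first visit of a user to a page on a day, else 0."""
    seen = set()  # stands in for Flink keyed state
    for day, page, user in events:
        is_new = (day, page, user) not in seen  # boolean deduplication state
        seen.add((day, page, user))
        table.write((day, page), int(is_new))   # boolean converted to 0/1


events = [
    ("2024-01-02", "home", "u1"),
    ("2024-01-02", "home", "u2"),
    ("2024-01-02", "home", "u1"),  # repeat visit, contributes 0
    ("2024-01-02", "cart", "u1"),
]
table = AggTable()
uv_job(events, table)
print(table.get(("2024-01-02", "home")), table.get(("2024-01-02", "cart")))
```

The job only decides whether a visit is new; the accumulation itself is delegated to the table, so the long-running total does not have to live in job state.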

Reference: Paimon’s Changelog Producer.

Doris Engine Accelerates Paimon Queries

Doris can directly read externally mounted Paimon tables, using its MPP vectorized engine, C++ native reader, and materialized view with local cache to bring query latency and concurrency to true OLAP levels.
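As a rough illustration of mounting a Paimon warehouse in Doris, the sketch below builds the SQL statements and, optionally, runs them through any Python DB-API cursor (Doris speaks the MySQL protocol, so a MySQL client library can provide one). The catalog name, warehouse path, database, and table are hypothetical, and the property keys shown are our assumption of a minimal filesystem catalog; check them against the Doris documentation for your version, since real deployments usually need storage credentials or HDFS settings as well.

```python
def paimon_catalog_statements(catalog, warehouse, database, table):
    """Build SQL that mounts a Paimon warehouse and queries one table."""
    create_catalog = (
        f"CREATE CATALOG {catalog} PROPERTIES (\n"
        f'    "type" = "paimon",\n'
        f'    "paimon.catalog.type" = "filesystem",\n'
        f'    "warehouse" = "{warehouse}"\n'
        f")"
    )
    # The external table is addressed as catalog.database.table.
    query = f"SELECT COUNT(*) FROM {catalog}.{database}.{table}"
    return [create_catalog, query]


def run(cursor, statements):
    """Execute the statements on a DB-API cursor and return the last result."""
    rows = None
    for sql in statements:
        cursor.execute(sql)
        rows = cursor.fetchall()
    return rows


# Hypothetical names, for illustration only.
statements = paimon_catalog_statements(
    catalog="paimon_lake",
    warehouse="s3://example-bucket/paimon",
    database="dws",
    table="page_uv",
)
for sql in statements:
    print(sql + ";")
```

No data is copied in this setup: Doris plans and executes the query itself while the files stay in the lake.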

Other

Depending on business needs, companies continue to expand capabilities in performance (Paimon dimension tables), stream association (Partial Update), and writes from the OLAP engine into the lake.
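For readers unfamiliar with Partial Update, the sketch below simulates its merge behavior as we understand it: records that share a primary key are combined column by column, and a null value does not overwrite an existing one. The function name and the order/payment columns are hypothetical, and the code is a simulation, not the Paimon implementation.

```python
def partial_update_merge(records, key):
    """Merge records per primary key; None never overwrites an existing value."""
    merged = {}
    for rec in records:
        row = merged.setdefault(rec[key], {})
        for col, val in rec.items():
            if val is not None or col not in row:
                row[col] = val
    return merged


# Two streams write different columns of the same wide row.
records = [
    {"order_id": 1, "amount": 30, "pay_status": None},       # from an order stream
    {"order_id": 1, "amount": None, "pay_status": "PAID"},   # from a payment stream
    {"order_id": 2, "amount": 15, "pay_status": None},
]
wide = partial_update_merge(records, key="order_id")
print(wide)
```

This is why Partial Update is listed under association: the streams are joined by the storage layer on write, without a stateful stream-to-stream join in the job.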

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
