
How Ele.me Built a Real‑Time Lakehouse: From 1.0 to 3.0 with Flink, Paimon & StarRocks

This article details Ele.me's journey in evolving its real‑time data warehouse, covering the original 1.0 architecture, the 2.0 lakehouse redesign with Paimon and StarRocks, performance evaluations of lake formats and query engines, and the roadmap toward a 3.0 streaming lakehouse solution.


Evolution of Ele.me Real‑Time Warehouse

Typical real‑time scenarios include ETL, reporting, integration with online services, and monitoring. The platform is organized into three layers:

Data collection: binlog capture via DataX/DRC, user-behavior logs via the internal Omni platform, and application logs via SLS and TT.

Data processing: near-real-time lakehouse data stored with Paimon on OSS; ultra-low-latency data kept in TT and SLS; compute powered by Dataphin, VVP, and the Flink + Blink stack.

Data service: core storage uses ADB and Hologres, with StarRocks added for lake-warehouse integration. Internal data-service applications expose the resulting data products.
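To make the collection-to-processing handoff concrete, the sketch below shows how a binlog stream might be landed into a Paimon ODS table with Flink SQL. The catalog path, table names, and columns are illustrative placeholders, not Ele.me's actual schema, and the CDC source table is assumed to be defined elsewhere.

```sql
-- Hypothetical sketch: register a Paimon catalog on OSS and sync
-- an order binlog stream into a primary-key ODS table.
CREATE CATALOG paimon_catalog WITH (
  'type' = 'paimon',
  'warehouse' = 'oss://bucket/warehouse'   -- placeholder path
);

CREATE TABLE paimon_catalog.ods.orders (
  order_id   BIGINT,
  user_id    BIGINT,
  amount     DECIMAL(10, 2),
  updated_at TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED
);

-- Continuous job: upsert binlog changes into the lake table.
INSERT INTO paimon_catalog.ods.orders
SELECT order_id, user_id, amount, updated_at
FROM mysql_cdc_orders;  -- a mysql-cdc source table, assumed defined elsewhere
```

Because the Paimon table declares a primary key, updates and deletes from the binlog are merged rather than appended, which is the capability TT lacks in the 2.0 setup.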

Compute has been migrating from Blink to Flink (full migration planned for 2023), storage has shifted from ADB to Hologres, and StarRocks has been introduced to improve development efficiency.

Version 1.0 – Lambda Architecture

Data flow follows ODS → DWD → independent ADS layers. Historical data are loaded into an OLAP engine via a T+1 batch pipeline. Identified problems:

Duplicated development effort across ADS layers.

Slow adaptation to business changes, causing data‑consistency issues.

High operational and compute costs.

The goals were faster, more accurate, more stable, and more consistent data, along with higher development and operational efficiency.

Version 2.0 – CDM‑Driven Architecture

A CDM layer materializes common dimensions and metrics as DWS assets using Dataphin. Dataphin translates declarative SQL into Flink streaming and batch jobs, schedules them via DataWorks (D2), and writes results back to Hologres. Downstream ADS layers perform simple transformations on these unified assets.
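The CDM idea is that a common aggregate is written once as a DWS asset and reused downstream. A minimal Flink SQL sketch of such an asset might look like the following; all table and column names are hypothetical stand-ins for the kind of declarative SQL Dataphin would translate into streaming and batch jobs.

```sql
-- Illustrative DWS asset: a five-minute windowed trade aggregate,
-- materialized once and reused by downstream ADS jobs.
INSERT INTO dws_trade_city_5min
SELECT
  city_id,
  window_start,
  window_end,
  COUNT(DISTINCT order_id) AS order_cnt,
  SUM(amount)              AS gmv
FROM TABLE(
  TUMBLE(TABLE dwd_trade_order, DESCRIPTOR(pay_time), INTERVAL '5' MINUTE))
GROUP BY city_id, window_start, window_end;
```

Downstream ADS layers then only filter or lightly reshape `dws_trade_city_5min` instead of each re-deriving the metric from DWD, which is where the consistency and efficiency gains come from.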

Benefits: improved data consistency and reduced development effort. Remaining challenges:

New business scenarios may not fit the existing pipeline.

Dual storage in TT and ODPS prevents true stream-batch unification.

TT lacks update capability and column pruning and incurs high per-column costs; Hologres storage costs are also high.

Lake Format and Query Engine Evaluation

Benchmark data from core transaction, marketing, and traffic domains were written to Paimon and Hudi. Metrics included write amplification, streaming read/write throughput, end-to-end latency, and query cost. Results showed Paimon end-to-end latency of ≈1–5 min (average ≈3 min), versus ≈10 min for Hudi.

Performance of Paimon approached that of native OLAP internal tables.

Query engines were tested on Alibaba Cloud EMR 5.15.1 (≈200 CU, StarRocks 192 CU). StarRocks consistently outperformed Trino thanks to a JNI connector optimized for Paimon, filter push-down, a vectorized execution engine, and native read support for Paimon tables.
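The way StarRocks reads Paimon can be sketched as an external catalog pointing at the lake warehouse, after which Paimon tables are queried like local ones. The catalog name, OSS path, and table below are placeholders.

```sql
-- Sketch: expose a Paimon warehouse on OSS to StarRocks
-- through an external catalog (names and paths are placeholders).
CREATE EXTERNAL CATALOG paimon_lake
PROPERTIES (
  "type" = "paimon",
  "paimon.catalog.type" = "filesystem",
  "paimon.catalog.warehouse" = "oss://bucket/warehouse"
);

-- Query the lake table directly; filter push-down and the
-- vectorized engine apply as they would for internal tables.
SELECT city_id, SUM(gmv) AS total_gmv
FROM paimon_lake.dws.dws_trade_city_5min
WHERE dt = '2023-06-01'
GROUP BY city_id;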

Conclusion: select Paimon as the lake storage format and StarRocks as the OLAP engine.

Streaming Lakehouse Architecture (Version 3.0)

Flink handles stream read/write on Paimon tables stored in an internal OSS cluster; metadata is managed by Data Lake Formation (DLF). StarRocks materialized views provide aggregation layers, while Hologres external tables enable ad‑hoc self‑service analytics. Core use cases include traffic insight, real‑time transaction subsidy analysis, and service‑level dashboards.
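The aggregation layers mentioned above can be sketched as asynchronously refreshed StarRocks materialized views over lake tables. This assumes a Paimon external catalog named `paimon_lake` has already been registered; the view, schedule, and columns are illustrative.

```sql
-- Hedged sketch: an async materialized view serving as an
-- aggregation layer over a Paimon table (names are hypothetical).
CREATE MATERIALIZED VIEW mv_city_gmv
REFRESH ASYNC EVERY (INTERVAL 5 MINUTE)
AS
SELECT city_id, window_start, SUM(gmv) AS gmv
FROM paimon_lake.dws.dws_trade_city_5min
GROUP BY city_id, window_start;
```

Dashboards and ad-hoc queries hit the pre-aggregated view, while the raw Paimon tables remain available for self-service analytics through Hologres external tables.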

Open Challenges and Future Work

OSS bandwidth bottlenecks and small-file overhead.

Paimon latency (1–5 min) limits multi-layer dependencies.

Metadata integration between DLF and existing internal systems.

Insufficient permission controls.

Planned improvements:

Asynchronous compaction and deletion‑vector support for Paimon to reduce write‑amplification and latency.

Deeper integration with DataWorks and MaxCompute for unified job orchestration.

Hot‑cold storage tiering on OSS to lower storage costs.

Enhanced StarRocks materialized-view capabilities and query-cache mechanisms.
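The compaction- and deletion-related items above map to Paimon table options that decouple compaction from the write path and avoid rewriting files on delete. The option names below come from recent Paimon releases; the table name is a placeholder, and a dedicated compaction job is assumed to run separately.

```sql
-- Hypothetical sketch of the planned write-path optimizations.
ALTER TABLE ods.orders SET (
  -- writers skip compaction; a dedicated job compacts asynchronously
  'write-only' = 'true',
  -- mark deleted rows in deletion vectors instead of rewriting data files
  'deletion-vectors.enabled' = 'true'
);
```

Offloading compaction to a separate job smooths write throughput (reducing checkpoint stalls), while deletion vectors trade a small read-side merge cost for much lower write amplification on updates and deletes.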

Big Data · Flink · Streaming · Lakehouse · StarRocks
Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
