
How Ele.me Revolutionized Real‑Time Data Warehousing with Flink‑Paimon Lakehouse

In this detailed case study, Alibaba’s Ele.me team explains how they evolved from siloed, chimney‑style real‑time warehouses to a unified Flink‑Paimon lakehouse, highlighting the three development stages, technology evaluations, the Alake platform’s one‑stop capabilities, production results, and future directions such as Fluss and AI integration.

Alibaba Cloud Big Data AI Platform

Introduction

Amid the wave of digital transformation, enterprises increasingly demand real‑time data processing. Traditional real‑time data warehouse architectures face problems such as data silos, high costs, and low development efficiency when dealing with rapid business changes and explosive data growth.

Real‑Time Warehouse Evolution

1.1 Current Architecture of Ele.me (Taobao Flash Sale)

Ele.me, a key local‑life service platform of Alibaba Group, generates massive multi‑dimensional data (orders, user behavior, merchant operations) daily. Its architecture has evolved from fragmented to centralized and from chimney‑style to platform‑based.

1.2 Three Development Stages of Real‑Time Warehouses

Real‑Time Warehouse 1.0 – Chimney‑Style Repeated Development

Each business line built independent real‑time pipelines, causing severe data silos, inconsistent standards, duplicated infrastructure, and high operational costs.

Real‑Time Warehouse 2.0 – Initial Integration and New Challenges

Building a data mid-platform improved data consistency and reduced costs, but it also introduced "pseudo" stream-batch integration, duplicated storage across DWD layers, bandwidth bottlenecks in TT (TimeTunnel, Alibaba's internal messaging middleware), limited Hologres performance, and a poor debugging experience.

1.3 Lakehouse Exploration and Technology Selection

To overcome the limitations of 2.0, Ele.me explored a lakehouse architecture, running extensive evaluations of lake storage formats and OLAP engines on cloud EMR clusters.

Key Technology Comparisons

Lake Storage Format: Paimon vs Hudi

Paimon outperformed Hudi in end‑to‑end latency, stream update stability, and write‑amplification control, making it more suitable for real‑time scenarios.
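The stream-update behavior being compared can be sketched with a Paimon primary-key table in Flink SQL. This is a hedged, minimal example, not Ele.me's actual DDL: the catalog, warehouse path, table, and column names are all illustrative, and the options shown are the standard Paimon table properties for streaming upserts.

```sql
-- Illustrative Flink SQL: a Paimon primary-key table for streaming upserts.
-- Warehouse path and all names are assumptions, not from the case study.
CREATE CATALOG paimon_catalog WITH (
    'type'      = 'paimon',
    'warehouse' = 'oss://my-bucket/warehouse'
);

USE CATALOG paimon_catalog;

CREATE TABLE dwd_order (
    order_id    BIGINT,
    user_id     BIGINT,
    status      STRING,
    update_time TIMESTAMP(3),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'merge-engine'             = 'deduplicate',  -- keep the latest row per key
    'changelog-producer'       = 'lookup',       -- emit changelogs for downstream streaming reads
    'deletion-vectors.enabled' = 'true'          -- mark deleted rows instead of rewriting files
);
```

The deduplicate merge engine resolves concurrent updates per primary key at compaction time, which is one reason write amplification stays lower than in copy-on-write designs.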

OLAP Engine Performance

StarRocks and Hologres delivered comparable query performance, both significantly better than Trino. Features like Deletion Vector and Data Cache allowed queries on Paimon external tables to approach internal‑table performance.
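The external-table setup being benchmarked can be sketched in StarRocks SQL. This is a hedged example under assumed names and paths, not the team's production configuration; it uses StarRocks' standard Paimon catalog integration.

```sql
-- Illustrative StarRocks SQL: register a Paimon warehouse as an external
-- catalog, then query its tables directly (names/paths are assumptions).
CREATE EXTERNAL CATALOG paimon_cat
PROPERTIES (
    "type"                    = "paimon",
    "paimon.catalog.type"     = "filesystem",
    "paimon.catalog.warehouse" = "oss://my-bucket/warehouse"
);

-- With deletion vectors and data cache enabled, queries like this on
-- Paimon external tables can approach internal-table performance.
SELECT dt, COUNT(*) AS orders
FROM paimon_cat.dw.dwd_order
GROUP BY dt;
```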

Compatibility Challenges

Integrating Flink + Paimon + StarRocks/Hologres with existing on‑premise deployments and cloud EMR proved difficult, highlighting the need for a unified development platform.

Alake Platform Capabilities

2.1 Background and Value

Alake, an internal Alibaba project, drives the transition from traditional data warehouses to lakehouses and further to Data + AI platforms, already adopted by multiple business units.

2.2 Core Features

One‑Stop Development Platform

Based on DataWorks, Alake provides a consistent stack that simplifies migration to lakehouse architectures.

Unified Compute Resource Management

Resources for Spark, Flink, StarRocks, etc., are centrally managed, enabling dynamic allocation (e.g., shifting CU from Spark to Flink) and improving utilization.

Unified Lake Storage Format

Built on Paimon and Pangu, the storage layer eliminates data migration and silo issues, supporting true separation of storage and compute.

Data Lake Metadata Management (DLF)

DLF offers seamless integration with existing security, permission systems, and ODPS metadata, enabling cross‑system data flow.

Production Practice

3.1 Overall Architecture Design

The production pipeline combines Paimon-based streaming ETL (minute-level latency), StarRocks/Hologres external tables for low-latency ad-hoc analysis, and Spark/ODPS batch processing for traditional BI.
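The streaming leg of such a pipeline can be sketched as a single continuous Flink SQL job. This is a hedged illustration, not Ele.me's actual job: table names, columns, and the filter are assumptions. The key point is that one copy of the DWD data in Paimon serves the streaming job, the OLAP external tables, and batch readers alike.

```sql
-- Illustrative Flink SQL: a continuous job upserting from a Paimon ODS
-- table into a Paimon DWD table (all names are assumptions). StarRocks or
-- Hologres then query the same DWD files as external tables, and
-- Spark/ODPS batch-read them -- no second copy of the data.
INSERT INTO dwd_order
SELECT o.order_id, o.user_id, o.status, o.update_time
FROM ods_order /*+ OPTIONS('scan.mode' = 'latest') */ AS o
WHERE o.status IS NOT NULL;
```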

3.2 Comparison: Traditional Real‑Time Warehouse vs Lakehouse

Data Consistency & Storage Optimization

Lakehouse reduces data redundancy and storage costs while improving consistency.

Timeliness & Development Efficiency

Although lakehouse latency shifts from seconds to minutes, it dramatically lowers development barriers and supports multiple query engines.
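The minute-level figure follows from how Paimon commits: writes become visible to readers at each Flink checkpoint, so freshness is tuned with the checkpoint interval rather than per-record delivery as in a message queue. A minimal sketch (the interval value is illustrative):

```sql
-- Data lands in Paimon at each checkpoint, so this setting bounds
-- end-to-end freshness at roughly one minute.
SET 'execution.checkpointing.interval' = '1min';
```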

3.3 Production Scale & Stability

Ele.me operates over 150,000 CUs of compute, handling both streaming and batch workloads, and the platform has proven stable during high-traffic promotional events.

Future Planning and Technical Outlook

4.1 Future Directions

Focus areas include true stream‑batch integration, intelligent data services, deep AI‑lakehouse fusion, and an open ecosystem.

4.2 Fluss Technology Introduction

Fluss aims to replace the TT solution and, combined with Paimon, will enable genuine stream‑batch unification.

4.3 Lakehouse & AI Integration

Current notebook environments support basic data science; deeper AI capabilities are planned to enhance predictive analytics.

Conclusion

Ele.me’s Flink + Paimon lakehouse production practice marks a pivotal milestone in digital transformation, delivering higher data consistency, lower storage costs, improved development efficiency, and robust stability at massive scale. The case provides valuable reference for the industry.

Tags: Flink, Paimon, Lakehouse, Alake
Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
