How Ele.me Revolutionized Real‑Time Data Warehousing with Flink‑Paimon Lakehouse
In this detailed case study, Alibaba’s Ele.me team explains how they evolved from siloed, chimney‑style real‑time warehouses to a unified Flink‑Paimon lakehouse, highlighting the three development stages, technology evaluations, the Alake platform’s one‑stop capabilities, production results, and future directions such as Fluss and AI integration.
Introduction
Amid the wave of digital transformation, enterprises increasingly demand real‑time data processing. Traditional real‑time data warehouse architectures face problems such as data silos, high costs, and low development efficiency when dealing with rapid business changes and explosive data growth.
Real‑Time Warehouse Evolution
1.1 Current Architecture of Ele.me (Taobao Flash Sale)
Ele.me, a key local‑life service platform of Alibaba Group, generates massive multi‑dimensional data (orders, user behavior, merchant operations) daily. Its architecture has evolved from fragmented to centralized and from chimney‑style to platform‑based.
1.2 Three Development Stages of Real‑Time Warehouses
Real‑Time Warehouse 1.0 – Chimney‑Style Repeated Development
Each business line built independent real‑time pipelines, causing severe data silos, inconsistent standards, duplicated infrastructure, and high operational costs.
Real‑Time Warehouse 2.0 – Initial Integration and New Challenges
Building a data mid‑platform improved data consistency and reduced costs, but it introduced "pseudo" stream‑batch integration, duplicated storage for DWD layers, bandwidth bottlenecks in TT (TimeTunnel, Alibaba's internal message queue), performance limits in Hologres, and a poor debugging experience.
1.3 Lakehouse Exploration and Technology Selection
To overcome 2.0 limitations, Ele.me explored a lakehouse architecture, conducting extensive cloud‑EMR evaluations of storage formats and OLAP engines.
Key Technology Comparisons
Lake Storage Format: Paimon vs Hudi
Paimon outperformed Hudi in end‑to‑end latency, stream update stability, and write‑amplification control, making it more suitable for real‑time scenarios.
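One reason Paimon handles streaming updates well is its LSM‑based primary‑key tables, which merge upserts at the storage layer. The sketch below illustrates the idea with Flink SQL; the catalog, warehouse path, and table names are invented for illustration, not taken from Ele.me's deployment.

```sql
-- Hypothetical sketch: a Paimon primary-key table absorbing streaming upserts.
-- The warehouse path and all names are placeholders.
CREATE CATALOG paimon_catalog WITH (
    'type' = 'paimon',
    'warehouse' = 'oss://my-bucket/warehouse'  -- placeholder path
);
USE CATALOG paimon_catalog;

CREATE TABLE dwd_orders (
    order_id   BIGINT,
    status     STRING,
    amount     DECIMAL(10, 2),
    update_ts  TIMESTAMP(3),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'bucket' = '4',                   -- fixed bucketing for parallel writes
    'merge-engine' = 'deduplicate',   -- keep the latest row per key
    'changelog-producer' = 'lookup'   -- emit a changelog for downstream streaming reads
);
```

Because merging happens per key inside the LSM structure, repeated updates to the same order do not force full-file rewrites, which is what keeps write amplification in check.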
OLAP Engine Performance
StarRocks and Hologres delivered comparable query performance, both significantly better than Trino. Features like Deletion Vector and Data Cache allowed queries on Paimon external tables to approach internal‑table performance.
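A minimal sketch of how such an external‑table query is wired up in StarRocks, assuming a filesystem‑backed Paimon warehouse (catalog names and paths are placeholders, and the exact data‑cache switch varies by StarRocks version):

```sql
-- Hedged sketch: querying Paimon from StarRocks via an external catalog.
CREATE EXTERNAL CATALOG paimon_cat
PROPERTIES (
    "type" = "paimon",
    "paimon.catalog.type" = "filesystem",
    "paimon.catalog.warehouse" = "oss://my-bucket/warehouse"  -- placeholder
);

-- On the Paimon side, deletion vectors let readers skip merge-on-read:
-- ALTER TABLE dwd_orders SET ('deletion-vectors.enabled' = 'true');

SET enable_scan_datacache = true;  -- cache hot lake files on local disk
SELECT status, COUNT(*)
FROM paimon_cat.dwd.dwd_orders
GROUP BY status;
```

Deletion vectors mark deleted rows without rewriting data files, and the local data cache avoids repeated remote reads, which together narrow the gap to internal‑table performance.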
Compatibility Challenges
Integrating Flink + Paimon + StarRocks/Hologres with existing on‑premise deployments and cloud EMR proved difficult, highlighting the need for a unified development platform.
Alake Platform Capabilities
2.1 Background and Value
Alake, an internal Alibaba project, drives the transition from traditional data warehouses to lakehouses and further to Data + AI platforms, already adopted by multiple business units.
2.2 Core Features
One‑Stop Development Platform
Based on DataWorks, Alake provides a consistent stack that simplifies migration to lakehouse architectures.
Unified Compute Resource Management
Resources for Spark, Flink, StarRocks, etc., are centrally managed, enabling dynamic allocation (e.g., shifting CU from Spark to Flink) and improving utilization.
Unified Lake Storage Format
Built on Paimon and Pangu, the storage layer eliminates data migration and silo issues, supporting true separation of storage and compute.
Data Lake Metadata Management (DLF)
DLF offers seamless integration with existing security, permission systems, and ODPS metadata, enabling cross‑system data flow.
Production Practice
3.1 Overall Architecture Design
The production pipeline combines Paimon‑based streaming ETL (minute‑level latency), StarRocks/Hologres external tables for low‑latency ad‑hoc analysis, and Spark/ODPS batch processing for traditional BI.
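The streaming‑ETL leg of such a pipeline can be sketched in Flink SQL as a continuous insert into a Paimon table, with the checkpoint interval roughly setting the visible data latency. All table names below are hypothetical, assuming a Paimon catalog and an upstream ODS source have already been registered:

```sql
-- Hypothetical sketch of the minute-level streaming ETL leg.
SET 'execution.checkpointing.interval' = '1 min';  -- commits roughly bound freshness

INSERT INTO paimon_catalog.dwd.dwd_orders
SELECT
    order_id,
    status,
    amount,
    update_ts
FROM source_orders  -- e.g. a message-queue-backed ODS table
WHERE status IS NOT NULL;
```

Downstream, the same Paimon table serves both the OLAP engines (via external tables) and Spark/ODPS batch jobs, which is what removes the duplicated DWD storage of the 2.0 architecture.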
3.2 Comparison: Traditional Real‑Time Warehouse vs Lakehouse
Data Consistency & Storage Optimization
Lakehouse reduces data redundancy and storage costs while improving consistency.
Timeliness & Development Efficiency
Although lakehouse latency shifts from seconds to minutes, it dramatically lowers development barriers and supports multiple query engines.
3.3 Production Scale & Stability
Ele.me operates over 150,000 CU, handling both streaming and batch workloads, and has proven stability during high‑traffic promotional events.
Future Planning and Technical Outlook
4.1 Future Directions
Focus areas include true stream‑batch integration, intelligent data services, deep AI‑lakehouse fusion, and an open ecosystem.
4.2 Fluss Technology Introduction
Fluss, a streaming storage layer open‑sourced by Alibaba, aims to replace the TT solution and, combined with Paimon, will enable genuine stream‑batch unification.
4.3 Lakehouse & AI Integration
Current notebook environments support basic data science; deeper AI capabilities are planned to enhance predictive analytics.
Conclusion
Ele.me’s Flink + Paimon lakehouse production practice marks a pivotal milestone in digital transformation, delivering higher data consistency, lower storage costs, improved development efficiency, and robust stability at massive scale. The case provides valuable reference for the industry.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.