How Lalamove Scaled Real‑Time Data Warehousing with Flink and Paimon
Lalamove’s international logistics platform rebuilt its real‑time data warehouse on Apache Flink and the Apache Paimon lakehouse. The redesign addressed multi‑region data centers, time‑zone diversity, frequent upstream schema changes, and high cloud costs, while improving scalability, latency, and operational efficiency across its global markets.
This article, authored by senior data‑warehouse engineer Lin Hailiang from Lalamove International Technology, presents the business background, Flink applications and challenges, the evolution of a Flink‑driven real‑time data‑warehouse architecture, and future outlook.
1. Business Overview
Lalamove, founded in 2013, is an on‑demand delivery platform operating in 13 markets across Asia, Latin America, Europe, the Middle East and Africa. It connects users, drivers and goods through a mobile app, providing fast, flexible and cost‑effective logistics services.
2. Flink in Business and Challenges
2.1 Real‑time Link in Lalamove
Real‑time Dashboard: Provides comprehensive, multi‑dimensional KPI displays for management and operations, supporting per‑region, per‑price‑category and per‑vehicle‑type analysis with diagnostic capabilities.
Data Service API: Uses Flink streaming jobs to compute dynamic surcharges, real‑time coupons, and order risk‑control signals, making ordering more flexible and intelligent.
Data Monitoring: Streams metrics to the in‑house LL‑Monitor platform for real‑time health checks of both tasks and data quality.
Data Analysis: Processes streamed data with Flink and queries it via Doris OLAP, enabling fresh data‑driven insights.
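As a sketch of what such a real‑time link could look like in FlinkSQL — table names, fields, and connector options below are illustrative assumptions, not Lalamove’s actual schema:

```sql
-- Illustrative only: ingest orders from Kafka, aggregate per-region,
-- per-vehicle-type KPIs in 1-minute event-time windows, and write the
-- results to Doris for dashboard queries.
CREATE TABLE ods_order (
  order_id     STRING,
  region       STRING,
  vehicle_type STRING,
  amount       DECIMAL(10, 2),
  order_time   TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'ods_order',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json'
);

CREATE TABLE dm_order_kpi (
  window_start TIMESTAMP(3),
  region       STRING,
  vehicle_type STRING,
  order_cnt    BIGINT,
  gmv          DECIMAL(18, 2)
) WITH (
  'connector' = 'doris',
  'fenodes' = 'doris-fe:8030',
  'table.identifier' = 'dm.order_kpi'
);

-- Windowing via Flink's TUMBLE table-valued function (TVF):
INSERT INTO dm_order_kpi
SELECT window_start, region, vehicle_type,
       COUNT(*)    AS order_cnt,
       SUM(amount) AS gmv
FROM TABLE(
  TUMBLE(TABLE ods_order, DESCRIPTOR(order_time), INTERVAL '1' MINUTE))
GROUP BY window_start, window_end, region, vehicle_type;
```

The same aggregated table can then back both the dashboard and downstream diagnostic drill‑downs, since Doris serves ad‑hoc OLAP queries over it.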
2.2 Technical Challenges from Business Expansion
Multiple data centers across regions cause data isolation, latency and consistency issues.
Operations span eight time zones, requiring event‑time handling and careful data‑skew mitigation.
Frequent upstream schema changes (from a single wide table to over 20 fact tables) demand a flexible data model.
Higher overseas cloud costs and cross‑DC traffic fees necessitate careful cost‑performance trade‑offs.
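One common way to handle event time across many time zones — offered here as a hypothetical sketch rather than Lalamove’s actual approach — is to normalize timestamps to UTC at ingestion and carry each region’s zone as a column, deriving local‑day boundaries per region instead of from the session time zone:

```sql
-- Illustrative: keep one UTC-based event-time watermark for all eight
-- time zones; derive local calendar days per region for reporting.
CREATE TABLE ods_event (
  event_id   STRING,
  region     STRING,
  local_tz   STRING,        -- e.g. 'Asia/Manila', 'America/Sao_Paulo'
  event_time TIMESTAMP(3),  -- assumed already normalized to UTC upstream
  WATERMARK FOR event_time AS event_time - INTERVAL '10' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'ods_event',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json'
);

-- CONVERT_TZ maps the UTC timestamp into each region's local zone, so
-- daily aggregates roll over at local midnight rather than UTC midnight.
SELECT
  event_id,
  region,
  DATE_FORMAT(
    CONVERT_TZ(CAST(event_time AS STRING), 'UTC', local_tz),
    'yyyy-MM-dd') AS local_day
FROM ods_event;
```

Keeping a single UTC watermark also avoids per‑zone watermark divergence, one source of the data skew mentioned above.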
3. Evolution of the Flink‑Driven Real‑time Warehouse
3.1 Architecture Milestones
Pre‑2022: Single‑layer, wide‑table design on Flink 1.9 caused massive state size, duplicated computations and costly maintenance.
2022 (ODS‑DWD‑DM layered): Introduced a three‑layer real‑time warehouse; ODS ingests binlog and event data to Kafka, DWD performs ETL, DM uses Flink TVFs with event‑time windows, reducing resource consumption and handling time‑zone data skew.
2023: Upstream split into three fact tables; multiple DWD tables were created to avoid stream joins; upgraded to Flink 1.18 on Kubernetes with auto‑scaling.
2024: Integrated Paimon lakehouse as both sink and source; leveraged partial‑update and sequence mechanisms for data ordering; maintained ODS‑to‑Kafka ingestion, DWD built on Paimon, DM kept event‑time processing, and data was served to OLAP and KV stores.
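The 2024 design’s partial‑update and sequence mechanisms can be sketched in FlinkSQL roughly as follows. The table and field names are hypothetical, but `merge-engine = 'partial-update'` and `sequence.field` are standard Paimon table properties:

```sql
-- Illustrative DWD-layer Paimon table: several Flink jobs, each reading
-- a different upstream fact table, write disjoint column subsets into
-- one row keyed by order_id. The 'partial-update' merge engine stitches
-- the columns together without a stream-stream join, and
-- 'sequence.field' keeps out-of-order updates correctly ordered.
CREATE TABLE dwd_order_wide (
  order_id     STRING,
  order_status STRING,         -- written by the order-fact pipeline
  pay_amount   DECIMAL(10, 2), -- written by the payment-fact pipeline
  driver_id    STRING,         -- written by the dispatch-fact pipeline
  update_time  TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'merge-engine' = 'partial-update',
  'sequence.field' = 'update_time',
  'changelog-producer' = 'lookup'
);

-- DM jobs then consume the merged table as a streaming source:
SELECT * FROM dwd_order_wide;
```

Because Paimon serves as both sink and source here, the DWD layer replaces what would otherwise be a large stateful join in Flink, which is where much of the resource saving comes from.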
3.2 Current Warehouse Architecture
Data Model: Tightly coupled with the Flink and Paimon versions in use; production (PRD) runs Flink 1.18 + Paimon 1.0, with upstream‑downstream collaboration keeping the models lightweight.
Data Processing: Unified development on the proprietary “FeiLiu” platform; ~90% of tasks are FlinkSQL, simplifying maintenance.
Data Monitoring: Real‑time metrics and business indicators are reported to LL‑Monitor (based on Prometheus) for rapid issue detection.
Data Storage: Paimon stores data in object storage; OLAP accelerates queries, KV databases support millisecond‑level API responses, and Kafka enables multi‑consumer distribution.
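As a hedged sketch of the storage setup, a Paimon catalog backed by object storage can be declared in FlinkSQL like this (the catalog name and bucket path are placeholders):

```sql
-- Illustrative: a Paimon catalog whose warehouse path points at object
-- storage. Tables created under it persist as files there, while the
-- OLAP and KV layers serve the latency-sensitive queries on top.
CREATE CATALOG paimon_catalog WITH (
  'type' = 'paimon',
  'warehouse' = 's3://example-bucket/paimon-warehouse'
);
USE CATALOG paimon_catalog;
```

With this split, object storage carries the cheap, durable copy of the data, and the compute‑heavy serving layers stay comparatively small.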
3.3 Benefits of the Lakehouse
Enables unified real‑time and batch reporting, allowing offline reports to read the same Paimon data as real‑time dashboards.
Improves development efficiency by moving compatibility concerns to the DWD layer, reducing downstream rewrites.
Reduces compute resources: Paimon‑based pipelines consume far fewer Flink resources than the previous Kafka‑centric design, even as market coverage grew from 5 to 8 time zones.
4. Future Outlook
While Paimon has delivered significant gains, its file‑merge (compaction) process introduces seconds‑level latency, and maintaining Kafka and third‑party OLAP components still incurs operational overhead. Lalamove plans to explore Fluss + Paimon integration to achieve tighter lake‑stream fusion, CDC subscription, millisecond‑level latency, and reduced reliance on external OLAP engines.