
Ant Group Real-Time Data Warehouse: Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, detailing its architecture, data quality assurance, stream‑batch integration, and future data lake implementation, while highlighting the use of Flink, ODPS, and Paimon for scalable, low‑latency analytics.

DataFunSummit

The presentation outlines Ant Group's real-time data warehouse architecture, which comprises six core modules: compute engine, development platform, compute resources, real-time assets, development tools, and data quality. Together these modules address challenges such as asset management, resource control, and platform robustness.

It then describes the real-time data solution: data source ingestion (logs, database logs, real-time messages), processing engines (Flink and ODPS), and storage layers (SLS, Explorer, HBase). Low‑code development and stream‑batch unified processing are integrated into this solution to improve development efficiency and consistency.

Data quality assurance is handled in two stages: pre‑deployment testing and runtime monitoring. Monitoring covers task health (latency, failover, checkpoints) and data integrity checks (zero‑value, variance, threshold alerts), and a full‑link baseline tracks end‑to‑end timeliness.
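The three data integrity checks named above can be illustrated with a minimal Python sketch. This is not Ant Group's implementation: the function name `check_metric`, the alert labels, and the use of a z-score for the variance check are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def check_metric(history, current, lower=None, upper=None, max_sigma=3.0):
    """Return the names of the alerts raised by the latest metric value.

    history: recent values of the same metric (hypothetical input).
    lower/upper: optional fixed thresholds chosen by the metric owner.
    max_sigma: how many standard deviations count as abnormal (assumed rule).
    """
    alerts = []

    # Zero-value check: a metric that is exactly zero often means a broken upstream.
    if current == 0:
        alerts.append("zero_value")

    # Threshold check: compare against fixed bounds, if any were configured.
    if lower is not None and current < lower:
        alerts.append("below_threshold")
    if upper is not None and current > upper:
        alerts.append("above_threshold")

    # Variance check: flag values far from the recent mean (z-score style).
    if len(history) >= 2:
        mu = mean(history)
        sigma = pstdev(history)
        if sigma > 0 and abs(current - mu) > max_sigma * sigma:
            alerts.append("variance")

    return alerts


recent = [100, 102, 98, 101]
healthy_alerts = check_metric(recent, 100, lower=50, upper=200)
broken_alerts = check_metric(recent, 0, lower=50, upper=200)
```

In this sketch a single value can raise several alerts at once, which lets an on-call engineer see whether a drop is a hard failure (zero value) or only a statistical outlier.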

The stream‑batch unified approach addresses field alignment between streaming and batch tables: virtual columns and hybrid meta‑tables handle mismatched schemas, so that a single task can compute real‑time cumulative metrics by combining the current day's aggregates with historical offline results.
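A rough Python sketch of the two ideas in this paragraph follows. It reflects one plausible reading of the summary, not the platform's actual mechanism: `align_row`, `cumulative_metric`, the field names, and the idea that a virtual column supplies a default for a field missing on one side are all hypothetical.

```python
def align_row(row, unified_fields, virtual_columns):
    """Project a source row onto the unified schema.

    Fields missing from the source are filled from virtual-column defaults
    (assumed behaviour); fields outside the unified schema are dropped.
    """
    return {f: row.get(f, virtual_columns.get(f)) for f in unified_fields}


def cumulative_metric(offline_total, todays_events, amount_field="amount"):
    """Cumulative value = historical offline result + aggregate of today's stream."""
    return offline_total + sum(e[amount_field] for e in todays_events)


# Unified schema shared by the streaming and batch tables (hypothetical).
unified_fields = ["user_id", "amount", "channel"]

# The streaming source lacks `channel`, so a virtual column supplies a default.
stream_row = {"user_id": "u1", "amount": 5, "trace_id": "t-9"}
aligned = align_row(stream_row, unified_fields, {"channel": "unknown"})

# Offline total up to yesterday, plus today's real-time events.
total = cumulative_metric(1000, [{"amount": 5}, {"amount": 7}])
```

The point of the design is that the same projection and the same aggregation logic serve both sides, so the streaming and batch results cannot drift apart through duplicated code.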

Looking ahead, Ant Group plans to consolidate real‑time and offline data into a data lake using Paimon, simplifying the pipeline with a unified compute engine, storage, and asset management, and supporting multi‑level scheduling (daily, hourly, real‑time) for comprehensive data needs.

Tags: Big Data, Flink, stream processing, data quality, data warehouse, real-time data, data lake
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
