Tagged articles
11 articles
DataFunSummit
Dec 1, 2025 · Big Data

7 Cutting-Edge Data Engineering Practices Shaping AI-Driven Data Lakes

This article collection showcases seven advanced data engineering solutions, ranging from Tencent Cloud's Iceberg batch‑stream integration and Apache Gravitino metadata lineage to Xiaohongshu's Lakehouse evolution and multimodal AI data lake implementations. It highlights architectural innovations, performance optimizations, and real‑world deployment insights for modern big‑data platforms.

Apache Gravitino · Apache Iceberg · Batch-Stream Integration
0 likes · 7 min read
High Availability Architecture
Sep 10, 2025 · Big Data

How Ctrip Business Travel Built a Near‑Real‑Time Lakehouse with Flink CDC & Paimon

This article details Ctrip Business Travel’s implementation of a near‑real‑time data warehouse using Flink CDC and the Paimon lakehouse engine, covering order wide‑table construction, ticket refund alerts, ad attribution, batch‑stream integration, and practical lessons on Partial Update, Aggregation, and Tag‑based incremental processing.

Batch-Stream Integration · Flink
0 likes · 17 min read
Ctrip Technology
Sep 2, 2025 · Big Data

How Ctrip Built a Near‑Real‑Time Lakehouse with Flink & Paimon

This article details Ctrip Business Travel’s implementation of a near‑real‑time data warehouse and lakehouse using Flink CDC and Apache Paimon, covering order wide‑table construction, automated ticket reminders, ad attribution, batch‑stream integration, and lessons on Partial Update, Aggregation, and Tag‑based incremental processing.

Batch-Stream Integration · Flink · Lakehouse
0 likes · 17 min read
Big Data Technology & Architecture
Dec 18, 2024 · Big Data

Key Trends of Flink 2.0: Compute‑Storage Separation, Unified Batch‑Stream, and Streaming Warehouse

The article reviews the major directions of Flink 2.0, including compute‑storage separation, a new Materialized Table for unified batch‑stream processing, and deeper integration with Paimon for streaming warehouses, while offering a cautious perspective on their practical impact and migration challenges.

Batch-Stream Integration · Big Data · Compute-Storage Separation
0 likes · 5 min read
dbaplus Community
Sep 3, 2023 · Big Data

How NetEase Yanxuan Migrated from Lambda to Iceberg for Seamless Batch‑Stream Integration

This article explains how NetEase Yanxuan upgraded its legacy Lambda architecture to an Iceberg‑based batch‑stream unified platform, detailing the original data pipeline, the challenges faced, the evaluation of Iceberg versus Hudi and DeltaLake, and the concrete engineering optimizations and governance measures implemented to achieve lower latency and higher query performance.

Batch-Stream Integration · Big Data · Flink
0 likes · 14 min read

How NetEase Yanxuan Migrated from Lambda to Iceberg for Real‑Time Batch‑Stream Integration

This article details how NetEase Yanxuan transformed its data platform from a dual Lambda architecture to a unified batch‑stream solution built on Apache Iceberg, covering the original challenges, the evaluation of Iceberg versus Hudi and Delta Lake, implementation of stream‑batch pipelines, message ordering fixes, snapshot generation, and extensive table‑governance optimizations.

Apache Flink · Apache Spark · Batch-Stream Integration
0 likes · 14 min read
DataFunTalk
Jun 6, 2021 · Big Data

Understanding Apache Pulsar: Cloud‑Native Messaging, Storage‑Compute Separation, and Batch‑Stream Fusion with Flink

This article explains Apache Pulsar’s cloud‑native, storage‑compute separated architecture, its data model and scalability features, and how it integrates with Flink to provide a unified platform for both real‑time streaming and batch processing in big‑data applications.

Apache Pulsar · Batch-Stream Integration · Big Data
0 likes · 17 min read
Big Data Technology Architecture
Apr 5, 2021 · Big Data

Evolution of Real‑Time Data Warehouses: From 1.0 to 3.0 and the Road to Batch‑Stream Unified Architecture

The article reviews the current state of offline Hive‑based data warehouses, explains the emergence of real‑time data warehouses (1.0) built on Kafka and Flink, discusses their limitations, and outlines the progression toward batch‑stream unified architectures (2.0 and 3.0) leveraging data‑lake technologies such as Iceberg.

Batch-Stream Integration · Big Data · Flink
0 likes · 13 min read
TAL Education Technology
Jan 28, 2021 · Big Data

Batch-Stream Fusion in Education: TAL’s Real-Time Data Platform Practices

This article, presented by senior data platform engineer Mao Xiangyi of TAL Education, details the design and implementation of the company’s real‑time T‑Streaming platform, covering its three‑layer data architecture, batch‑stream integration techniques, the move to a real‑time ODS layer, the Flink SQL development workflow, hybrid‑cloud deployment, and a case study of K‑12 renewal reporting.

Batch-Stream Integration · Education Analytics · Flink
0 likes · 18 min read
DataFunTalk
Dec 7, 2020 · Big Data

Jingdong's Flink Real‑Time Computing Platform: Containerization, Optimizations, and Future Roadmap

This article details Jingdong's evolution from Storm to Flink, the architecture of its Kubernetes‑based real‑time computing platform, extensive containerization practices, performance and stability optimizations, and the future plan to unify batch‑stream processing while expanding SQL support and intelligent operations.

Batch-Stream Integration · Flink · Kubernetes
0 likes · 16 min read