Tagged articles
29 articles
ITPUB
Jan 29, 2026 · Big Data

How to Sync MySQL ALTER DDL to Doris Using Flink CDC (Step‑by‑Step)

This guide explains how to extend a Flink CDC pipeline so that, in addition to real‑time data replication, DDL ALTER statements from MySQL are captured, split from the data stream, and applied to Doris using side‑outputs and a custom JDBC sink.

DDL synchronization · Flink CDC
8 min read
DataFunSummit
Apr 8, 2025 · Big Data

Huolala’s Real‑Time Data Synchronization with Flink CDC: Architecture, Practices, and Future Outlook

This article presents Huolala’s end‑to‑end implementation of Flink CDC for real‑time data capture, detailing the business background, reasons for selecting Flink CDC over Canal, component comparisons, production‑level platform enhancements, data‑lake integration, validation methods, and future directions for unified data ingestion.

Flink CDC · data synchronization · real-time data
13 min read
Alibaba Cloud Big Data AI Platform
Jan 23, 2025 · Big Data

How Alibaba Cloud DataWorks Leverages Flink CDC for Scalable Data Lake Integration

Alibaba Cloud DataWorks’ Data Integration platform, built on Flink CDC, offers a comprehensive, serverless solution for real‑time and batch data lake ingestion. The article details its architecture, elastic scaling, productized use cases, and future roadmap, including AI‑driven diagnostics and expanded source support.

Big Data · Data Integration · Data Lake
12 min read
Huolala Tech
Nov 7, 2024 · Big Data

How HuoLaLa Scaled Real‑Time Data Capture with Flink CDC: Architecture, Challenges, and Results

This article details HuoLaLa's logistics platform challenges with petabyte‑scale data, the selection of Apache Flink CDC for stable, compatible, and low‑latency data ingestion, the construction of a multi‑layer CDC capability, migration strategies, measurable performance gains, and future open‑source contributions.

Apache Flink · Flink CDC · data ingestion
15 min read
DataFunTalk
Dec 15, 2023 · Big Data

Flink Forward Asia 2023: New Flink Releases, Apache Paimon, and Flink CDC 3.0

The Flink Forward Asia 2023 conference showcased major updates to Apache Flink (versions 1.17 and 1.18), introduced the Apache Paimon lakehouse project, announced Flink CDC 3.0, and highlighted community growth, cloud‑native deployments, and real‑time data‑warehouse use cases across industry leaders.

Apache Flink · Apache Paimon · Big Data
17 min read
Alibaba Cloud Big Data AI Platform
Nov 22, 2023 · Big Data

Real-Time Data Integration with Flink CDC: Core Tech and Alibaba Cloud Solutions

This article, based on a presentation by Flink CDC and Apache Flink community leaders, explores CDC real‑time integration challenges, delves into Flink CDC’s core technologies such as incremental snapshot and lock‑free processing, and demonstrates Alibaba Cloud’s enterprise‑grade solutions for end‑to‑end real‑time data pipelines.

Alibaba Cloud · Big Data · Change Data Capture
21 min read
DataFunSummit
Aug 4, 2023 · Big Data

LakeSoul: An Open‑Source Real‑Time Data Lakehouse Framework – Design, Architecture, Benchmarks and Future Roadmap

This article introduces LakeSoul, an open‑source end‑to‑end real‑time lakehouse framework, detailing its design philosophy, key technologies such as ELT, metadata management, upsert and merge‑on‑read capabilities, performance benchmarks, real‑world use cases, and the roadmap for future enhancements.

Big Data · Data Lakehouse · ELT
18 min read
DataFunTalk
Jan 20, 2023 · Big Data

Introduction to Flink CDC: Incremental Snapshot Algorithm and Framework

This article introduces Flink CDC, explains its incremental snapshot algorithm and the 2.0 framework design, compares it with traditional CDC pipelines, discusses the core API and dialect concept, and outlines community growth and future plans, providing a comprehensive technical overview for data engineers.

Apache Flink · Big Data · Change Data Capture
13 min read
Big Data Technology Architecture
Jun 9, 2022 · Databases

Building a Real‑Time Data Warehouse with Apache Doris: Architecture, Benefits, and Lessons Learned

This article details how a fast‑growing supply‑chain platform migrated from MySQL and Hive to Apache Doris for real‑time analytics, describing the architectural evolution, the advantages of the new design, practical implementation steps, encountered challenges, and the performance and cost benefits achieved.

Apache Doris · Data Integration · Flink CDC
12 min read
StarRocks
Jun 2, 2022 · Big Data

Simplify Real‑Time Data Warehousing with Flink CDC and StarRocks

This article explores how combining Flink CDC with StarRocks can streamline real‑time data pipelines, reduce component complexity, support both full and incremental synchronization, and enable efficient OLAP queries and updates for fast, scalable analytics across diverse business scenarios.

Data Warehouse · Flink CDC · OLAP
18 min read
Bilibili Tech
Apr 25, 2022 · Big Data

Optimizing Full Partition Tables with Zipper Tables, Hudi+Flink CDC, and Data Warehouse Strategies

Facing server‑hardware constraints, Bilibili’s data platform replaced wasteful full‑partition tables with a zipper‑table approach—preserving change history while cutting storage from petabytes to terabytes—and complemented it with Hudi + Flink CDC for near‑real‑time updates, dramatically lowering I/O, compute usage and latency.

Big Data · Flink CDC · Hudi
11 min read
Alibaba Cloud Developer
Apr 2, 2022 · Big Data

What’s New in Flink CDC 2.2? A Deep Dive into Added Sources and Core Features

The article introduces Flink CDC 2.2, highlighting its expanded support for twelve data sources, including OceanBase, PolarDB‑X, SQL Server, and TiDB, and detailing core features such as the incremental snapshot framework, multi‑version Flink compatibility, dynamic table addition, and numerous bug fixes and performance improvements.

Apache Flink · Change Data Capture · Connector
9 min read
DataFunTalk
Jan 11, 2022 · Big Data

Interview with Wang Feng (Mo Wen): The Future of Apache Flink and Streaming Warehouses

In an exclusive InfoQ interview, Apache Flink community leader Wang Feng (aka Mo Wen) outlines the evolution of Flink toward a Streaming Warehouse, detailing recent technical advances, use‑case scenarios, and the upcoming Dynamic Table storage that aim to unify stream and batch processing for real‑time data‑warehouse workloads.

Apache Flink · Big Data · Dynamic Table
16 min read
Programmer DD
Jan 8, 2022 · Big Data

How Flink’s Streaming Warehouse Is Redefining Real‑Time Data Lakes

This interview explores Apache Flink’s evolution toward a Streaming Warehouse, detailing its stream‑batch integration, new CDC‑based data integration, the Dynamic Table storage architecture, and how these innovations aim to simplify and accelerate real‑time big‑data analytics.

Apache Flink · Big Data · Dynamic Table
17 min read
Big Data Technology & Architecture
Oct 21, 2021 · Big Data

Comparative Overview of Open‑Source CDC Solutions: Debezium, Flink CDC, and Canal

This article provides a detailed comparison of three popular open‑source change data capture tools—Debezium, Flink CDC, and Canal—covering their underlying principles, architecture, deployment options, performance characteristics, and suitability for real‑time data synchronization in big‑data environments.

CDC · Canal · Change Data Capture
21 min read
Big Data Technology Architecture
Aug 17, 2021 · Big Data

Detailed Overview of Flink CDC 2.0: Architecture, Features, and Future Roadmap

This article provides an in‑depth technical overview of Flink CDC 2.0, covering its CDC fundamentals, comparison of query‑based and log‑based approaches, the new lock‑free chunk algorithm, FLIP‑27 based parallel snapshot reading, performance benchmarks, documentation improvements, and future roadmap for stability and ecosystem integration.

Change Data Capture · Data Integration · Debezium
16 min read