Tag

Flink CDC

1 view collected around this technical thread.

DataFunSummit
Apr 8, 2025 · Big Data

Huolala’s Real‑Time Data Synchronization with Flink CDC: Architecture, Practices, and Future Outlook

This article presents Huolala’s end‑to‑end implementation of Flink CDC for real‑time data capture, detailing the business background, reasons for selecting Flink CDC over Canal, component comparisons, production‑level platform enhancements, data‑lake integration, validation methods, and future directions for unified data ingestion.

Big Data · Flink CDC · Streaming
0 likes · 13 min read
Aikesheng Open Source Community
Mar 4, 2025 · Databases

Troubleshooting Incremental Data Sync Failure in OceanBase OBLogProxy Binlog Mode

This article details the background, configuration, and step‑by‑step troubleshooting process for a data pipeline that replaces a MySQL source with OceanBase using OBLogProxy in binlog mode, explains why downstream reads missed incremental data, and provides conclusions and optimization measures.

Database Troubleshooting · Flink CDC · OBLogProxy
0 likes · 12 min read
DataFunSummit
Feb 9, 2025 · Big Data

Modern Data Stack on Alibaba Cloud Using Flink CDC: Architecture, Features, and Use Cases

This article presents a comprehensive overview of Alibaba Cloud's modern data stack built on Flink CDC, detailing its core concepts, extended capabilities, typical application scenarios, performance optimizations, a live demo, and future development plans for large‑scale streaming data integration.

Alibaba Cloud · Big Data · Flink CDC
0 likes · 13 min read
DataFunTalk
Dec 15, 2023 · Big Data

Flink Forward Asia 2023: New Flink Releases, Apache Paimon, and Flink CDC 3.0

The Flink Forward Asia 2023 conference showcased major updates to Apache Flink (versions 1.17 and 1.18), introduced the Apache Paimon lakehouse project, announced Flink CDC 3.0, and highlighted community growth, cloud‑native deployments, and real‑time data‑warehouse use cases across industry leaders.

Apache Flink · Apache Paimon · Big Data
0 likes · 17 min read
DataFunTalk
Dec 12, 2023 · Big Data

Flink Forward Asia 2023 Recap: Keynote Highlights, Technical Advances, and Community Updates

The Flink Forward Asia 2023 conference recap highlights opening remarks, a keynote on Flink’s dominance in streaming compute, detailed 2023 technical advancements, case studies, the launch of Flink CDC 3.0, and a preview of Flink 2.0, along with links to photos and video recordings.

Apache Flink · Big Data · Flink 2.0
0 likes · 5 min read
DataFunSummit
Aug 4, 2023 · Big Data

LakeSoul: An Open‑Source Real‑Time Data Lakehouse Framework – Design, Architecture, Benchmarks and Future Roadmap

This article introduces LakeSoul, an open‑source end‑to‑end real‑time lakehouse framework, detailing its design philosophy, key technologies such as ELT, metadata management, upsert and merge‑on‑read capabilities, performance benchmarks, real‑world use cases, and the roadmap for future enhancements.

Big Data · Data Lakehouse · ELT
0 likes · 18 min read
DataFunSummit
Jun 12, 2023 · Big Data

From Data Integration to the Modern Data Stack: Concepts, Tools, and Practices

This article explains data integration fundamentals, compares data integration tools such as Stitch, Fivetran, and Airbyte, describes the concepts of data warehouses and data lakes, outlines ETL vs ELT processes, and explores building modern data stacks with Flink CDC and cloud services.

Big Data · ELT · ETL
0 likes · 17 min read
DataFunTalk
Jan 20, 2023 · Big Data

Introduction to Flink CDC: Incremental Snapshot Algorithm and Framework

This article introduces Flink CDC, explains its incremental snapshot algorithm and the 2.0 framework design, compares it with traditional CDC pipelines, discusses the core API and dialect concept, and outlines community growth and future plans, providing a comprehensive technical overview for data engineers.

Apache Flink · Big Data · Change Data Capture
0 likes · 13 min read
Big Data Technology Architecture
Jun 9, 2022 · Databases

Building a Real‑Time Data Warehouse with Apache Doris: Architecture, Benefits, and Lessons Learned

This article details how a fast‑growing supply‑chain platform migrated from MySQL and Hive to Apache Doris for real‑time analytics, describing the architectural evolution, the advantages of the new design, practical implementation steps, encountered challenges, and the performance and cost benefits achieved.

Apache Doris · Big Data · Flink CDC
0 likes · 12 min read
Bilibili Tech
Apr 25, 2022 · Big Data

Optimizing Full Partition Tables with Zipper Tables, Hudi+Flink CDC, and Data Warehouse Strategies

Facing server‑hardware constraints, Bilibili’s data platform replaced wasteful full‑partition tables with a zipper‑table approach—preserving change history while cutting storage from petabytes to terabytes—and complemented it with Hudi + Flink CDC for near‑real‑time updates, dramatically lowering I/O, compute usage and latency.

Big Data · Data Warehouse · Flink CDC
0 likes · 11 min read
DataFunTalk
Jan 11, 2022 · Big Data

Interview with Wang Feng (Mo Wen): The Future of Apache Flink and Streaming Warehouses

In an exclusive InfoQ interview, Apache Flink community leader Wang Feng (aka Mo Wen) outlines the evolution of Flink toward a Streaming Warehouse, detailing recent technical advances, use‑case scenarios, and the upcoming Dynamic Table storage that aim to unify stream and batch processing for real‑time data‑warehouse workloads.

Apache Flink · Big Data · Dynamic Table
0 likes · 16 min read
Big Data Technology Architecture
Aug 17, 2021 · Big Data

Detailed Overview of Flink CDC 2.0: Architecture, Features, and Future Roadmap

This article provides an in‑depth technical overview of Flink CDC 2.0, covering CDC fundamentals, a comparison of query‑based and log‑based approaches, the new lock‑free chunk algorithm, FLIP‑27‑based parallel snapshot reading, performance benchmarks, documentation improvements, and the future roadmap for stability and ecosystem integration.

Big Data · Change Data Capture · Debezium
0 likes · 16 min read