Tag

CDC

DataFunSummit
Apr 1, 2025 · Big Data

Understanding Flink CDC 3.3: Features, Improvements, and Future Plans

This article gives an overview of Flink CDC 3.3, covering CDC fundamentals, new connectors, Transform module enhancements, asynchronous snapshot splitting, community adoption, and the upcoming roadmap for broader ecosystem support and batch‑mode execution.

Big Data · CDC · Change Data Capture
15 min read

Big Data Technology Architecture
Mar 1, 2025 · Big Data

Core Principles and Practical Guide to Flink CDC

This article explains CDC fundamentals, details Flink CDC's architecture and advantages, provides setup steps, code examples for SQL and DataStream APIs, discusses performance tuning, consistency, common issues, and typical real‑time data integration scenarios.

CDC · Change Data Capture · Debezium
7 min read

DataFunSummit
Feb 24, 2025 · Big Data

Building Real-Time Data Synchronization Pipelines with Apache SeaTunnel

Apache SeaTunnel is an open‑source, distributed data integration platform for efficient real‑time data synchronization across diverse sources and destinations, supporting both streaming and batch processing. The article covers its architecture, connector plugins, CDC handling, transform capabilities, and deployment strategies for large‑scale data pipelines.

Apache SeaTunnel · CDC · data pipelines
34 min read

macrozheng
Feb 24, 2025 · Databases

Mastering MySQL to Elasticsearch Sync: 4 Strategies & Top Migration Tools

This article explores four practical methods for synchronizing MySQL data to Elasticsearch—including synchronous and asynchronous double writes, SQL extraction, and binlog real‑time replication—while reviewing popular migration tools such as Canal, Alibaba DTS, and Databus to help you choose the right solution.

CDC · Canal · DTS
13 min read

Tencent Advertising Technology
Dec 6, 2024 · Big Data

Building a High‑Performance Advertising Feature Data Lake with Apache Iceberg at Tencent

Tencent's advertising team replaced a traditional HDFS‑Hive warehouse with an Apache Iceberg‑based data lake, adding primary‑key tables, multi‑stream merging, adaptive compaction, and Spark SPJ optimizations to achieve minute‑level feature update latency, 10× back‑fill speed, and up to 60% storage savings.

Big Data · CDC · Compaction
25 min read

IT Services Circle
Jun 12, 2024 · Databases

MySQL to Elasticsearch Data Synchronization: Strategies and Tool Selection

This article reviews four common MySQL‑to‑Elasticsearch synchronization methods—synchronous dual‑write, asynchronous dual‑write via MQ, timer‑based SQL extraction, and real‑time Binlog replication—evaluates their pros and cons, and compares popular migration tools such as Canal, Alibaba DTS, Databus and others.

CDC · Data Migration Tools · Data Synchronization
11 min read

DataFunTalk
May 16, 2024 · Big Data

Streaming Data Lake Warehouse Solution Based on USDP with Flink and Paimon

This article presents UCloud's USDP‑based streaming data lake warehouse solution, which uses Flink for real‑time processing and Paimon for lake storage. It details the architecture, advantages, and practical scenarios, and provides complete SQL and Flink CDC code snippets for end‑to‑end implementation.

Big Data · CDC · Flink
27 min read

Spring Full-Stack Practical Cases
Apr 9, 2024 · Big Data

Build Real-Time MySQL CDC Pipelines with Flink 1.19 and SpringBoot

This guide walks through setting up Flink CDC with MySQL on SpringBoot 2.7, covering binlog configuration, Maven dependencies, Java implementation for real‑time change capture, startup options, a custom Redis sink, and a web UI for monitoring the streaming pipeline.

CDC · Flink · Java
10 min read

DataFunSummit
Mar 25, 2024 · Big Data

Exploring Real-Time Data Lake Practices at Kangaroo Cloud

This article shares Kangaroo Cloud's exploration and practice of a real-time data lake, covering background, data lake concepts, challenges, solution architecture using the Shuzhan platform with Iceberg/Hudi, CDC ingestion, small file handling, cross-cluster ingestion, materialized view acceleration, and future development plans.

CDC · Cross-Cluster Ingestion · Hudi
12 min read

DataFunSummit
Feb 20, 2024 · Big Data

BitSail Open‑Source Data Integration Engine: Architecture, New Features, CDC Solutions and Future Outlook

This article introduces ByteDance's open‑source data integration engine BitSail, covering its background, layered architecture, recent feature enhancements, automated testing framework, CDC‑based full‑library synchronization solutions, and future development plans for connectors and real‑time data consistency.

Big Data · CDC · Flink
12 min read

Big Data Technology Architecture
Nov 28, 2023 · Big Data

Real-time Data Ingestion from MySQL to Apache Doris Using Flink CDC and Doris Flink Connector

This article demonstrates, with step‑by‑step examples, how to capture MySQL changes via Flink CDC and stream them in real time into Apache Doris using the Doris Flink Connector, covering CDC concepts, connector features, environment setup, SQL client usage, and data verification.

Apache Doris · CDC · Connector
13 min read

Rare Earth Juejin Tech Community
Nov 9, 2023 · Databases

Integrating Debezium for Change Data Capture in Spring Boot Applications

This article explains how to use Debezium's change data capture (CDC) capabilities to monitor MySQL binlog events, compares Canal and Debezium, outlines typical CDC use cases, and provides a complete Spring Boot integration guide with configuration, code examples, and testing procedures.

CDC · Change Data Capture · Debezium
22 min read

Code Ape Tech Column
Aug 10, 2023 · Backend Development

Integrating Debezium for Change Data Capture in Spring Boot Applications

This article explains how to use CDC technology, particularly Debezium, to capture MySQL binlog changes and process them in a Spring Boot application without adding heavyweight middleware, providing code examples, configuration details, and typical use cases.

CDC · Change Data Capture · Debezium
21 min read

DataFunSummit
May 28, 2023 · Big Data

Apache Hudi: Capabilities, Architecture, Use Cases, and Future Outlook

This article introduces Apache Hudi as a next‑generation streaming data‑lake platform, explains its core concepts, architecture, and table types, and showcases real‑world use cases at Tencent such as CDC ingestion, minute‑level real‑time warehousing, streaming analytics, multi‑stream joins, ad attribution, and stream‑to‑batch processing, while also outlining future directions.

Apache Hudi · Big Data · CDC
16 min read

Selected Java Interview Questions
May 10, 2023 · Backend Development

Implementing Data Change Capture in SpringBoot Using Canal and RabbitMQ

This guide demonstrates how to decouple data change logging from business logic in a SpringBoot application by leveraging MySQL binlog monitoring with Canal, forwarding change events through RabbitMQ, and persisting both new and old record states using Docker‑compose, configuration files, and Java client code.

CDC · Canal · DataSync
18 min read

WeiLi Technology Team
May 6, 2023 · Big Data

How We Upgraded Our Flink Cluster from 1.10 to 1.14.6 and Overcame Common Pitfalls

This article details the background of a Flink 1.10 cluster on Huawei Cloud, the technical challenges that prompted an upgrade, a step‑by‑step migration plan to Flink 1.14.6, troubleshooting of frequent errors, precautionary measures, and the performance and operational benefits achieved after the upgrade.

Big Data · CDC · Flink
19 min read

Selected Java Interview Questions
Feb 25, 2023 · Backend Development

Integrating SpringBoot with Canal and RabbitMQ for Database Change Capture

This guide explains how to decouple business logic in a SpringBoot application by using Canal to listen to MySQL binlog changes, forwarding those events through RabbitMQ, and processing them with a Java client to record both new and old data for insert, update, and delete operations.

CDC · Canal · Docker
22 min read

Big Data Technology Architecture
Feb 24, 2023 · Big Data

Implementing Change Data Capture (CDC) on Data Lake Formats with Apache Hudi

This article reviews lake‑format concepts, Apache Hudi architecture, CDC fundamentals, design considerations for CDC on lake formats, implementation details of Hudi CDC, and streaming optimizations including automated lake‑table management and a simplified StreamingSQL for Spark.

Apache Hudi · Big Data · CDC
19 min read

TAL Education Technology
Feb 16, 2023 · Big Data

Step‑by‑Step Guide to Syncing Canal Data to Elasticsearch

This article provides a comprehensive, hands‑on tutorial for configuring Alibaba Canal and its client‑adapter to capture MySQL binlog changes and synchronize them into Elasticsearch, covering environment setup, Docker commands, YAML configuration files, index mapping, adapter startup, and common troubleshooting scenarios.

CDC · Canal · Data Synchronization
26 min read

Aikesheng Open Source Community
Jan 18, 2023 · Databases

Real-Time Data Warehouse Evaluation: ClickHouse vs StarRocks and Synchronization Strategies

This article shares practical experience comparing ClickHouse and StarRocks as real‑time data warehouses, outlines the project requirements, evaluates each system's suitability for log‑type and business‑type data, and describes CDC‑based synchronization methods from MySQL to both platforms.

CDC · ClickHouse · MySQL
8 min read