Tagged articles

CDC

125 articles · Page 1 of 2

Jun 21, 2026 · Artificial Intelligence

RAG Data Governance: Incremental Sync and Consistency (Part 1)

The article explains how additions, updates, and deletions affect a vector store differently, outlines three layers of incremental synchronization—change detection, change handling, and service stability—and compares timestamp polling, content‑hash diffing, and CDC while discussing consistency models and conflict resolution in distributed vector databases.

CDC · Data Governance · RAG

0 likes · 16 min read

Alibaba Cloud Developer

Jun 18, 2026 · Big Data

How AI-Driven Real-Time Data Lakes Are Ditching ETL: A Kafka‑to‑Iceberg Architecture Simplification

In the AI era, enterprises need a data foundation that supports both low‑latency streaming and long‑term analytics, and the combination of Kafka, Iceberg and object storage is emerging as a preferred solution; by moving ingestion capabilities closer to the message layer and eliminating external ETL jobs, a "zero‑ETL" approach reduces architectural complexity, improves consistency, and streamlines schema evolution and small‑file management.

CDC · Data Lake · Iceberg

0 likes · 27 min read

StarRocks

May 8, 2026 · Big Data

Scaling Real‑Time Analytics at KaptureCX: Best Practices with RisingWave and StarRocks

KaptureCX migrated its core analytics from ClickHouse to StarRocks, introduced RisingWave and Kafka for CDC, and achieved millisecond‑level query latency, a reporting cycle cut from weeks to one day, and a solid data foundation for AI‑driven services.

CDC · Kafka · MVP

0 likes · 11 min read

ITPUB

Feb 13, 2026 · Big Data

Real‑Time Sync of New MySQL Tables to Doris Using Flink CDC

This article explains how to extend a Flink CDC job that already syncs an entire MySQL database to Doris so that newly created tables are automatically created in Doris in real time, using the CdcTools utility, side‑output streams, and asynchronous I/O.

CDC · CdcTools · Doris

0 likes · 9 min read

Aikesheng Open Source Community

Feb 3, 2026 · Databases

Why MySQL 9.6 Moves Foreign‑Key Enforcement to the SQL Engine

MySQL 9.6 shifts foreign‑key checks and cascade handling from the InnoDB storage engine to the SQL engine, eliminating hidden changes, improving binary‑log visibility, and delivering full‑log replication and analytics without sacrificing performance.

CDC · Database · Foreign Keys

0 likes · 9 min read

ITPUB

Jan 22, 2026 · Backend Development

Sync New MySQL Tables to Doris in Real‑Time with Flink CDC and CdcTools

This article explains how to use Flink CDC together with the CdcTools utility to automatically capture newly created MySQL tables and synchronize both their schema and data to a Doris database in real time, covering the required code, side‑output handling, async execution, and a special delete‑sign field.

Async IO · CDC · Doris

0 likes · 10 min read

ITPUB

Jan 18, 2026 · Databases

From Full Sync to Real‑Time CDC: Building Scalable Order Data Pipelines

A junior e‑commerce developer tackles the challenge of regularly syncing order data to a data warehouse, evolving from naïve full‑table copies to incremental sync, batch processing, cursor‑based pagination, performance tuning, and finally a real‑time CDC‑plus‑message‑queue architecture, while addressing reliability, ordering, and scaling issues.

Batch · CDC · Cursor

0 likes · 13 min read

Ctrip Technology

Nov 20, 2025 · Big Data

How Ctrip Achieved Minute‑Level Real‑Time Analytics with Flink CDC & Apache Paimon

Ctrip transformed its traditional T+1 offline warehouse into a near‑real‑time lakehouse by integrating Flink CDC with Apache Paimon, designing a two‑stage CDC ingestion, optimizing performance, implementing dynamic updates, and deploying the solution across multiple business scenarios, achieving minute‑level latency, reduced costs, and faster data‑driven decisions.

CDC · Data Engineering · Flink

0 likes · 27 min read

dbaplus Community

Nov 12, 2025 · Databases

Mastering Data Sync: From Full Loads to Real‑Time CDC in E‑Commerce

This guide walks a new e‑commerce developer through the evolution of order data synchronization—from naïve full‑table loads, through incremental and batch strategies, cursor‑based pagination, performance tuning, and finally to real‑time CDC with message queues—highlighting pitfalls and practical solutions.

Batch Processing · CDC · Cursor Pagination

0 likes · 12 min read

Aikesheng Open Source Community

Sep 24, 2025 · Databases

How to Migrate SQL Server to OceanBase Using Action OMS: A Step‑by‑Step Guide

This guide shows how Action OMS, a customized version of OceanBase’s OMS tool, supports migration and real‑time data subscription from Microsoft SQL Server (2008‑2019 Enterprise) to OceanBase, covering preparation, configuration, CDC setup, and best‑practice considerations for reliable, low‑latency data transfer.

Action OMS · CDC · Data Migration

0 likes · 15 min read

StarRocks

Jul 1, 2025 · Big Data

How StarRocks Boosted Suixingfu’s Real‑Time Data Platform: 3× Faster Queries & 10× Faster Analytics

Suixingfu rebuilt its payment data pipeline by replacing a fragmented Lambda stack with a unified Porter CDC + StarRocks + Elasticsearch architecture, achieving three‑fold query speed, ten‑fold analytics efficiency, 20% storage reduction, and sub‑second data‑capture latency across high‑concurrency, ad‑hoc, and batch workloads.

CDC · Data Warehouse · Flink

0 likes · 14 min read

Big Data Technology Tribe

Jun 22, 2025 · Cloud Native

How to Ensure Consistent State in Event‑Driven Microservices: 3 Proven Patterns

This article explains the challenges of maintaining data consistency in distributed, event‑driven microservice architectures and introduces three practical patterns—Outbox, Original Event Handling, and Self‑Read—to guarantee reliable state synchronization across services, even when failures occur.

CDC · Event-Driven Architecture · distributed transactions

0 likes · 6 min read

DataFunSummit

Apr 1, 2025 · Big Data

Understanding Flink CDC 3.3: Features, Improvements, and Future Plans

This article provides a comprehensive overview of Flink CDC 3.3, detailing its CDC fundamentals, new connectors, Transform module enhancements, asynchronous snapshot splitting, community adoption, and upcoming roadmap for broader ecosystem support and batch‑mode execution.

Big Data · CDC · Change Data Capture

0 likes · 15 min read

Alibaba Cloud Big Data AI Platform

Mar 18, 2025 · Big Data

Boosting Flink CDC to Hologres: High‑Performance Data Sync Optimization Techniques

This article presents a comprehensive overview of Flink CDC + Hologres high‑performance data synchronization, detailing write and consumption optimizations, architectural principles, and future directions to achieve low latency and high throughput in real‑time data pipelines.

CDC · Flink · Hologres

0 likes · 21 min read

Big Data Technology Architecture

Mar 1, 2025 · Big Data

Core Principles and Practical Guide to Flink CDC

This article explains CDC fundamentals, details Flink CDC's architecture and advantages, provides setup steps, code examples for SQL and DataStream APIs, discusses performance tuning, consistency, common issues, and typical real‑time data integration scenarios.

CDC · Change Data Capture · Debezium

0 likes · 7 min read

DataFunSummit

Feb 24, 2025 · Big Data

Building Real-Time Data Synchronization Pipelines with Apache SeaTunnel

Apache SeaTunnel is an open‑source, distributed data integration platform for real‑time data synchronization across diverse sources and destinations in both streaming and batch modes; the article covers its architecture, connector plugins, CDC handling, transform capabilities, and deployment strategies for large‑scale data pipelines.

Apache SeaTunnel · CDC · Real-Time Data Integration

0 likes · 34 min read

macrozheng

Feb 24, 2025 · Databases

Mastering MySQL to Elasticsearch Sync: 4 Strategies & Top Migration Tools

This article explores four practical methods for synchronizing MySQL data to Elasticsearch—including synchronous and asynchronous double writes, SQL extraction, and binlog real‑time replication—while reviewing popular migration tools such as Canal, Alibaba DTS, and Databus to help you choose the right solution.

CDC · Canal · DTS

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Jan 27, 2025 · Big Data

Unlock Real-Time Data Sync with Flink CDC: YAML Integration, Transform & Route Explained

This article summarizes an advanced Flink CDC presentation, covering Flink CDC fundamentals, real‑time Flink integration, CDC‑YAML core capabilities, supported sync links, Transform and Route modules, monitoring metrics, schema‑change strategies, typical use cases, performance optimizations, demo implementations, and future development plans.

CDC · Data Integration · Flink

0 likes · 20 min read

Alibaba Cloud Big Data AI Platform

Jan 21, 2025 · Big Data

Master Flink CDC YAML: Real‑Time Data Integration Best Practices in 10 Minutes

This article introduces Flink CDC YAML, outlines its core capabilities and application scenarios, compares it with SQL and DataStream jobs, showcases enterprise‑grade features of Alibaba Cloud Flink CDC, and provides a step‑by‑step tutorial to build a complete CDC YAML job in just ten minutes.

CDC · Data Integration · Flink

0 likes · 20 min read

Tencent Advertising Technology

Dec 6, 2024 · Big Data

Building a High‑Performance Advertising Feature Data Lake with Apache Iceberg at Tencent

Tencent's advertising team replaced a traditional HDFS‑Hive warehouse with an Apache Iceberg‑based data lake, adding primary‑key tables, multi‑stream merging, adaptive compaction, and Spark SPJ optimizations to achieve minute‑level feature update latency, 10× back‑fill speed, and up to 60% storage savings.

Big Data · CDC · Compaction

0 likes · 25 min read

Su San Talks Tech

Jul 26, 2024 · Databases

Mastering MySQL‑to‑Elasticsearch Sync: 4 Strategies & Top Migration Tools

This guide compares four MySQL‑to‑Elasticsearch synchronization methods—synchronous dual‑write, asynchronous MQ‑based dual‑write, timer‑driven SQL extraction, and real‑time Binlog replication—and reviews popular CDC tools such as Canal, Alibaba Cloud DTS, Databus, and others to help you choose the right solution.

Binlog · CDC · Canal

0 likes · 13 min read

IT Services Circle

Jun 12, 2024 · Databases

MySQL to Elasticsearch Data Synchronization: Strategies and Tool Selection

This article reviews four common MySQL‑to‑Elasticsearch synchronization methods—synchronous dual‑write, asynchronous dual‑write via MQ, timer‑based SQL extraction, and real‑time Binlog replication—evaluates their pros and cons, and compares popular migration tools such as Canal, Alibaba DTS, Databus and others.

Binlog · CDC · Data Migration Tools

0 likes · 11 min read

Su San Talks Tech

Jun 10, 2024 · Databases

Mastering MySQL‑to‑Elasticsearch Sync: 4 Strategies & Top Migration Tools

This article compares four MySQL‑to‑Elasticsearch synchronization methods—synchronous dual‑write, asynchronous dual‑write, SQL extraction, and Binlog‑based real‑time sync—evaluates their pros and cons, and reviews popular migration tools such as Canal, Alibaba DTS, Databus, Flink, CloudCanal, Maxwell, and DRDS.

Binlog · CDC · Data synchronization

0 likes · 14 min read

DataFunTalk

May 16, 2024 · Big Data

Streaming Data Lake Warehouse Solution Based on USDP with Flink and Paimon

This article presents UCloud's USDP‑based streaming data lake warehouse solution that leverages Flink for real‑time processing and Paimon for lake storage, detailing its architecture, advantages, practical scenarios, and providing complete SQL and Flink CDC code snippets for end‑to‑end implementation.

CDC · Data Lake · Flink

0 likes · 27 min read

Spring Full-Stack Practical Cases

Apr 9, 2024 · Big Data

Build Real-Time MySQL CDC Pipelines with Flink 1.19 and SpringBoot

This guide walks through setting up Flink CDC with MySQL on SpringBoot 2.7, covering binlog configuration, Maven dependencies, Java implementation for real‑time change capture, startup options, a custom Redis sink, and a web UI for monitoring the streaming pipeline.

CDC · Flink · MySQL

0 likes · 10 min read

DataFunSummit

Mar 25, 2024 · Big Data

Exploring Real-Time Data Lake Practices at Kangaroo Cloud

This article shares Kangaroo Cloud's exploration and practice of a real-time data lake, covering background, data lake concepts, challenges, solution architecture using the Shuzhan platform with Iceberg/Hudi, CDC ingestion, small file handling, cross-cluster ingestion, materialized view acceleration, and future development plans.

CDC · Cross-Cluster Ingestion · Hudi

0 likes · 12 min read

Big Data Technology & Architecture

Mar 9, 2024 · Big Data

Apache Paimon 0.7.0: Enhanced Lookup Join, CDC Capabilities, and Spark/Hive Integration

Apache Paimon 0.7.0 introduces significant improvements such as optimized lookup join handling, new CDC functionalities, and tighter Spark/Hive integration, while also highlighting practical considerations for using lake‑table lookups in production environments.

Apache Paimon · Big Data · CDC

0 likes · 5 min read

DataFunSummit

Feb 20, 2024 · Big Data

BitSail Open‑Source Data Integration Engine: Architecture, New Features, CDC Solutions and Future Outlook

This article introduces ByteDance's open‑source data integration engine BitSail, covering its background, layered architecture, recent feature enhancements, automated testing framework, CDC‑based full‑library synchronization solutions, and future development plans for connectors and real‑time data consistency.

Big Data · CDC · Data Integration

0 likes · 12 min read

dbaplus Community

Dec 25, 2023 · Big Data

Why Spark and Flink Can't Stream MySQL via JDBC (And What Works Instead)

This article explains the limitations of using JDBC for true streaming reads in Spark and Flink, demonstrates failed attempts with MySQL, shows workarounds that revert to batch processing, and recommends Flink CDC as the practical solution for incremental MySQL ingestion.

Big Data · CDC · Flink

0 likes · 8 min read

ITPUB

Dec 24, 2023 · Backend Development

Why Kafka Is the Backbone of Modern Messaging, Streaming, and Data Pipelines

This article explains how Kafka serves as a high‑throughput, durable messaging system, a reliable storage layer, a log‑aggregation hub, a stream‑processing engine, and a core component for CDC, system migration, monitoring, and event‑sourcing architectures.

CDC · Event Sourcing · Kafka

0 likes · 9 min read

Su San Talks Tech

Dec 3, 2023 · Big Data

Sync MySQL to Elasticsearch with Canal: Step‑by‑Step CDC Guide

This tutorial walks you through the fundamentals of MySQL binlog replication, installing and configuring Canal, setting up Elasticsearch, Kibana, and the IK analyzer, and then demonstrates both full and incremental data synchronization from MySQL to Elasticsearch.

Big Data · CDC · Canal

0 likes · 11 min read

Big Data Technology Architecture

Nov 28, 2023 · Big Data

Real-time Data Ingestion from MySQL to Apache Doris Using Flink CDC and Doris Flink Connector

This article demonstrates, with step‑by‑step examples, how to capture MySQL changes via Flink CDC and stream them in real time into Apache Doris using the Doris Flink Connector, covering CDC concepts, connector features, environment setup, SQL client usage, and data verification.

Apache Doris · CDC · Connector

0 likes · 13 min read

Big Data Technology & Architecture

Nov 28, 2023 · Big Data

Apache Paimon for CDC: Low‑Cost, Low‑Latency Data Lake Ingestion and Performance Comparison with Hive and Hudi

This article explains how Apache Paimon simplifies CDC data lake ingestion with one‑click, low‑cost, low‑latency pipelines, details its architecture and tag‑based Hive compatibility, provides best‑practice configurations, and presents benchmark results showing Paimon outperforming Hive and Hudi in both write and query performance.

Apache Paimon · CDC · Data Lake

0 likes · 14 min read

Alibaba Cloud Native

Nov 23, 2023 · Cloud Native

How CDC + Serverless Functions Enable Real‑Time ETL in Cloud Native Architectures

This article explains how Alibaba Cloud's Serverless Function Compute combined with Database Change Data Capture (CDC) creates a complete, real‑time ETL pipeline, detailing the ETL model, DTS integration, architecture components, event‑driven processing, and practical use cases such as OLTP‑to‑OLAP data flow.

Alibaba Cloud · CDC · Data Integration

0 likes · 10 min read

Rare Earth Juejin Tech Community

Nov 9, 2023 · Databases

Integrating Debezium for Change Data Capture in Spring Boot Applications

This article explains how to use Debezium's change data capture (CDC) capabilities to monitor MySQL binlog events, compares Canal and Debezium, outlines typical CDC use cases, and provides a complete Spring Boot integration guide with configuration, code examples, and testing procedures.

CDC · Change Data Capture · Debezium

0 likes · 22 min read

Java High-Performance Architecture

Sep 28, 2023 · Databases

How to Use Debezium for MySQL CDC in Spring Boot Without Adding Extra Middleware

Learn how to capture MySQL data changes using Debezium's CDC capabilities within a Spring Boot application, avoiding heavyweight message brokers by leveraging binlog monitoring, configuring connectors, handling snapshots, and processing change events for use cases like cache invalidation, data integration, and simplifying monolithic architectures.

CDC · Data Integration · Debezium

0 likes · 24 min read

dbaplus Community

Sep 24, 2023 · Backend Development

How to Sync MySQL Binlog to Elasticsearch Using Canal and RocketMQ

Learn step‑by‑step how to configure Alibaba’s open‑source Canal to capture MySQL binlog changes, route them through RocketMQ, and index the data into Elasticsearch, covering cluster mode, MySQL and Elasticsearch setup, Canal properties, and consumer implementation details.

CDC · Canal · MySQL

0 likes · 9 min read

Java Backend Technology

Aug 19, 2023 · Big Data

Top ETL Tools Compared: Kettle, DataX, DataPipeline, Talend, DataStage, Sqoop, FineDataLink, Canal

This guide reviews the most popular ETL and data integration tools—including Kettle, DataX, DataPipeline, Talend, DataStage, Sqoop, FineDataLink, and Canal—detailing their core features, architectures, and typical use cases to help you choose the right solution for data migration and synchronization.

Big Data · CDC · Data Integration

0 likes · 13 min read

Java Interview Crash Guide

Aug 14, 2023 · Big Data

Unlocking Change Data Capture with Debezium in Spring Boot – No Extra Middleware Needed

This article explains how small web projects can avoid heavyweight message middleware by using CDC technology, specifically Debezium, to monitor MySQL binlog changes, outlines why Debezium outperforms alternatives like Canal, and provides step‑by‑step Spring Boot integration with configuration, code samples, and practical use‑case scenarios.

CDC · Change Data Capture · Debezium

0 likes · 22 min read

Code Ape Tech Column

Aug 10, 2023 · Backend Development

Integrating Debezium for Change Data Capture in Spring Boot Applications

This article explains how to use CDC technology, particularly Debezium, to capture MySQL binlog changes and process them in a Spring Boot application without adding heavyweight middleware, providing code examples, configuration details, and typical use cases.

CDC · Change Data Capture · Debezium

0 likes · 21 min read

Big Data Technology & Architecture

Jul 4, 2023 · Big Data

Building a Real‑Time Streaming Data Warehouse with Paimon on Kubernetes for Supply‑Chain Logistics

This article presents a step‑by‑step guide on how the logistics provider Haicheng Bangda implemented a streaming data warehouse using Paimon, Flink CDC, and Kubernetes, covering business background, architecture choices, environment setup, SQL examples, troubleshooting tips, and future roadmap for their digital transformation.

Big Data · CDC · Data Warehouse

0 likes · 27 min read

DataFunSummit

May 28, 2023 · Big Data

Apache Hudi: Capabilities, Architecture, Use Cases, and Future Outlook

This article introduces Apache Hudi as a next‑generation streaming data‑lake platform, explains its core concepts, architecture, and table types, and showcases real‑world use cases at Tencent such as CDC ingestion, minute‑level real‑time warehousing, streaming analytics, multi‑stream joins, ad attribution, and stream‑to‑batch processing, while also outlining future directions.

Apache Hudi · CDC · Data Lake

0 likes · 16 min read

Selected Java Interview Questions

May 10, 2023 · Backend Development

Implementing Data Change Capture in SpringBoot Using Canal and RabbitMQ

This guide demonstrates how to decouple data change logging from business logic in a SpringBoot application by leveraging MySQL binlog monitoring with Canal, forwarding change events through RabbitMQ, and persisting both new and old record states using Docker‑compose, configuration files, and Java client code.

CDC · Canal · DataSync

0 likes · 18 min read

WeiLi Technology Team

May 6, 2023 · Big Data

How We Upgraded Our Flink Cluster from 1.10 to 1.14.6 and Overcame Common Pitfalls

This article details the background of a Flink 1.10 cluster on Huawei Cloud, the technical challenges that prompted an upgrade, a step‑by‑step migration plan to Flink 1.14.6, troubleshooting of frequent errors, precautionary measures, and the performance and operational benefits achieved after the upgrade.

Big Data · CDC · Flink

0 likes · 19 min read

ITPUB

Apr 26, 2023 · Databases

Mastering Change Data Capture: Open‑Source Tools and How to Choose the Right One

This article explains the concept of Change Data Capture (CDC), outlines its common use cases, compares the main technical approaches—including timestamps, data diff, triggers, and log‑based methods—and reviews popular open‑source CDC solutions and their database‑specific configuration requirements.

CDC · Change Data Capture · Data Integration

0 likes · 15 min read

Big Data Technology & Architecture

Feb 28, 2023 · Big Data

Comprehensive Guide to Dual‑Stream Join in Flink CDC with Java DataStream API

This article provides a detailed tutorial on implementing various dual‑stream join techniques—including processing‑time, event‑time, and interval joins—using Flink CDC 2.2 and Flink 1.14 with the Java DataStream API, complete with code examples, SQL setup, and execution results.

Big Data · CDC · DataStream

0 likes · 31 min read

Selected Java Interview Questions

Feb 25, 2023 · Backend Development

Integrating SpringBoot with Canal and RabbitMQ for Database Change Capture

This guide explains how to decouple business logic in a SpringBoot application by using Canal to listen to MySQL binlog changes, forwarding those events through RabbitMQ, and processing them with a Java client to record both new and old data for insert, update, and delete operations.

CDC · Canal · Docker

0 likes · 22 min read

Big Data Technology Architecture

Feb 24, 2023 · Big Data

Implementing Change Data Capture (CDC) on Data Lake Formats with Apache Hudi

This article reviews lake‑format concepts, Apache Hudi architecture, CDC fundamentals, design considerations for CDC on lake formats, implementation details of Hudi CDC, and streaming optimizations including automated lake‑table management and a simplified StreamingSQL for Spark.

Apache Hudi · CDC · Delta Lake

0 likes · 19 min read

TAL Education Technology

Feb 16, 2023 · Big Data

Step‑by‑Step Guide to Syncing Canal Data to Elasticsearch

This article provides a comprehensive, hands‑on tutorial for configuring Alibaba Canal and its client‑adapter to capture MySQL binlog changes and synchronize them into Elasticsearch, covering environment setup, Docker commands, YAML configuration files, index mapping, adapter startup, and common troubleshooting scenarios.

CDC · Canal · Configuration

0 likes · 26 min read

Big Data Technology & Architecture

Jan 29, 2023 · Big Data

Understanding Retract Streams in Apache Flink: Aggregation and Sink Operators

This article explains the concept of retract streams in Apache Flink, detailing how non‑retract Kafka sources and Group‑By aggregations generate delete/insert messages, provides code examples for aggregation and sink operators, and compares retract mechanisms across aggregation and CDC sink scenarios.

Aggregation · CDC · Flink

0 likes · 15 min read

Aikesheng Open Source Community

Jan 18, 2023 · Databases

Real-Time Data Warehouse Evaluation: ClickHouse vs StarRocks and Synchronization Strategies

This article shares practical experience comparing ClickHouse and StarRocks as real‑time data warehouses, outlines the project requirements, evaluates each system's suitability for log‑type and business‑type data, and describes CDC‑based synchronization methods from MySQL to both platforms.

CDC · ClickHouse · MySQL

0 likes · 8 min read

Big Data Technology & Architecture

Dec 28, 2022 · Big Data

Flink 1.16 Highlights: Adaptive Batch Scheduling, Speculative Execution, Hybrid Shuffle, Dynamic Partition Pruning, Hive SQL Migration, Checkpoint Enhancements, CDC Integration, and Table Store

Flink 1.16 introduces adaptive batch scheduling, speculative execution, hybrid shuffle, dynamic partition pruning, improved Hive SQL compatibility, advanced checkpoint mechanisms including changelog backend, and integrates CDC with Kafka and Table Store, offering faster, more stable, and easier-to-use stream‑batch processing capabilities.

Big Data · CDC · Checkpoint

0 likes · 8 min read

ITPUB

Dec 18, 2022 · Big Data

How to Build a Real‑Time Data Warehouse with EasyData: A Step‑by‑Step Guide

Learn how to design and implement a real‑time data warehouse for an app’s AB‑test monitoring using EasyData, covering data flow layers, CDC task creation, stream table registration, Flink SQL processing, and BI reporting, with detailed steps, code snippets, and practical tips.

CDC · EasyData · Flink

0 likes · 13 min read

DataFunSummit

Dec 2, 2022 · Big Data

BitSail: ByteDance’s Open‑Source Unified Data Integration Engine – Architecture, Evolution, and Capabilities

BitSail, ByteDance’s open‑source data integration engine, unifies batch, streaming, and incremental data synchronization across heterogeneous sources; the article traces its evolution from early Flink‑based prototypes to a mature, plugin‑driven architecture with multi‑engine support, low‑cost co‑development, and robust CDC lakehouse capabilities.

Big Data · CDC · Flink

0 likes · 19 min read

Past Memory Big Data

Nov 26, 2022 · Big Data

Is Apache Flink Truly Powerful Enough After Hundreds of Engineers and Multiple Double‑11 Deployments?

The interview with Alibaba researcher Wang Feng reviews Flink's eight‑year journey to a top Apache project, its massive scale at Double 11, the push toward unified stream‑batch computing, emerging storage challenges, and the roadmap for cloud‑native, real‑time data warehousing.

Apache Flink · Batch Processing · CDC

0 likes · 16 min read

DataFunTalk

Nov 6, 2022 · Big Data

BitSail: ByteDance’s Open‑Source Unified Data Integration Engine – Architecture, Evolution, and Capabilities

BitSail, an open‑source data integration engine from ByteDance, provides a unified solution for batch, streaming, full‑load, and incremental data synchronization across heterogeneous sources; the article covers its background, technical evolution, architecture, low‑cost co‑building features, compatibility strategies, and future roadmap.

CDC · Data Integration · Flink

0 likes · 18 min read

IT Services Circle

Oct 26, 2022 · Databases

Debezium: Open‑Source Change Data Capture Platform – Overview, Architecture, Use Cases, and Installation Guide

This article introduces Debezium, an open‑source low‑latency change data capture platform that streams database row changes via Kafka, explains its architecture and common scenarios such as cache invalidation and CQRS, and provides step‑by‑step Docker commands to install ZooKeeper, Kafka, MySQL and the Debezium connector.

CDC · Data Integration · Debezium

0 likes · 15 min read

DataFunSummit

Oct 21, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform architecture and three real‑time lake initiatives—log ingestion, CDC ingestion, and lake analysis—showcasing how Apache Iceberg, Flink, and custom shuffling algorithms solve small‑file and cross‑cloud challenges while enabling schema evolution and future multi‑cloud optimizations.

Apache Iceberg · Big Data · CDC

0 likes · 16 min read

Big Data Technology & Architecture

Oct 8, 2022 · Big Data

Flink CDC Tutorial: Sync MySQL Data to Hudi Data Lake Using SQL

This article provides a comprehensive guide on using Flink CDC with Debezium to capture MySQL changes, covering serialization, adding dependencies, configuring SQL client and Java/Scala APIs, creating source and sink tables, enabling checkpoints, and streaming data into a Hudi data lake.

CDC · DataLake · Flink

0 likes · 10 min read

Alibaba Cloud Native

Sep 29, 2022 · Cloud Native

Why Use RocketMQ Connect for Scalable Data Pipelines?

This article explains the challenges of point‑to‑point data sync, introduces RocketMQ Connect as a cloud‑native solution that decouples upstream and downstream, details its architecture, connectors, REST API, metrics, deployment modes, and provides a step‑by‑step guide to building custom connectors for use cases such as CDC, data lakes, and system migration.

CDC · Cloud Native · Connector

0 likes · 19 min read

DataFunTalk

Aug 6, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform engineering, describing how Apache Iceberg is leveraged for real‑time data lake ingestion, CDC pipelines, multi‑cloud storage, small‑file mitigation, schema evolution, and future plans across storage, compute, and management within a big‑data ecosystem.

Apache Iceberg · CDC · Flink

0 likes · 16 min read

Efficient Ops

Jul 19, 2022 · Databases

How CDC Powers Real-Time Analytics Without Overloading Your Database

This article introduces the practice of Change Data Capture (CDC), explaining how capturing only data changes can feed downstream systems and data warehouses in near real‑time, reducing load on the source database, improving reporting latency, and supporting scalable, reliable analytics pipelines.

CDC · Change Data Capture · Data Replication

0 likes · 9 min read

Alibaba Cloud Native

Jul 17, 2022 · Cloud Native

Build Real-Time CDC Pipelines on Alibaba Cloud EventBridge with DTS

This article explains Change Data Capture (CDC) concepts, compares open‑source CDC tools, and shows how to leverage Alibaba Cloud EventBridge and DTS to build real‑time CDC pipelines, covering setup steps, event‑bus vs event‑stream choices, best‑practice scenarios such as CQRS, microservice decoupling, database backup, and SQL auditing.

CDC · Cloud Native · DTS

0 likes · 12 min read

Efficient Ops

Jul 6, 2022 · Databases

How DataBus Enables Real-Time, Scalable Database Synchronization for Oracle Migration

DataBus is a real‑time data synchronization framework designed to support Oracle de‑commissioning, micro‑service migration, and heterogeneous storage engines by providing high‑availability CDC, flexible data pipelines, and seamless full‑to‑incremental migration across multiple source and target databases.

CDC · Data synchronization · High Availability

0 likes · 19 min read

Alibaba Cloud Developer

Jun 17, 2022 · Big Data

Unlocking Delta Lake: Key Features, Architecture, and EMR Integration

This article introduces Delta Lake as an open‑source lakehouse storage framework, explains its core features, file and metadata structures, details Alibaba Cloud EMR's enhancements and deep integration with DLF, and presents G‑SCD and CDC solutions for real‑time incremental data warehousing.

CDC · DLF · Delta Lake

0 likes · 11 min read

Bilibili Tech

Jun 10, 2022 · Big Data

Incremental Data Lake Design and Hudi Core Optimizations with Flink

The article describes how combining Apache Flink with Hudi enables an incremental data lake that delivers near‑real‑time analytics by switching to merge‑on‑read, fixing log handling bugs, improving compaction planning, and refactoring table‑service scheduling, while showcasing use cases such as CDC ingestion, data quality control, and real‑time materialized views, and outlines future enhancements like optimistic concurrency and unified schema evolution.

Apache Hudi · CDC · Compaction Optimization

0 likes · 21 min read

IT Architects Alliance

Jun 7, 2022 · Databases

Introduction to Change Data Capture (CDC) Practices

This article introduces the concept and practice of Change Data Capture (CDC), explaining how it captures database changes to provide real‑time incremental data for analytics and reporting without impacting source performance, and outlines modern CDC methods, challenges, and production‑ready system requirements.

CDC · Change Data Capture · Data Integration

0 likes · 8 min read

Top Architect

Jun 7, 2022 · Databases

An Introduction to Change Data Capture (CDC) Practices and Modern Approaches

This article introduces the concept of Change Data Capture (CDC), explains why traditional batch reporting strains resources, describes how CDC captures only data changes to keep source databases performant, and outlines modern CDC architectures, production‑ready considerations, and best‑practice guidelines for building reliable data pipelines.

CDC · Change Data Capture · Data Integration

0 likes · 16 min read

DataFunTalk

May 24, 2022 · Big Data

Integrating Apache Flink with Apache Hudi: From Data Warehouse to Data Lake

This article explains how Apache Flink integrates with Apache Hudi to enable real‑time data lake ingestion, covering the evolution from traditional data warehouses to data lakes, Hudi’s core concepts such as timeline and file grouping, copy‑on‑write vs merge‑on‑read modes, and Flink’s CDC‑based ETL pipeline.

Big Data · CDC · Data Lake

0 likes · 18 min read

Big Data Technology Architecture

May 22, 2022 · Big Data

Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions

This article introduces Delta Lake as an open‑source storage layer for lake‑house architectures, explains its key features, file and metadata structures, and details how Alibaba Cloud EMR and Data Lake Formation integrate and extend Delta Lake with advanced capabilities such as G‑SCD, CDC, performance optimizations, and future roadmap.

CDC · DLF · Delta Lake

0 likes · 10 min read

Alibaba Cloud Developer

May 13, 2022 · Big Data

Unlocking Delta Lake: Key Features, Architecture, and EMR Integration

Delta Lake, an open‑source storage layer from Databricks, provides ACID transactions, data versioning, schema evolution, and unified batch‑stream processing, with a detailed file structure and metadata mechanism, while Alibaba Cloud EMR enhances it with advanced DML, performance optimizations, deep DLF integration, and solutions for G‑SCD and CDC.

CDC · DLF · Data Lakehouse

0 likes · 11 min read

IT Architects Alliance

May 11, 2022 · Databases

How Change Data Capture Enables Real‑Time Analytics Without Overloading Your Database

The article explains the fundamentals of Change Data Capture (CDC), describing how capturing DML changes from relational databases like MySQL or PostgreSQL can provide incremental, near‑real‑time data for analytics and reporting while preserving source performance, and outlines modern CDC architectures, transaction‑log based extraction, and production‑ready design considerations.

CDC · Change Data Capture · Incremental Loading

0 likes · 9 min read

Top Architect

May 11, 2022 · Databases

An Introduction to Change Data Capture (CDC) Practices

This article introduces the concept and practice of Change Data Capture (CDC), explaining why CDC is needed for real‑time analytics, how it works by capturing DML changes, modern approaches using transaction logs, and key considerations for building a production‑ready CDC system.

CDC · Change Data Capture · Data Integration

0 likes · 8 min read

Aikesheng Open Source Community

May 9, 2022 · Databases

TiDB Cluster Splitting: Full Backup, Binlog Incremental Sync, and Migration Strategy

This article details a comprehensive TiDB cluster splitting project, covering background, challenges, backup and restore tools, multi‑stage migration steps, binlog incremental synchronization, CDC integration, and practical tips to ensure data consistency and minimal service impact.

BR Tool · CDC · Cluster Splitting

0 likes · 13 min read

Shopee Tech Team

Mar 17, 2022 · Backend Development

Real-time Checking System for Data Consistency in Microservices

Shopee’s Real‑time Checking System provides configurable, non‑intrusive data consistency verification for micro‑services by capturing change events via CDC, streaming them through Kafka, applying flexible rules and expressions, and instantly alerting mismatches, delivering second‑level detection while scaling to tens of thousands of checks per second.

CDC · Data Consistency · Redis

0 likes · 20 min read

Big Data Technology & Architecture

Mar 15, 2022 · Big Data

Using Flink CDC to Capture MySQL Changes and Sync Them to ClickHouse

This article introduces Change Data Capture (CDC), compares query‑based and log‑based CDC, explains Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including full Java code—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big Data · CDC · ClickHouse

0 likes · 17 min read

Volcano Engine Developer Services

Feb 16, 2022 · Big Data

ByteDance’s Journey to a Unified Data Lake with Flink and Hudi

This article recounts ByteDance’s evolution from batch‑only Flink pipelines to a unified data‑lake integration platform, detailing the three integration modes, challenges with Spark‑based CDC, the decision to adopt Hudi over Iceberg, and how Hudi’s indexing and Merge‑On‑Read formats enable near‑real‑time analytics at massive scale.

CDC · Flink · Hudi

0 likes · 10 min read

Big Data Technology & Architecture

Feb 16, 2022 · Big Data

Using Flink CDC to Capture MySQL Changes and Sync Them to ClickHouse

This article introduces Change Data Capture (CDC), compares query‑based and log‑based approaches, explains Debezium and ClickHouse, and provides detailed Flink CDC and Flink SQL CDC examples—including Java source code, custom deserialization schema, ClickHouse sink implementation, and required Maven dependencies—to synchronize MySQL data into ClickHouse in real time.

Big Data · CDC · ClickHouse

0 likes · 17 min read

DataFunTalk

Jan 13, 2022 · Big Data

Advanced Features of the Pravega Flink Connector Table API: Schema Registry, Catalog Integration, and Debezium Support

This article summarizes the Pravega Schema Registry project, its integration with Flink's Catalog API, the addition of Debezium CDC support, and the related implementation challenges, providing detailed DDL examples, code snippets, and architectural diagrams for building real‑time data pipelines.

CDC · Catalog API · Debezium

0 likes · 15 min read

Big Data Technology & Architecture

Dec 22, 2021 · Big Data

Using Flink CDC to Capture MySQL Changes and Sink Them into ClickHouse

This article explains Change Data Capture (CDC), compares query‑based and log‑based approaches, introduces Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including Java source, deserialization, sink code and required Maven dependencies—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big Data · CDC · ClickHouse

0 likes · 14 min read

DataFunSummit

Dec 4, 2021 · Big Data

Building a Real-Time Data Warehouse with Flink: Hive Integration, Upsert‑Kafka, and CDC Connectors

This tutorial explains how to use Apache Flink 1.12 to construct a unified streaming‑batch data warehouse by integrating Hive via HiveCatalog and HiveDialect, performing read/write operations, configuring upsert‑Kafka sinks, and leveraging Flink CDC connectors for change data capture from MySQL and other sources.

CDC · Flink · Hive

0 likes · 46 min read

Big Data Technology Architecture

Nov 30, 2021 · Big Data

Building a Real-Time MySQL and PostgreSQL Streaming ETL with Flink CDC

This tutorial shows how to quickly construct a streaming ETL pipeline that captures changes from MySQL and PostgreSQL using Flink CDC, enriches order data with product and shipment information, and writes the results into Elasticsearch for real‑time visualization in Kibana.

CDC · Docker · Elasticsearch

0 likes · 11 min read

Big Data Technology Architecture

Nov 23, 2021 · Big Data

Step-by-Step Guide to Setting Up Flink CDC with MySQL, Hudi, and Hive Integration on a Hadoop Cluster

This comprehensive tutorial walks through configuring a Hadoop‑based environment (Flink 1.13.1, Scala 2.11, CDH 6.2.0, Hive 2.1.1, Hudi 0.10), compiling Hudi, setting up Flink and MySQL binlog, creating CDC source and Hudi sink tables, running Flink jobs, and synchronizing the results to Hive partitions for query via Hive and Presto.

CDC · Flink · Hive

0 likes · 15 min read

HomeTech

Nov 17, 2021 · Big Data

Lakehouse Architecture Practice with Flink and Iceberg: Real‑time Data Ingestion and Management

This article details a lakehouse architecture built on Flink and Iceberg that addresses Hive‑based warehouse limitations by enabling ACID transactions, incremental snapshots, stream‑batch unification, CDC support, and various operational optimizations, ultimately achieving near real‑time data ingestion and analytics.

CDC · Flink · Iceberg

0 likes · 10 min read

Big Data Technology & Architecture

Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed full‑load and incremental reading mechanisms, details the slice partitioning, snapshot correction, and binlog handling logic, and provides a complete Java example that demonstrates how to configure Flink SQL, MySQL source, and Kafka sink.

Big Data · CDC · Data Integration

0 likes · 29 min read

HomeTech

Nov 3, 2021 · Big Data

Real‑time Materialized View Practices with Apache Flink: System Analysis, Algorithm Design, and Implementation

This article presents Autohome's experience building a real‑time materialized view system on Apache Flink, detailing system analysis, problem decomposition, a global‑version‑based CDC algorithm, its implementation as a Flink connector, practical deployment results, and remaining challenges such as clock dependency and state size.

CDC · Flink · algorithm

0 likes · 17 min read

Big Data Technology & Architecture

Oct 21, 2021 · Big Data

Comparative Overview of Open‑Source CDC Solutions: Debezium, Flink CDC, and Canal

This article provides a detailed comparison of three popular open‑source change data capture tools—Debezium, Flink CDC, and Canal—covering their underlying principles, architecture, deployment options, performance characteristics, and suitability for real‑time data synchronization in big‑data environments.

CDC · Canal · Change Data Capture

0 likes · 21 min read

TAL Education Technology

Oct 14, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Chunk Splitting, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its core design—including chunk splitting for full‑load and incremental reads—provides a Flink SQL example, and walks through the MySQL CDC source implementation with detailed code snippets and processing logic.

CDC · ChunkSplitting · Debezium

0 likes · 32 min read

Alibaba Cloud Developer

Sep 9, 2021 · Big Data

How Apache Hudi & Pulsar Enable Real‑Time CDC Data Lake Ingestion

This article explains CDC fundamentals, compares query‑based and log‑based capture, describes typical CDC‑to‑lake architectures using Pulsar and Hudi, dives into Hudi's core design, optimization techniques, and future roadmap, and provides practical insights for building scalable data lakes.

Apache Hudi · CDC · Pulsar

0 likes · 17 min read

Alibaba Cloud Developer

Sep 2, 2021 · Databases

From LAMP to Cloud‑Native: Evolving Application Data Architecture and Best Practices

This article traces two decades of application data architecture evolution, comparing traditional single‑system LAMP designs with modern multi‑component cloud‑native stacks, and offers practical guidance on scaling, component selection, CDC‑based data derivation, and cloud‑native implementations such as Tablestore.

CDC · Data Architecture · Databases

0 likes · 22 min read

Big Data Technology Architecture

Aug 31, 2021 · Big Data

Real-time CDC Data Read/Write Solutions in Data Lake Architecture with Flink and Iceberg

This article, compiled by community volunteers, examines several approaches to reading and writing CDC data in real time within data lake architectures, comparing offline HBase, Apache Kudu, Hive, and Spark + Delta, and ultimately advocating Flink + Iceberg for efficient, correct, and scalable streaming ingestion and analytics.

CDC · Flink · Iceberg

0 likes · 18 min read

Big Data Technology & Architecture

Jul 20, 2021 · Big Data

Common Issues and Solutions for Flink CDC with MySQL

This article summarizes frequent problems encountered when using Flink CDC with MySQL—including Kafka version conflicts, checkpoint timeouts, permission errors, global lock issues, and DDL parsing failures—and provides practical configuration tweaks and code examples to resolve them.

CDC · Checkpoint · Debezium

0 likes · 11 min read

DataFunTalk

Jul 10, 2021 · Big Data

Building a Lakehouse Architecture with Apache Iceberg and Flink: Practices and Insights

This article explains how to construct a lake‑house architecture using Apache Iceberg, detailing the migration from Hive, Flink‑SQL integration, proxy user support, CDC handling, copy‑on‑write sinks, and the resulting benefits for near‑real‑time data visibility and unified batch‑stream processing.

Apache Iceberg · CDC · Flink

0 likes · 10 min read

Big Data Technology & Architecture

Jul 8, 2021 · Big Data

Using Flink CDC to Write Data into Apache Hudi and Query with Hive and Spark SQL

This guide walks through preparing the environment, creating a MySQL source table, configuring Flink CDC to ingest data into an Apache Hudi table, and then querying the Hudi data using both Hive and Spark‑SQL, including handling of partitions, realtime input formats, and required configuration settings.

CDC · DataPipeline · Flink

0 likes · 10 min read

Qunhe Technology Quality Tech

Jul 2, 2021 · Databases

How to Seamlessly Migrate Billions of Rows: A Practical Guide to Database Sharding and Sync

This article outlines a comprehensive, step‑by‑step approach for migrating massive tables—up to billions of rows—through sharding, dual‑write, data synchronization, validation, testing, and controlled switch‑over to ensure minimal impact on production services.

CDC · Data synchronization · Dual Write

0 likes · 16 min read

Programmer DD

Jun 14, 2021 · Databases

Master Real‑Time Change Data Capture with Debezium and Spring Boot

Learn how to capture and stream real‑time database changes using Debezium’s distributed CDC framework, configure MySQL binlog, integrate the embedded engine with Spring Boot, and process change events with sample code and Docker setup for robust data pipelines.

CDC · Change Data Capture · Debezium

0 likes · 11 min read

Full-Stack Internet Architecture

May 19, 2021 · Backend Development

Understanding Message Queues: Benefits, Design Challenges, and Transactional Solutions

This article explores the role of message queues in microservice architectures, discussing their advantages such as decoupling, asynchronous processing, and load shedding, while also addressing design challenges like concurrency, ordering, duplicate handling, and transactional messaging with solutions including Kafka partitions, outbox patterns, CDC, and RocketMQ.

CDC · Kafka · Message Queue

0 likes · 12 min read

Top Architect

May 4, 2021 · Big Data

Overview of CDC Tools: Canal, Maxwell, Databus, and Alibaba DTS

This article introduces four change‑data‑capture solutions—Canal, Maxwell, Databus, and Alibaba Data Transmission Service (DTS)—explaining their principles, processing steps, features, and practical advantages for real‑time data synchronization and migration in big‑data environments.

Alibaba DTS · Big Data · CDC

0 likes · 6 min read

DataFunTalk

Apr 27, 2021 · Big Data

Implementing CDC‑to‑Hudi for Real‑Time Mutable Data in a Big Data System

This article describes how Linkflow migrated mutable customer data from MySQL to an Apache Hudi data lake using Debezium‑in‑Flink CDC, addressing challenges such as snapshot resumability, partial updates, row‑key merging, schema evolution, indexing, and concurrent writes to achieve minute‑level data freshness and improved offline processing performance.

Apache Hudi · Big Data · CDC

0 likes · 21 min read

dbaplus Community

Apr 18, 2021 · Databases

Mastering Oracle‑to‑MySQL Migration: Key Differences, Risks, and Proven Strategies

This comprehensive guide explains how to migrate Oracle databases to MySQL by detailing type differences, migration steps, performance tuning, and validation techniques, while highlighting common pitfalls such as data‑type mismatches, character‑set issues, LOB handling, and transaction isolation nuances.

CDC · Data Types · MySQL

0 likes · 30 min read
