Tagged articles
122 articles
Page 1 of 2
ITPUB
Feb 13, 2026 · Big Data

Real‑Time Sync of New MySQL Tables to Doris Using Flink CDC

This article explains how to extend a Flink CDC job that already syncs an entire MySQL database to Doris so that newly created tables are automatically created in Doris in real time, using the CdcTools utility, side‑output streams, and asynchronous I/O.

CDC · CdcTools · Flink
9 min read

ITPUB
Jan 22, 2026 · Backend Development

Sync New MySQL Tables to Doris in Real‑Time with Flink CDC and CdcTools

This article explains how to use Flink CDC together with the CdcTools utility to automatically capture newly created MySQL tables and synchronize both their schema and data to a Doris database in real time, covering the required code, side‑output handling, async execution, and a special delete‑sign field.

Async IO · CDC · Flink
10 min read
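The delete‑sign field mentioned in the summary presumably refers to Doris's hidden `__DORIS_DELETE_SIGN__` column on Unique Key tables, where 1 marks a row as deleted and 0 keeps it as an upsert. The sketch below is illustrative only. It assumes Debezium‑style change events (`op` codes `c`, `r`, `u`, `d` plus `before`/`after` row images), and `to_doris_row` is a hypothetical helper, not part of CdcTools or Flink CDC.

```python
from typing import Any, Dict, Optional

# Doris hidden column on Unique Key tables: 1 = delete the row, 0 = upsert.
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def to_doris_row(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Map one Debezium-style change event to a Doris row (hypothetical helper)."""
    op = event.get("op")
    if op in ("c", "r", "u"):
        # Create, snapshot read and update all become upserts of the new row image.
        row = dict(event["after"])
        row[DELETE_SIGN] = 0
    elif op == "d":
        # Deletes carry no "after" image, so the old image supplies the key columns.
        row = dict(event["before"])
        row[DELETE_SIGN] = 1
    else:
        # Other op codes (e.g. "t" for truncate) are not row changes; skip them.
        return None
    return row


events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "a"}},
    {"op": "u", "before": {"id": 1, "name": "a"}, "after": {"id": 1, "name": "b"}},
    {"op": "d", "before": {"id": 1, "name": "b"}, "after": None},
]
rows = [to_doris_row(e) for e in events]
```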
ITPUB
Jan 18, 2026 · Databases

From Full Sync to Real‑Time CDC: Building Scalable Order Data Pipelines

A junior e‑commerce developer tackles the challenge of regularly syncing order data to a data warehouse, evolving from naïve full‑table copies to incremental sync, batch processing, cursor‑based pagination, performance tuning, and finally a real‑time CDC‑plus‑message‑queue architecture, while addressing reliability, ordering, and scaling issues.

Batch · CDC · Cursor
13 min read
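Cursor‑based (keyset) pagination replaces `LIMIT n OFFSET m` scans, whose cost grows with the offset, with a seek on an indexed cursor. A minimal sketch, assuming a hypothetical `orders` table with an integer primary key `id` and an integer `updated_at` column; it uses the composite cursor `(updated_at, id)` so rows sharing a timestamp are neither skipped nor read twice. Python's built‑in `sqlite3` stands in for MySQL, and `fetch_changes` is an illustrative name.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str, int]  # (id, status, updated_at)


def fetch_changes(
    conn: sqlite3.Connection, cursor: Tuple[int, int], batch_size: int
) -> Iterator[List[Row]]:
    """Yield batches of rows strictly after `cursor` = (updated_at, id), in cursor order."""
    last_ts, last_id = cursor
    while True:
        batch = conn.execute(
            "SELECT id, status, updated_at FROM orders "
            "WHERE updated_at > ? OR (updated_at = ? AND id > ?) "
            "ORDER BY updated_at, id LIMIT ?",
            (last_ts, last_ts, last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        yield batch
        # Move the cursor to the last row just read; no OFFSET is ever needed.
        last_id, _, last_ts = batch[-1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)")
conn.execute("CREATE INDEX idx_orders_cursor ON orders (updated_at, id)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "new", 100), (2, "paid", 100), (3, "new", 100), (4, "shipped", 200), (5, "new", 50)],
)
# Resume from a saved cursor: everything after (updated_at=100, id=1).
batches = list(fetch_changes(conn, cursor=(100, 1), batch_size=2))
```

Like any query‑based approach, this cannot observe hard deletes and only sees the latest state of each row, which is one of the limitations that log‑based CDC removes.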
Ctrip Technology
Nov 20, 2025 · Big Data

How Ctrip Achieved Minute‑Level Real‑Time Analytics with Flink CDC & Apache Paimon

Ctrip transformed its traditional T+1 offline warehouse into a near‑real‑time lakehouse by integrating Flink CDC with Apache Paimon, designing a two‑stage CDC ingestion, optimizing performance, implementing dynamic updates, and deploying the solution across multiple business scenarios, achieving minute‑level latency, reduced costs, and faster data‑driven decisions.

CDC · Flink · Paimon
27 min read

dbaplus Community
Nov 12, 2025 · Databases

Mastering Data Sync: From Full Loads to Real‑Time CDC in E‑Commerce

This guide walks a new e‑commerce developer through the evolution of order data synchronization—from naïve full‑table loads, through incremental and batch strategies, cursor‑based pagination, performance tuning, and finally to real‑time CDC with message queues—highlighting pitfalls and practical solutions.

Batch Processing · CDC · Message Queue
12 min read

StarRocks
Jul 1, 2025 · Big Data

How StarRocks Boosted Suixingfu’s Real‑Time Data Platform: 3× Faster Queries & 10× Faster Analytics

Suixingfu rebuilt its payment data pipeline by replacing a fragmented Lambda stack with a unified Porter CDC + StarRocks + Elasticsearch architecture, tripling query speed, improving analytics efficiency tenfold, cutting storage by 20%, and reaching sub‑second data‑capture latency across high‑concurrency, ad‑hoc, and batch workloads.

CDC · Flink · Real-time analytics
14 min read

Big Data Technology Tribe
Jun 22, 2025 · Cloud Native

How to Ensure Consistent State in Event‑Driven Microservices: 3 Proven Patterns

This article explains the challenges of maintaining data consistency in distributed, event‑driven microservice architectures and introduces three practical patterns—Outbox, Original Event Handling, and Self‑Read—to guarantee reliable state synchronization across services, even when failures occur.

CDC · Distributed Transactions · Event-Driven Architecture
6 min read
DataFunSummit
Apr 1, 2025 · Big Data

Understanding Flink CDC 3.3: Features, Improvements, and Future Plans

This article provides a comprehensive overview of Flink CDC 3.3, detailing its CDC fundamentals, new connectors, Transform module enhancements, asynchronous snapshot splitting, community adoption, and upcoming roadmap for broader ecosystem support and batch‑mode execution.

Big Data · CDC · Change Data Capture
15 min read

Big Data Technology Architecture
Mar 1, 2025 · Big Data

Core Principles and Practical Guide to Flink CDC

This article explains CDC fundamentals, details Flink CDC's architecture and advantages, provides setup steps, code examples for SQL and DataStream APIs, discusses performance tuning, consistency, common issues, and typical real‑time data integration scenarios.

CDC · Change Data Capture · Debezium
7 min read

DataFunSummit
Feb 24, 2025 · Big Data

Building Real-Time Data Synchronization Pipelines with Apache SeaTunnel

Apache SeaTunnel is an open‑source, distributed data integration platform for efficient real‑time synchronization across diverse sources and destinations, supporting both streaming and batch processing. The article covers its architecture, connector plugins, CDC handling, transform capabilities, and deployment strategies for large‑scale data pipelines.

Apache SeaTunnel · CDC · Real-Time Data Integration
34 min read

macrozheng
Feb 24, 2025 · Databases

Mastering MySQL to Elasticsearch Sync: 4 Strategies & Top Migration Tools

This article explores four practical methods for synchronizing MySQL data to Elasticsearch—including synchronous and asynchronous double writes, SQL extraction, and binlog real‑time replication—while reviewing popular migration tools such as Canal, Alibaba DTS, and Databus to help you choose the right solution.

CDC · Canal · DTS
13 min read

Alibaba Cloud Big Data AI Platform
Jan 27, 2025 · Big Data

Unlock Real-Time Data Sync with Flink CDC: YAML Integration, Transform & Route Explained

This article summarizes an advanced Flink CDC presentation, covering Flink CDC fundamentals, real‑time Flink integration, CDC‑YAML core capabilities, supported sync links, Transform and Route modules, monitoring metrics, schema‑change strategies, typical use cases, performance optimizations, demo implementations, and future development plans.

CDC · Data Integration · Flink
20 min read

Tencent Advertising Technology
Dec 6, 2024 · Big Data

Building a High‑Performance Advertising Feature Data Lake with Apache Iceberg at Tencent

Tencent's advertising team replaced a traditional HDFS‑Hive warehouse with an Apache Iceberg‑based data lake, adding primary‑key tables, multi‑stream merging, adaptive compaction, and Spark SPJ optimizations to achieve minute‑level feature update latency, 10× faster back‑fills, and up to 60% storage savings.

Big Data · CDC · Data Lake
25 min read

Su San Talks Tech
Jul 26, 2024 · Databases

Mastering MySQL‑to‑Elasticsearch Sync: 4 Strategies & Top Migration Tools

This guide compares four MySQL‑to‑Elasticsearch synchronization methods—synchronous dual‑write, asynchronous MQ‑based dual‑write, timer‑driven SQL extraction, and real‑time Binlog replication—and reviews popular CDC tools such as Canal, Alibaba Cloud DTS, Databus, and others to help you choose the right solution.

Binlog · CDC · Canal
13 min read

IT Services Circle
Jun 12, 2024 · Databases

MySQL to Elasticsearch Data Synchronization: Strategies and Tool Selection

This article reviews four common MySQL‑to‑Elasticsearch synchronization methods—synchronous dual‑write, asynchronous dual‑write via MQ, timer‑based SQL extraction, and real‑time Binlog replication—evaluates their pros and cons, and compares popular migration tools such as Canal, Alibaba DTS, Databus and others.

Binlog · CDC · Data Migration Tools
11 min read

Su San Talks Tech
Jun 10, 2024 · Databases

Mastering MySQL‑to‑Elasticsearch Sync: 4 Strategies & Top Migration Tools

This article compares four MySQL‑to‑Elasticsearch synchronization methods—synchronous dual‑write, asynchronous dual‑write, SQL extraction, and Binlog‑based real‑time sync—evaluates their pros and cons, and reviews popular migration tools such as Canal, Alibaba DTS, Databus, Flink, CloudCanal, Maxwell, and DRDS.

Binlog · CDC · Elasticsearch
14 min read

DataFunTalk
May 16, 2024 · Big Data

Streaming Data Lake Warehouse Solution Based on USDP with Flink and Paimon

This article presents UCloud's USDP‑based streaming data lake warehouse solution that leverages Flink for real‑time processing and Paimon for lake storage, detailing its architecture, advantages, practical scenarios, and providing complete SQL and Flink CDC code snippets for end‑to‑end implementation.

CDC · Data Lake · Flink
27 min read

DataFunSummit
Mar 25, 2024 · Big Data

Exploring Real-Time Data Lake Practices at Kangaroo Cloud

This article shares Kangaroo Cloud's exploration and practice of a real‑time data lake, covering background, data lake concepts, challenges, a solution architecture built on the Shuzhan platform with Iceberg/Hudi, CDC ingestion, small‑file handling, cross‑cluster ingestion, materialized‑view acceleration, and future development plans.

CDC · Cross-Cluster Ingestion · Hudi
12 min read

DataFunSummit
Feb 20, 2024 · Big Data

BitSail Open‑Source Data Integration Engine: Architecture, New Features, CDC Solutions and Future Outlook

This article introduces ByteDance's open‑source data integration engine BitSail, covering its background, layered architecture, recent feature enhancements, automated testing framework, CDC‑based full‑library synchronization solutions, and future development plans for connectors and real‑time data consistency.

Big Data · CDC · Data Integration
12 min read

ITPUB
Dec 24, 2023 · Backend Development

Why Kafka Is the Backbone of Modern Messaging, Streaming, and Data Pipelines

This article explains how Kafka serves as a high‑throughput, durable messaging system, a reliable storage layer, a log‑aggregation hub, a stream‑processing engine, and a core component for CDC, system migration, monitoring, and event‑sourcing architectures.

CDC · Event Sourcing · Kafka
9 min read

Big Data Technology & Architecture
Nov 28, 2023 · Big Data

Apache Paimon for CDC: Low‑Cost, Low‑Latency Data Lake Ingestion and Performance Comparison with Hive and Hudi

This article explains how Apache Paimon simplifies CDC data lake ingestion with one‑click, low‑cost, low‑latency pipelines, details its architecture and tag‑based Hive compatibility, provides best‑practice configurations, and presents benchmark results showing Paimon outperforming Hive and Hudi in both write and query performance.

Apache Paimon · CDC · Data Lake
14 min read

Alibaba Cloud Native
Nov 23, 2023 · Cloud Native

How CDC + Serverless Functions Enable Real‑Time ETL in Cloud Native Architectures

This article explains how Alibaba Cloud's Serverless Function Compute combined with Database Change Data Capture (CDC) creates a complete, real‑time ETL pipeline, detailing the ETL model, DTS integration, architecture components, event‑driven processing, and practical use cases such as OLTP‑to‑OLAP data flow.

Alibaba Cloud · CDC · Data Integration
10 min read

Java High-Performance Architecture
Sep 28, 2023 · Databases

How to Use Debezium for MySQL CDC in Spring Boot Without Adding Extra Middleware

Learn how to capture MySQL data changes using Debezium's CDC capabilities within a Spring Boot application, avoiding heavyweight message brokers by leveraging binlog monitoring, configuring connectors, handling snapshots, and processing change events for use cases like cache invalidation, data integration, and simplifying monolithic architectures.

CDC · Data Integration · Debezium
24 min read

dbaplus Community
Sep 24, 2023 · Backend Development

How to Sync MySQL Binlog to Elasticsearch Using Canal and RocketMQ

Learn step‑by‑step how to configure Alibaba’s open‑source Canal to capture MySQL binlog changes, route them through RocketMQ, and index the data into Elasticsearch, covering cluster mode, MySQL and Elasticsearch setup, Canal properties, and consumer implementation details.

CDC · Canal · RocketMQ
9 min read

Java Interview Crash Guide
Aug 14, 2023 · Big Data

Unlocking Change Data Capture with Debezium in Spring Boot – No Extra Middleware Needed

This article explains how small web projects can avoid heavyweight message middleware by using CDC technology, specifically Debezium, to monitor MySQL binlog changes, outlines why Debezium outperforms alternatives like Canal, and provides step‑by‑step Spring Boot integration with configuration, code samples, and practical use‑case scenarios.

CDC · Change Data Capture · Debezium
22 min read

Big Data Technology & Architecture
Jul 4, 2023 · Big Data

Building a Real‑Time Streaming Data Warehouse with Paimon on Kubernetes for Supply‑Chain Logistics

This article presents a step‑by‑step guide on how the logistics provider Haicheng Bangda implemented a streaming data warehouse using Paimon, Flink CDC, and Kubernetes, covering business background, architecture choices, environment setup, SQL examples, troubleshooting tips, and future roadmap for their digital transformation.

Big Data · CDC · Flink
27 min read

DataFunSummit
May 28, 2023 · Big Data

Apache Hudi: Capabilities, Architecture, Use Cases, and Future Outlook

This article introduces Apache Hudi as a next‑generation streaming data‑lake platform, explains its core concepts, architecture, and table types, and showcases real‑world use cases at Tencent such as CDC ingestion, minute‑level real‑time warehousing, streaming analytics, multi‑stream joins, ad attribution, and stream‑to‑batch processing, while also outlining future directions.

Apache Hudi · CDC · Data Lake
16 min read

ITPUB
Apr 26, 2023 · Databases

Mastering Change Data Capture: Open‑Source Tools and How to Choose the Right One

This article explains the concept of Change Data Capture (CDC), outlines its common use cases, compares the main technical approaches—including timestamps, data diff, triggers, and log‑based methods—and reviews popular open‑source CDC solutions and their database‑specific configuration requirements.

CDC · Change Data Capture · Data Integration
15 min read
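Of the techniques listed, data diff is the simplest to show: compare two full snapshots keyed by primary key and emit insert, update, and delete events. A minimal sketch; the snapshot dictionaries and the `(op, key, row)` event tuples are assumptions for illustration, and `diff_snapshots` is a hypothetical name.

```python
from typing import Any, Dict, List, Tuple

Snapshot = Dict[Any, Dict[str, Any]]  # primary key -> row
Event = Tuple[str, Any, Dict[str, Any]]  # (op, key, row image)


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Event]:
    """Derive change events by comparing two full snapshots keyed by primary key."""
    events: List[Event] = []
    for key, row in new.items():
        if key not in old:
            events.append(("insert", key, row))
        elif old[key] != row:
            events.append(("update", key, row))
    for key, row in old.items():
        if key not in new:
            # Deletes are only detectable because the whole previous snapshot is kept.
            events.append(("delete", key, row))
    return events


old = {1: {"name": "a"}, 2: {"name": "b"}, 3: {"name": "c"}}
new = {1: {"name": "a"}, 2: {"name": "B"}, 4: {"name": "d"}}
events = diff_snapshots(old, new)
```

The sketch also shows the method's cost: every run needs both complete snapshots, and changes that happen and are undone between two runs (for example, an insert followed by a delete) are never seen. Log‑based capture avoids both problems by reading the database's transaction log.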
TAL Education Technology
Feb 16, 2023 · Big Data

Step‑by‑Step Guide to Syncing Canal Data to Elasticsearch

This article provides a comprehensive, hands‑on tutorial for configuring Alibaba Canal and its client‑adapter to capture MySQL binlog changes and synchronize them into Elasticsearch, covering environment setup, Docker commands, YAML configuration files, index mapping, adapter startup, and common troubleshooting scenarios.

CDC · Canal · Configuration
26 min read

Big Data Technology & Architecture
Dec 28, 2022 · Big Data

Flink 1.16 Highlights: Adaptive Batch Scheduling, Speculative Execution, Hybrid Shuffle, Dynamic Partition Pruning, Hive SQL Migration, Checkpoint Enhancements, CDC Integration, and Table Store

Flink 1.16 introduces adaptive batch scheduling, speculative execution, hybrid shuffle, dynamic partition pruning, improved Hive SQL compatibility, advanced checkpoint mechanisms including changelog backend, and integrates CDC with Kafka and Table Store, offering faster, more stable, and easier-to-use stream‑batch processing capabilities.

Big Data · CDC · Checkpoint
8 min read

ITPUB
Dec 18, 2022 · Big Data

How to Build a Real‑Time Data Warehouse with EasyData: A Step‑by‑Step Guide

Learn how to design and implement a real‑time data warehouse for monitoring an app’s A/B tests using EasyData, covering data flow layers, CDC task creation, stream table registration, Flink SQL processing, and BI reporting, with detailed steps, code snippets, and practical tips.

CDC · EasyData · Flink
13 min read

DataFunSummit
Dec 2, 2022 · Big Data

BitSail: ByteDance’s Open‑Source Unified Data Integration Engine – Architecture, Evolution, and Capabilities

BitSail, ByteDance’s open‑source data integration engine, unifies batch, streaming, and incremental data synchronization across heterogeneous sources. The article traces its evolution from early Flink‑based prototypes to a mature, plugin‑driven architecture with multi‑engine support, low‑cost co‑development, and robust CDC lakehouse capabilities.

Big Data · CDC · Flink
19 min read

DataFunTalk
Nov 6, 2022 · Big Data

BitSail: ByteDance’s Open‑Source Unified Data Integration Engine – Architecture, Evolution, and Capabilities

BitSail, an open‑source data integration engine from ByteDance, provides a unified solution for batch, streaming, full‑load, and incremental data synchronization across heterogeneous sources. The article covers its background, technical evolution, architecture, low‑cost co‑building features, compatibility strategies, and future roadmap.

CDC · Data Integration · Flink
18 min read

IT Services Circle
Oct 26, 2022 · Databases

Debezium: Open‑Source Change Data Capture Platform – Overview, Architecture, Use Cases, and Installation Guide

This article introduces Debezium, an open‑source low‑latency change data capture platform that streams database row changes via Kafka, explains its architecture and common scenarios such as cache invalidation and CQRS, and provides step‑by‑step Docker commands to install ZooKeeper, Kafka, MySQL and the Debezium connector.

CDC · Data Integration · Debezium
15 min read

DataFunSummit
Oct 21, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform architecture and three real‑time lake initiatives—log ingestion, CDC ingestion, and lake analysis—showcasing how Apache Iceberg, Flink, and custom shuffling algorithms solve small‑file and cross‑cloud challenges while enabling schema evolution and future multi‑cloud optimizations.

Apache Iceberg · Big Data · CDC
16 min read

Alibaba Cloud Native
Sep 29, 2022 · Cloud Native

Why Use RocketMQ Connect for Scalable Data Pipelines?

This article explains the challenges of point‑to‑point data sync, introduces RocketMQ Connect as a cloud‑native solution that decouples upstream and downstream, details its architecture, connectors, REST API, metrics, deployment modes, and provides a step‑by‑step guide to building custom connectors for use cases such as CDC, data lakes, and system migration.

CDC · Cloud Native · Connector
19 min read

DataFunTalk
Aug 6, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform engineering, describing how Apache Iceberg is leveraged for real‑time data lake ingestion, CDC pipelines, multi‑cloud storage, small‑file mitigation, schema evolution, and future plans across storage, compute, and management within a big‑data ecosystem.

Apache Iceberg · CDC · Flink
16 min read

Efficient Ops
Jul 19, 2022 · Databases

How CDC Powers Real-Time Analytics Without Overloading Your Database

This article introduces the practice of Change Data Capture (CDC), explaining how capturing only data changes can feed downstream systems and data warehouses in near real‑time, reducing load on the source database, improving reporting latency, and supporting scalable, reliable analytics pipelines.

CDC · Change Data Capture · Real-time analytics
9 min read

Alibaba Cloud Native
Jul 17, 2022 · Cloud Native

Build Real-Time CDC Pipelines on Alibaba Cloud EventBridge with DTS

This article explains Change Data Capture (CDC) concepts, compares open‑source CDC tools, and shows how to leverage Alibaba Cloud EventBridge and DTS to build real‑time CDC pipelines, covering setup steps, event‑bus vs event‑stream choices, best‑practice scenarios such as CQRS, microservice decoupling, database backup, and SQL auditing.

CDC · Cloud Native · DTS
12 min read

Efficient Ops
Jul 6, 2022 · Databases

How DataBus Enables Real-Time, Scalable Database Synchronization for Oracle Migration

DataBus is a real‑time data synchronization framework designed to support Oracle de‑commissioning, micro‑service migration, and heterogeneous storage engines by providing high‑availability CDC, flexible data pipelines, and seamless full‑to‑incremental migration across multiple source and target databases.

CDC · data synchronization · database migration
19 min read

Bilibili Tech
Jun 10, 2022 · Big Data

Incremental Data Lake Design and Hudi Core Optimizations with Flink

The article describes how combining Apache Flink with Hudi enables an incremental data lake that delivers near‑real‑time analytics by switching to merge‑on‑read, fixing log handling bugs, improving compaction planning, and refactoring table‑service scheduling, while showcasing use cases such as CDC ingestion, data quality control, and real‑time materialized views, and outlines future enhancements like optimistic concurrency and unified schema evolution.

Apache Hudi · CDC · Compaction Optimization
21 min read

IT Architects Alliance
Jun 7, 2022 · Databases

Introduction to Change Data Capture (CDC) Practices

This article introduces the concept and practice of Change Data Capture (CDC), explaining how it captures database changes to provide real‑time incremental data for analytics and reporting without impacting source performance, and outlines modern CDC methods, challenges, and production‑ready system requirements.

CDC · Change Data Capture · Data Integration
8 min read

Top Architect
Jun 7, 2022 · Databases

An Introduction to Change Data Capture (CDC) Practices and Modern Approaches

This article introduces the concept of Change Data Capture (CDC), explains why traditional batch reporting strains resources, describes how CDC captures only data changes to keep source databases performant, and outlines modern CDC architectures, production‑ready considerations, and best‑practice guidelines for building reliable data pipelines.

CDC · Change Data Capture · Data Integration
16 min read

DataFunTalk
May 24, 2022 · Big Data

Integrating Apache Flink with Apache Hudi: From Data Warehouse to Data Lake

This article explains how Apache Flink integrates with Apache Hudi to enable real‑time data lake ingestion, covering the evolution from traditional data warehouses to data lakes, Hudi’s core concepts such as timeline and file grouping, copy‑on‑write vs merge‑on‑read modes, and Flink’s CDC‑based ETL pipeline.

Big Data · CDC · Data Lake
18 min read

Big Data Technology Architecture
May 22, 2022 · Big Data

Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions

This article introduces Delta Lake as an open‑source storage layer for lake‑house architectures, explains its key features, file and metadata structures, and details how Alibaba Cloud EMR and Data Lake Formation integrate and extend Delta Lake with advanced capabilities such as G‑SCD, CDC, performance optimizations, and future roadmap.

CDC · DLF · Delta Lake
10 min read

Alibaba Cloud Developer
May 13, 2022 · Big Data

Unlocking Delta Lake: Key Features, Architecture, and EMR Integration

Delta Lake, an open‑source storage layer from Databricks, provides ACID transactions, data versioning, schema evolution, and unified batch‑stream processing. The article explains its file structure and metadata mechanism and how Alibaba Cloud EMR enhances it with advanced DML, performance optimizations, deep DLF integration, and solutions for G‑SCD and CDC.

CDC · DLF · Data Lakehouse
11 min read

IT Architects Alliance
May 11, 2022 · Databases

How Change Data Capture Enables Real‑Time Analytics Without Overloading Your Database

The article explains the fundamentals of Change Data Capture (CDC), describing how capturing DML changes from relational databases like MySQL or PostgreSQL can provide incremental, near‑real‑time data for analytics and reporting while preserving source performance, and outlines modern CDC architectures, transaction‑log based extraction, and production‑ready design considerations.

CDC · Change Data Capture · Database Replication
9 min read

Top Architect
May 11, 2022 · Databases

An Introduction to Change Data Capture (CDC) Practices

This article introduces the concept and practice of Change Data Capture (CDC), explaining why CDC is needed for real‑time analytics, how it works by capturing DML changes, modern approaches using transaction logs, and key considerations for building a production‑ready CDC system.

CDC · Change Data Capture · Data Integration
8 min read

Shopee Tech Team
Mar 17, 2022 · Backend Development

Real-time Checking System for Data Consistency in Microservices

Shopee’s Real‑time Checking System provides configurable, non‑intrusive data consistency verification for micro‑services by capturing change events via CDC, streaming them through Kafka, applying flexible rules and expressions, and instantly alerting mismatches, delivering second‑level detection while scaling to tens of thousands of checks per second.

CDC · Data Consistency · Distributed Systems
20 min read

Volcano Engine Developer Services
Feb 16, 2022 · Big Data

ByteDance’s Journey to a Unified Data Lake with Flink and Hudi

This article recounts ByteDance’s evolution from batch‑only Flink pipelines to a unified data‑lake integration platform, detailing the three integration modes, challenges with Spark‑based CDC, the decision to adopt Hudi over Iceberg, and how Hudi’s indexing and Merge‑On‑Read formats enable near‑real‑time analytics at massive scale.

CDC · Flink · Hudi
10 min read

Big Data Technology & Architecture
Feb 16, 2022 · Big Data

Using Flink CDC to Capture MySQL Changes and Sync Them to ClickHouse

This article introduces Change Data Capture (CDC), compares query‑based and log‑based approaches, explains Debezium and ClickHouse, and provides detailed Flink CDC and Flink SQL CDC examples—including Java source code, custom deserialization schema, ClickHouse sink implementation, and required Maven dependencies—to synchronize MySQL data into ClickHouse in real time.

Big Data · CDC · Data Streaming
17 min read

Big Data Technology & Architecture
Dec 22, 2021 · Big Data

Using Flink CDC to Capture MySQL Changes and Sink Them into ClickHouse

This article explains Change Data Capture (CDC), compares query‑based and log‑based approaches, introduces Debezium and ClickHouse, and provides step‑by‑step Flink CDC and Flink SQL CDC examples—including Java source, deserialization, sink code and required Maven dependencies—to stream MySQL binlog changes into ClickHouse for real‑time analytics.

Big Data · CDC · Data Streaming
14 min read

Big Data Technology Architecture
Nov 23, 2021 · Big Data

Step-by-Step Guide to Setting Up Flink CDC with MySQL, Hudi, and Hive Integration on a Hadoop Cluster

This comprehensive tutorial walks through configuring a Hadoop‑based environment (Flink 1.13.1, Scala 2.11, CDH 6.2.0, Hive 2.1.1, Hudi 0.10), compiling Hudi, setting up Flink and MySQL binlog, creating CDC source and Hudi sink tables, running Flink jobs, and synchronizing the results to Hive partitions for query via Hive and Presto.

CDC · Flink · Hudi
15 min read

Big Data Technology & Architecture
Nov 8, 2021 · Big Data

Understanding Flink CDC 2.0: Core Design, Snapshot & Incremental Reading, and Code Walkthrough

This article introduces Flink CDC 2.0, explains its distributed mechanisms for reading the full snapshot and the incremental changes, details the chunk (slice) splitting, snapshot correction, and binlog handling logic, and provides a complete Java example that configures Flink SQL with a MySQL source and a Kafka sink.

Big Data, CDC, Data Integration
29 min read
HomeTech
Nov 3, 2021 · Big Data

Real‑time Materialized View Practices with Apache Flink: System Analysis, Algorithm Design, and Implementation

This article presents Autohome's experience building a real‑time materialized view system on Apache Flink, covering system analysis, problem decomposition, a CDC algorithm based on a global version, its implementation as a Flink connector, results from real‑world deployment, and open challenges such as clock dependency and state size.

CDC, Flink, Algorithm
17 min read
Big Data Technology & Architecture
Oct 21, 2021 · Big Data

Comparative Overview of Open‑Source CDC Solutions: Debezium, Flink CDC, and Canal

This article provides a detailed comparison of three popular open‑source change data capture tools—Debezium, Flink CDC, and Canal—covering their underlying principles, architecture, deployment options, performance characteristics, and suitability for real‑time data synchronization in big‑data environments.

CDC, Canal, Change Data Capture
21 min read
Alibaba Cloud Developer
Sep 9, 2021 · Big Data

How Apache Hudi & Pulsar Enable Real‑Time CDC Data Lake Ingestion

This article explains CDC fundamentals, compares query‑based and log‑based capture, describes typical CDC‑to‑lake architectures using Pulsar and Hudi, dives into Hudi's core design, optimization techniques, and future roadmap, and provides practical insights for building scalable data lakes.

Apache Hudi, CDC, Pulsar
17 min read
Alibaba Cloud Developer
Sep 2, 2021 · Databases

From LAMP to Cloud‑Native: Evolving Application Data Architecture and Best Practices

This article traces two decades of application data architecture evolution, comparing traditional single‑system LAMP designs with modern multi‑component cloud‑native stacks, and offers practical guidance on scaling, component selection, CDC‑based data derivation, and cloud‑native implementations such as Tablestore.

CDC, Data Architecture, Databases
22 min read
Big Data Technology & Architecture
Jul 20, 2021 · Big Data

Common Issues and Solutions for Flink CDC with MySQL

This article summarizes frequent problems encountered when using Flink CDC with MySQL—including Kafka version conflicts, checkpoint timeouts, permission errors, global lock issues, and DDL parsing failures—and provides practical configuration tweaks and code examples to resolve them.

CDC, Checkpoint, Debezium
11 min read
Programmer DD
Jun 14, 2021 · Databases

Master Real‑Time Change Data Capture with Debezium and Spring Boot

Learn how to capture and stream real‑time database changes using Debezium’s distributed CDC framework, configure MySQL binlog, integrate the embedded engine with Spring Boot, and process change events with sample code and Docker setup for robust data pipelines.

CDC, Change Data Capture, Debezium
11 min read
Full-Stack Internet Architecture
May 19, 2021 · Backend Development

Understanding Message Queues: Benefits, Design Challenges, and Transactional Solutions

This article explores the role of message queues in microservice architectures, covering their benefits (decoupling, asynchronous processing, and peak shaving) as well as design challenges such as concurrency, ordering, duplicate messages, and transactional messaging, with solutions including Kafka partitions, the outbox pattern, CDC, and RocketMQ.

CDC, Kafka, Message Queue
12 min read
Top Architect
May 4, 2021 · Big Data

Overview of CDC Tools: Canal, Maxwell, Databus, and Alibaba DTS

This article introduces four change‑data‑capture solutions—Canal, Maxwell, Databus, and Alibaba Data Transmission Service (DTS)—explaining their principles, processing steps, features, and practical advantages for real‑time data synchronization and migration in big‑data environments.

Alibaba DTS, Big Data, CDC
6 min read
DataFunTalk
Apr 27, 2021 · Big Data

Implementing CDC‑to‑Hudi for Real‑Time Mutable Data in a Big Data System

This article describes how Linkflow migrated mutable customer data from MySQL into an Apache Hudi data lake using Flink CDC with embedded Debezium, addressing challenges such as resumable snapshots, partial updates, row‑key merging, schema evolution, indexing, and concurrent writes, to achieve minute‑level data freshness and faster offline processing.

Apache Hudi, Big Data, CDC
21 min read
Ctrip Technology
Mar 25, 2021 · Big Data

Challenges and Approaches for Real‑Time Data Aggregation Analysis

The article examines the key challenges of real‑time data aggregation—data freshness, timely processing, and result visibility—and surveys common solutions such as timestamp‑based sync, CDC, full and incremental computation, storage formats, and trigger mechanisms.

Big Data, CDC, Incremental Computation
11 min read
JD Retail Technology
Mar 12, 2021 · Backend Development

Cache Synchronization in High‑Concurrency Environments: Problems and JD's CDC‑Based Solution

The article reviews common patterns for keeping a cache in sync with the database, highlights their inconsistency and data‑loss risks under high load, and presents JD's solution, which combines Cache‑Aside, Change Data Capture, message queues, delayed consumption, versioning, and persistence to achieve eventual consistency between the cache and the relational database.

CDC, Cache, Data Consistency
7 min read