Overview of Canal, Maxwell, Databus, and Alibaba Cloud DTS for MySQL Binlog‑Based Change Data Capture
This article introduces several MySQL binlog-based change data capture solutions—including Canal, Maxwell, Databus, and Alibaba Cloud's Data Transmission Service—explaining their principles, architecture, features, and usage considerations for incremental data subscription and processing.
Canal is a Java‑based open‑source project that simulates the MySQL slave protocol to capture binary‑log events for incremental data subscription and consumption, currently supporting MySQL.
Principle: Canal pretends to be a MySQL slave, sends a dump request to the master, receives binary log streams, and parses the byte‑level log into structured events.
Connection obtains the last successfully parsed position (or an initial position on first start).
Connection establishes a link to the master and issues the binlog dump command (COM_BINLOG_DUMP in the MySQL protocol).
MySQL starts pushing the binary log.
The received binary log is parsed by a Binlog parser, enriching it with additional information.
The parsed events are handed to an EventSink module for storage, a blocking operation until the write succeeds.
After successful storage, the current binary‑log position is recorded periodically.
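The steps above can be sketched as a small loop. This is a minimal simulation under stated assumptions, not Canal's actual code: `FAKE_BINLOG`, `dump_from`, `parse`, and `run` are hypothetical names, the binlog is an in-memory list rather than a network stream, and a position is simplified to "file plus offset of the last handled event".

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    file: str
    offset: int


@dataclass
class Event:
    position: Position
    table: str
    kind: str
    row: dict


# Hypothetical stand-in for the master's binary log:
# (file, offset, table, change kind, row data)
FAKE_BINLOG = [
    ("mysql-bin.000001", 4, "shop.orders", "INSERT", {"id": 1}),
    ("mysql-bin.000001", 120, "shop.orders", "UPDATE", {"id": 1}),
    ("mysql-bin.000001", 236, "shop.users", "INSERT", {"id": 7}),
    ("mysql-bin.000001", 352, "shop.orders", "DELETE", {"id": 1}),
]


def dump_from(start: Position):
    """Yield raw records located strictly after `start` (simplified dump stream)."""
    for rec in FAKE_BINLOG:
        if (rec[0], rec[1]) > (start.file, start.offset):
            yield rec


def parse(rec) -> Event:
    """Turn a raw record into a structured event."""
    file, offset, table, kind, row = rec
    return Event(Position(file, offset), table, kind, row)


def run(start: Position, sink: queue.Queue, checkpoint_every: int = 2) -> Position:
    """Parse events, hand them to the sink, and record the position periodically."""
    saved = start
    for n, rec in enumerate(dump_from(start), 1):
        event = parse(rec)
        sink.put(event)  # blocks while a bounded sink is full
        if n % checkpoint_every == 0:
            saved = event.position  # periodic, not per-event, checkpoint
    return saved
```

Because the position is recorded only periodically, a restart from the saved position can replay events that were already stored. In this sketch, starting after offset 4 stores three events but saves offset 236, so the event at offset 352 would be delivered again after a restart.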
Additional capabilities include data filtering (wildcard‑based table/field filters), data routing/distribution (1:n parser‑to‑multiple stores), data merging (n:1 store aggregation), and data enrichment (e.g., joins) before persisting.
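As an illustration of the filtering and 1:n routing ideas, the following sketch matches table names against shell-style wildcards using Python's standard `fnmatch` module. The pattern syntax, the function names, and the dictionary-shaped events are assumptions made for this example; they are not Canal's configuration format.

```python
from fnmatch import fnmatchcase


def make_table_filter(patterns):
    """Return a predicate that accepts tables matching any wildcard pattern."""
    def accept(table):
        return any(fnmatchcase(table, p) for p in patterns)
    return accept


def route(events, routes):
    """1:n distribution: copy each event to every store whose pattern matches."""
    stores = {name: [] for name in routes}
    for ev in events:
        for name, pattern in routes.items():
            if fnmatchcase(ev["table"], pattern):
                stores[name].append(ev)
    return stores


# Usage: keep only order tables and everything in the crm schema.
accept = make_table_filter(["shop.order_*", "crm.*"])
events = [
    {"table": "shop.order_2024", "kind": "INSERT"},
    {"table": "shop.users", "kind": "UPDATE"},
    {"table": "crm.leads", "kind": "DELETE"},
]
kept = [ev for ev in events if accept(ev["table"])]
stores = route(kept, {"orders": "shop.order_*", "audit": "*"})
```

Here the hypothetical `audit` store receives every kept event, while `orders` receives only those from matching order tables.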
Maxwell is also written in Java. Canal is split into a server and a client; it is a stable, feature‑rich CDC solution, but developers must write their own client to consume the events it parses. Maxwell's main advantage over Canal is simplicity: it emits data‑change events directly as JSON strings, so no custom client code is needed.
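To show what consuming JSON change events can look like, here is a minimal sketch that applies such events to an in-memory copy of the rows. The field names (`database`, `table`, `type`, `data`, `old`) follow Maxwell's documented output as best I recall it; the `apply_change` helper, the sample rows, and the assumption that every table has an `id` primary key are illustrative only.

```python
import json


def apply_change(state, line, pk="id"):
    """Apply one JSON change event to `state`, keyed by (database, table, pk value)."""
    msg = json.loads(line)
    kind = msg.get("type")
    if kind not in ("insert", "update", "delete"):
        return state  # ignore anything that is not a row change
    key = (msg["database"], msg["table"], msg["data"][pk])
    if kind == "delete":
        state.pop(key, None)
    else:
        state[key] = msg["data"]  # "data" carries the row's current values
    return state


# Sample events (made up for illustration).
insert = ('{"database":"shop","table":"orders","type":"insert",'
          '"ts":1449786310,"data":{"id":1,"status":"new"}}')
update = ('{"database":"shop","table":"orders","type":"update",'
          '"ts":1449786341,"data":{"id":1,"status":"paid"},'
          '"old":{"status":"new"}}')
delete = ('{"database":"shop","table":"orders","type":"delete",'
          '"ts":1449786400,"data":{"id":1,"status":"paid"}}')

state = {}
for line in (insert, update):
    apply_change(state, line)
```

The sketch does not handle an update that changes the primary key itself; a real consumer would need the previous values in `old` to locate the stale row.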
Databus is a low‑latency change‑capture system used extensively in LinkedIn’s data pipelines. It offers isolation between sources and consumers, guarantees ordered and at‑least‑once delivery with high availability, supports consumption from any point in the change stream (including full back‑fill), provides partitioned consumption, and ensures source‑consistent persistence.
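Ordered, at-least-once delivery means a consumer may see the same change more than once, for example after a reconnect. One common way to cope is to make the consumer idempotent by tracking the sequence number of the last applied change. The sketch below assumes each change carries a monotonically increasing number, called `scn` here; the class and its methods are hypothetical and are not the Databus client API.

```python
class IdempotentConsumer:
    """Applies each change at most once, even if it is delivered repeatedly."""

    def __init__(self, checkpoint=0):
        self.checkpoint = checkpoint  # sequence number of the last applied change
        self.applied = []

    def on_event(self, scn, payload):
        """Apply the change unless it was already seen; return True if applied."""
        if scn <= self.checkpoint:
            return False  # duplicate from a replay: skip it
        self.applied.append(payload)
        self.checkpoint = scn
        return True


# Usage: the change with scn 2 is delivered twice but applied once.
consumer = IdempotentConsumer()
deliveries = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
results = [consumer.on_event(scn, payload) for scn, payload in deliveries]
```

Starting a new consumer with an earlier `checkpoint` corresponds to consuming from an arbitrary point in the change stream.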
Alibaba Cloud Data Transmission Service (DTS) is a managed data‑flow service that supports RDBMS, NoSQL, and OLAP sources. It provides data migration, real‑time subscription, and synchronization capabilities, enabling scenarios such as zero‑downtime migration, cross‑region disaster recovery, active‑active architectures, and real‑time data warehousing. Compared with third‑party tools, DTS offers richer transmission links, higher performance, stronger security, and convenient management features.
In practice, DTS behaves like a message queue that pushes wrapped SQL objects; users can build services to parse these objects. The service also spares users the cost of deploying and maintaining their own capture infrastructure, offers optimized support for Alibaba Cloud RDS and DRDS, and stays highly available through binlog recycling, primary/secondary switchover, and VPC network changes.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
