Mastering MySQL‑to‑Elasticsearch Sync: 4 Strategies & Top Migration Tools
This article compares four MySQL‑to‑Elasticsearch synchronization methods—synchronous dual‑write, asynchronous dual‑write, SQL extraction, and Binlog‑based real‑time sync—evaluates their pros and cons, and reviews popular migration tools such as Canal, Alibaba DTS, Databus, Flink, CloudCanal, Maxwell, and DRDS.
1. Introduction
In many projects MySQL serves as the primary business database while Elasticsearch (ES) is used as a query engine to achieve read‑write separation and handle massive, complex queries.
A critical challenge is keeping MySQL and ES data synchronized. This article outlines four common synchronization approaches and introduces widely used migration tools.
2. Data Synchronization Solutions
2.1 Synchronous Dual‑Write
The simplest method: write to MySQL and ES simultaneously.
Advantages:
Simple business logic
High real‑time performance
Disadvantages:
Hard‑coded; every MySQL write must also contain ES write code
Tightly coupled business logic
Risk of data loss if one write fails
Performance degradation as ES adds load to the system
2.2 Asynchronous Dual‑Write
For multiple data sources, a message queue (MQ) can be used to perform asynchronous writes.
Advantages:
High performance
Reduced data loss risk thanks to MQ consumption guarantees (e.g., retry on ES failure)
Isolation between data sources, facilitating future extensions
Disadvantages:
Hard‑coded integration; new sources require new consumer code
Increased system complexity due to the message middleware
Potential latency because MQ consumption is asynchronous
2.3 SQL Extraction (Timer‑Based)
When real‑time is not critical, a scheduled job can extract changed rows based on a timestamp column.
Add a timestamp column to relevant tables; any CRUD operation updates this field.
Leave existing application code unchanged.
Run a timer program that periodically scans the tables for rows whose timestamp changed within the interval.
Write the extracted rows to ES one by one.
Advantages:
No code intrusion; original application remains untouched
No strong business coupling; system performance unchanged
Simple worker implementation without CRUD considerations
Disadvantages:
Lower timeliness; synchronization delay depends on the timer interval
Additional polling load on the database; can be mitigated by using a replica for polling
Classic solution: use Logstash to periodically query MySQL via SQL and write incremental data to ES.
2.4 Binlog‑Based Real‑Time Sync
To achieve real‑time synchronization without code intrusion, MySQL's binary log (Binlog) can be leveraged.
Steps:
Read Binlog entries for the target tables.
Convert the entries into MQ messages.
Implement an MQ consumer.
Continuously consume messages and write each to ES.
Advantages:
No code intrusion or hard‑coding.
No changes required in the existing system.
High performance.
Business decoupling; no need to understand original business logic.
Disadvantages:
Complexity of building a Binlog processing system.
If MQ is used for delivery, it inherits the latency risk of asynchronous consumption.
3. Data Migration Tool Selection
The Binlog real‑time sync approach is the most common, spawning many migration tools that subscribe to Binlog changes and implement CDC (Change Data Capture) to forward INSERT, UPDATE, DELETE events downstream.
3.1 Canal
Canal parses incremental database logs and provides subscription/consumption capabilities, primarily for MySQL.
It mimics a MySQL slave to receive Binlog from the master, parses the binary data into JSON, and forwards it via TCP or MQ to downstream consumers.
Canal server requests a dump protocol from MySQL master.
Master pushes Binlog to Canal, which converts it to JSON.
Canal client listens via TCP or MQ and syncs data to ES.
The core flow includes a Binlog parser for extraction and an EventSink for filtering, routing, and processing.
3.2 Alibaba Cloud DTS
Data Transmission Service (DTS) supports data migration, real‑time subscription, and synchronization across RDBMS, NoSQL, and OLAP sources.
Features include multi‑source support, multiple transmission modes, high performance (up to 70 MB/s and 200 k TPS), high availability via clustered nodes, and a visual management console. It is a paid service.
3.3 Databus
Databus, open‑sourced by LinkedIn in 2013, is a low‑latency, reliable, transactional change‑capture system supporting multiple data sources such as MySQL and Oracle.
It pulls changes from database logs, delivers them to custom clients in real time, and offers features like unlimited replay, high scalability, and server‑side filtering.
3.4 Other Tools
Flink : Distributed stream processing engine for stateful computations on bounded and unbounded data streams.
CloudCanal : Commercial data synchronization product.
Maxwell : Simple tool that outputs database changes as JSON strings without requiring custom client code.
DRDS : Alibaba's distributed relational database middleware focusing on scalability, lightweight, flexible, stable, and efficient characteristics.
Yugong : Tool for migrating data from Oracle to MySQL.
4. Conclusion
This article presented MySQL‑to‑Elasticsearch synchronization strategies and surveyed common migration tools, helping readers choose the most suitable solution for their scenarios.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
