Databases 13 min read

Mastering MySQL to Elasticsearch Sync: 4 Strategies & Top Migration Tools

This article explores four practical methods for synchronizing MySQL data to Elasticsearch—including synchronous and asynchronous double writes, SQL extraction, and binlog real‑time replication—while reviewing popular migration tools such as Canal, Alibaba DTS, and Databus to help you choose the right solution.

macrozheng

Feb 24, 2025

Mastering MySQL to Elasticsearch Sync: 4 Strategies & Top Migration Tools

1. Introduction

In many projects MySQL serves as the primary business database while Elasticsearch is used for query operations to achieve read‑write separation, reduce MySQL query load, and handle complex large‑scale searches. A key challenge is keeping MySQL and Elasticsearch data synchronized.

2. Data Synchronization Solutions

2.1 Synchronous Double Write

The simplest approach writes data to both MySQL and Elasticsearch simultaneously.

Advantages: simple business logic and high real‑time performance.

Business logic is simple.

High real‑time capability.

Disadvantages: hard‑coded, strong coupling, risk of data loss on double‑write failure, and performance degradation.

Requires code changes wherever MySQL is written.

Strong coupling between services.

Potential data loss if one write fails.

Overall performance suffers.

2.2 Asynchronous Double Write

For multiple data sources, a message queue (MQ) can be used to write asynchronously.

Advantages: high performance, reduced data loss thanks to MQ consumption guarantees, and easy extension to additional sources.

High performance.

MQ ensures retry on ES downtime or write failures.

Isolated writes allow easy addition of new sources.

Disadvantages: hard‑coded integration, increased system complexity, and potential latency due to asynchronous processing.

New sources require new consumer code.

Introduces message‑queue overhead.

Data may not be visible immediately.

2.3 SQL Extraction (Timer‑Based)

This method adds a timestamp column to relevant tables, uses a scheduled job to poll changed rows, and writes them to Elasticsearch, avoiding code intrusion.

Add a timestamp column to tables.

Leave existing CRUD operations unchanged.

Run a periodic timer to scan for changes.

Write each changed row to Elasticsearch.

Advantages: no code changes, no strong coupling, and simple worker implementation.

No intrusion into existing code.

No performance impact on the business system.

Worker logic is straightforward.

Disadvantages: lower timeliness due to polling intervals and added load on the database.

Potential delay even with second‑level intervals.

Polling can stress the DB; using a replica can mitigate.

Classic solution: use Logstash to periodically query new data via SQL and write incremental changes to Elasticsearch.

2.4 Binlog Real‑Time Synchronization

To achieve real‑time sync without code intrusion, MySQL binlog can be leveraged.

Steps:

Read MySQL binlog to obtain table change events.

Convert events to MQ messages.

Develop an MQ consumer.

Consume messages and write each change to Elasticsearch.

Advantages: no code changes, high performance, and business decoupling.

No intrusion or hard‑coding.

Existing systems remain unchanged.

High throughput.

Business logic is independent of sync process.

Disadvantages: complex binlog setup and possible MQ latency.

Building a binlog‑based system is intricate.

If MQ is used, it inherits MQ’s delay risk.

3. Data Migration Tool Selection

Among the four sync methods, binlog real‑time sync is most common and has spawned several robust migration tools that capture MySQL binlog changes (CDC) and forward INSERT, UPDATE, DELETE events downstream.

3.1 Canal

Canal acts as a MySQL slave to subscribe to binlog, parse it into JSON, and deliver changes via TCP or MQ.

Canal server requests dump protocol from MySQL master.

Master pushes binlog; Canal parses bytes to JSON.

Canal client listens via TCP or MQ and syncs data to Elasticsearch.

3.2 Alibaba Cloud DTS

Data Transmission Service supports multiple source types (RDBMS, NoSQL, OLAP) and offers data migration, real‑time subscription, and synchronization with high performance (up to 70 MB/s, 200 k TPS) and high availability.

Supports many data sources.

Multiple transmission modes.

High throughput and TPS.

Clustered architecture ensures failover.

Visual UI and wizard‑based link creation.

Commercial service (paid).

3.3 Databus

Databus, an open‑source LinkedIn project, captures low‑latency, reliable, transactional changes from databases (MySQL, Oracle) and delivers them to consumers.

Supports multiple source types.

Highly scalable and available.

Preserves transaction order.

Millisecond‑level latency with flexible subscription.

Unlimited replay for consumers.

4. Conclusion

This article presented various MySQL‑to‑Elasticsearch synchronization approaches and introduced several popular data migration tools, helping you select the most suitable solution for your project.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Elasticsearch mysql Canal DTS data-sync CDC Databus

Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.