
MySQL to Elasticsearch Data Synchronization: Strategies and Tool Selection

This article reviews four common MySQL‑to‑Elasticsearch synchronization methods—synchronous dual‑write, asynchronous dual‑write via MQ, timer‑based SQL extraction, and real‑time Binlog replication—evaluates their pros and cons, and compares popular migration tools such as Canal, Alibaba DTS, Databus and others.

IT Services Circle

Jun 12, 2024


1. Introduction

In many projects MySQL serves as the primary business database while Elasticsearch is used as a query engine to achieve read‑write separation and handle large‑scale complex queries. A critical challenge is keeping MySQL and Elasticsearch data synchronized.

2. Data Synchronization Schemes

2.1 Synchronous Dual‑Write

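This is the simplest approach: the application writes each change to MySQL and to Elasticsearch in the same request path.

Advantages: simple business logic, and changes become searchable almost immediately.

Disadvantages: the write logic is hard-coded wherever data changes, the business code is tightly coupled to Elasticsearch, the two stores diverge if one write succeeds and the other fails, and every request pays for two writes.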
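The failure case is easiest to see in a small sketch. The example below is illustrative only: `FakeStore` is a hypothetical in-memory stand-in for both a MySQL table and an Elasticsearch index, not a real client, and `dual_write` is an assumed helper name.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for a MySQL table or an Elasticsearch index."""

    def __init__(self, fail=False):
        self.rows = {}
        self.fail = fail

    def upsert(self, doc_id, doc):
        if self.fail:
            raise ConnectionError("store unavailable")
        self.rows[doc_id] = dict(doc)


def dual_write(mysql, es, doc_id, doc):
    """Write to MySQL first, then to Elasticsearch, in the same call path."""
    mysql.upsert(doc_id, doc)  # the system of record is updated first
    es.upsert(doc_id, doc)     # if this raises, MySQL already holds the row


mysql, es = FakeStore(), FakeStore(fail=True)  # simulate an Elasticsearch outage
try:
    dual_write(mysql, es, 1, {"title": "hello"})
except ConnectionError:
    pass

# IDs present in MySQL but missing from the search index
inconsistent = set(mysql.rows) - set(es.rows)
```

Without a compensating step such as a retry or a reconciliation job, the row with ID 1 stays out of the index, which is the divergence risk listed above.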

2.2 Asynchronous Dual‑Write

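The application writes to MySQL and publishes the change to a message queue (MQ); one or more consumers then apply it to Elasticsearch and any other target stores asynchronously.

Advantages: high write performance, less risk of lost updates because failed messages can be replayed from the MQ, and new target stores can be added as extra consumers.

Disadvantages: the producer side is still hard-coded in business code, each new target needs its own consumer, the system becomes more complex, and asynchronous consumption introduces latency.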
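A minimal sketch of this flow follows, assuming Python's standard `queue.Queue` as a stand-in for a real MQ and plain dictionaries as stand-ins for MySQL and Elasticsearch. The names `write_order` and `drain` and the message shape are hypothetical.

```python
import queue


def write_order(mysql_rows, mq, order_id, order):
    """Producer side: persist the row, then publish a change message."""
    mysql_rows[order_id] = dict(order)
    mq.put({"op": "upsert", "id": order_id, "doc": dict(order)})


def drain(mq, es_docs):
    """Consumer side: apply every pending message to the index, return the count."""
    applied = 0
    while True:
        try:
            msg = mq.get_nowait()
        except queue.Empty:
            return applied
        if msg["op"] == "upsert":
            es_docs[msg["id"]] = msg["doc"]
        elif msg["op"] == "delete":
            es_docs.pop(msg["id"], None)
        applied += 1


mysql_rows, es_docs, mq = {}, {}, queue.Queue()
write_order(mysql_rows, mq, 42, {"status": "paid"})
lag_visible = 42 not in es_docs  # the index lags until the consumer runs
applied = drain(mq, es_docs)
```

The window between `write_order` and `drain` is the latency mentioned above. Note that writing the row and publishing the message are still two separate operations in the producer, so a real system would also need to handle a failure between them.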

2.3 Timer‑Based SQL Extraction

Adds a timestamp column to relevant tables, leaves existing CRUD unchanged, and runs a periodic timer to extract changed rows and write them to Elasticsearch.

Advantages: non‑intrusive, no business coupling, simple worker code.

Disadvantages: poorer timeliness, added polling load on the database.

Classic solution: use Logstash with its JDBC input plugin to run a scheduled SQL query for rows changed since the last run and write them to Elasticsearch, giving incremental sync.
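The polling logic can be sketched in a few lines. This example assumes an in-memory SQLite database as a stand-in for MySQL, a dictionary as a stand-in for the Elasticsearch index, and integer timestamps; the `articles` table, the `updated_at` column name, and `sync_once` are hypothetical.

```python
import sqlite3


def sync_once(conn, es_docs, watermark):
    """Copy rows changed after `watermark` into the index; return the new watermark."""
    rows = conn.execute(
        "SELECT id, title, updated_at FROM articles "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (watermark,),
    ).fetchall()
    for row_id, title, updated_at in rows:
        es_docs[row_id] = {"title": title, "updated_at": updated_at}
        watermark = max(watermark, updated_at)
    return watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)", [(1, "a", 100), (2, "b", 105)]
)

es_docs = {}
watermark = sync_once(conn, es_docs, 0)  # first poll picks up both rows

conn.execute("UPDATE articles SET title = 'b2', updated_at = 110 WHERE id = 2")
watermark = sync_once(conn, es_docs, watermark)  # second poll picks up only row 2
```

A timer would call `sync_once` at a fixed interval and persist the watermark between runs. Two limits of this approach are worth keeping in mind: rows that are hard-deleted never appear in the query result, and a strict `>` comparison can miss rows that share the watermark's timestamp but are committed after the poll.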

2.4 Real‑Time Binlog Synchronization

Leverages MySQL binlog to achieve real‑time, non‑intrusive synchronization.

Step 1: Read the MySQL binlog to obtain the table change logs.

Step 2: Convert the logs to MQ messages.

Step 3: Develop an MQ consumer.

Step 4: Consume the messages and write them to Elasticsearch.

Advantages: no code intrusion, no hard‑coding, no changes to the original system, high performance, decoupled architecture.

Disadvantages: building and operating the binlog pipeline is complex, and if an MQ sits between the binlog reader and Elasticsearch, consumer lag adds latency.
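The last step of the pipeline, turning a change message into an Elasticsearch write, can be sketched as follows. The event shape (`type`, `pk`, `row`) is a hypothetical simplification, not the format of any particular binlog tool, and `to_bulk_lines` is an assumed helper name. The output follows the newline-delimited action format of the Elasticsearch Bulk API, where an `index` action is followed by the document and a `delete` action stands alone.

```python
import json


def to_bulk_lines(event, index):
    """Translate one row-change event into Bulk API lines for the given index."""
    meta = {"_index": index, "_id": str(event["pk"])}
    if event["type"] == "delete":
        # A delete action has no document line after it
        return [json.dumps({"delete": meta})]
    if event["type"] in ("insert", "update"):
        # Index the full row; reusing the primary key as _id makes replays idempotent
        return [json.dumps({"index": meta}), json.dumps(event["row"])]
    raise ValueError(f"unsupported event type: {event['type']}")


insert_lines = to_bulk_lines(
    {"type": "insert", "pk": 7, "row": {"title": "hello"}}, "articles"
)
delete_lines = to_bulk_lines({"type": "delete", "pk": 7}, "articles")
```

A consumer would join the lines for a batch of messages with newlines, append a trailing newline, and send the result to the `_bulk` endpoint. Using the primary key as the document ID means a message that is delivered twice overwrites the same document instead of creating a duplicate.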

3. Data Migration Tool Selection

Binlog‑based real‑time sync is the most common approach today, and it has spawned many migration tools that implement Change Data Capture (CDC) by posing as a MySQL slave and reading the master's binlog.

3.1 Canal

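Canal, Alibaba's open‑source tool, pretends to be a MySQL slave, subscribes to the master's binlog, and parses it into structured row‑change records. It delivers those records to client programs over TCP or publishes them to an MQ such as Kafka or RocketMQ, and a consumer or adapter then writes them to Elasticsearch.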

3.2 Alibaba DTS

Data Transmission Service (DTS) is Alibaba Cloud's paid service for moving data between multiple kinds of data sources. It offers data migration, real‑time change subscription, and real‑time synchronization, with high performance, high availability, and a visual UI.

3.3 Databus

Databus is LinkedIn’s open‑source, low‑latency CDC system. It supports MySQL and Oracle as sources and offers high availability, delivery in transaction order, and unlimited back‑tracking through the change history.

3.4 Other Tools

Flink – distributed stream processing engine.

CloudCanal – commercial data sync product.

Maxwell – reads the MySQL binlog and outputs row changes as JSON to Kafka and other streaming platforms, so no custom client code is needed.

DRDS – Alibaba’s distributed database middleware.

Yugong – migrates data from Oracle to MySQL.

4. Conclusion

This article provides an overview of MySQL‑to‑Elasticsearch synchronization strategies and common migration tools, helping readers choose the most suitable solution for their projects.

