Four Ways to Sync MySQL Data to Elasticsearch – Pros, Cons, and Tools

This article compares four common approaches for synchronizing MySQL data to Elasticsearch—synchronous dual write, asynchronous dual write via message queues, scheduled tasks, and binlog‑based data subscription—detailing their advantages, drawbacks, implementation steps, and tool choices such as Canal, Maxwell, and Python‑MySQL‑Replication.

ITPUB

1. Synchronous Dual Write

The most straightforward method writes to MySQL and simultaneously writes the same record to Elasticsearch.

Pros: Simple to implement.

Cons: Business coupling (synchronization code is embedded in the product logic), performance impact (each request waits on two writes, which increases latency), and limited extensibility once search requirements become complex.
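A minimal sketch of the synchronous dual write, using a hypothetical in-memory `FakeStore` in place of real MySQL and Elasticsearch clients (the class and function names are illustrative and do not come from the original article). It shows both writes sitting on the same request path, and how a failure in the second write can leave the two stores out of step.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for a MySQL table or an ES index."""
    rows: dict = field(default_factory=dict)
    available: bool = True

    def put(self, doc_id, doc):
        if not self.available:
            raise ConnectionError("store unavailable")
        self.rows[doc_id] = dict(doc)


def create_product(mysql, es, product):
    """Synchronous dual write: the caller waits for both writes."""
    mysql.put(product["id"], product)  # write 1: source of truth
    es.put(product["id"], product)     # write 2: search copy, same request path


mysql, es = FakeStore(), FakeStore()
create_product(mysql, es, {"id": 1, "name": "keyboard"})

# Simulate an Elasticsearch outage: the MySQL write has already happened.
es.available = False
try:
    create_product(mysql, es, {"id": 2, "name": "mouse"})
except ConnectionError:
    pass  # a real service would need retry or compensation logic here
```

After the simulated outage, product 2 exists in `mysql.rows` but not in `es.rows`, which is the kind of divergence the business code would have to detect and repair itself.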

2. Asynchronous Dual Write

When a product is created, the product service publishes a message to a message queue (MQ). A dedicated search service subscribes to these messages and writes to Elasticsearch, which decouples indexing from the product service.

Aggregating data into a wide table before indexing improves query efficiency, but when aggregation is needed across multiple relational tables, the service may still need to query the database (a “back‑lookup”).

Pros: Decouples services, near‑real‑time synchronization (seconds) under normal conditions.

Cons: Introduces an additional component (the message queue) that must be deployed and operated, which increases system complexity.
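The flow above can be sketched with Python's standard `queue.Queue` standing in for a real message broker. The dictionaries `mysql_rows` and `es_docs`, the event name, and both function names are assumptions made for illustration. The consumer also performs the back-lookup described earlier: the message carries only the id, and the search service reads the full row before indexing.

```python
import json
import queue

mq = queue.Queue()   # stand-in for a broker topic (Kafka, RocketMQ, ...)
mysql_rows = {}      # stand-in for the product table
es_docs = {}         # stand-in for the Elasticsearch index


def create_product(product):
    """Product service: write to MySQL, then publish an event and return."""
    mysql_rows[product["id"]] = product
    mq.put(json.dumps({"event": "product_created", "id": product["id"]}))


def search_service_consume():
    """Search service: drain pending messages and index each product."""
    while True:
        try:
            message = json.loads(mq.get_nowait())
        except queue.Empty:
            return
        row = mysql_rows[message["id"]]  # back-lookup for the full row
        es_docs[row["id"]] = {"id": row["id"], "name": row["name"]}


create_product({"id": 1, "name": "keyboard", "cost": 10})
search_service_consume()
```

The product service no longer knows anything about Elasticsearch; only the search service decides which fields end up in the index.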

3. Scheduled Tasks

A periodic job copies data from MySQL to Elasticsearch. Frequency selection is critical: high frequency can cause resource spikes, low frequency reduces freshness.

Pros: Easy to implement.

Cons: Real-time guarantees are weak, and frequent polling can put heavy load on the database.
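One common way to keep a scheduled job cheap is to copy only rows changed since the previous run. The sketch below assumes the table has an `updated_at` column (an assumption, not something the original article specifies) and uses an in-memory SQLite database as a stand-in for MySQL so that it runs without a server. The `product` table and `sync_once` function are hypothetical.

```python
import sqlite3


def sync_once(conn, es_docs, last_synced):
    """Copy rows changed after last_synced; return the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM product "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_synced,),
    ).fetchall()
    for product_id, name, updated_at in rows:
        es_docs[product_id] = {"id": product_id, "name": name}
        last_synced = updated_at
    return last_synced


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "keyboard", 10), (2, "mouse", 20)],
)

es_docs = {}  # stand-in for the Elasticsearch index
watermark = sync_once(conn, es_docs, 0)          # first run copies both rows
conn.execute("UPDATE product SET name = 'trackball', updated_at = 30 WHERE id = 2")
watermark = sync_once(conn, es_docs, watermark)  # second run sees only row 2
```

A scheduler such as cron would call `sync_once` at the chosen interval. Rows deleted from MySQL never match the query, so a watermark-based job like this one needs a separate strategy for deletes, for example a soft-delete flag.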

4. Data Subscription (Binlog)

Changes recorded in the MySQL binlog can be subscribed to for change data capture. Frameworks like Canal present themselves to MySQL as a pseudo-replica and provide adapters (including an Elasticsearch adapter) that push changes to ES with little custom code.

When complex aggregation or back‑lookup is required, a custom Canal client may be needed to listen, aggregate, and write to ES.

Pros: Minimal intrusion into business code, and good real-time behavior because changes are pushed as they are written to the binlog.

Cons: May still require custom client development for aggregation.
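The "listen, aggregate, write" job of a custom client can be sketched as a function that folds row changes from several tables into one wide document. The event shape used here (`table`, `action`, `row`) is hypothetical and is not Canal's actual message format; the `product` and `product_price` tables are likewise assumptions for illustration.

```python
def apply_change(wide_docs, event):
    """Fold one row change into the wide document keyed by product id."""
    row = event["row"]
    if event["table"] == "product":
        product_id = row["id"]
        if event["action"] == "delete":
            wide_docs.pop(product_id, None)  # drop the whole search document
            return
        doc = wide_docs.setdefault(product_id, {"id": product_id})
        doc["name"] = row["name"]
    elif event["table"] == "product_price":
        product_id = row["product_id"]
        doc = wide_docs.setdefault(product_id, {"id": product_id})
        # A deleted price row clears the field instead of removing the document.
        doc["price"] = None if event["action"] == "delete" else row["price"]


events = [
    {"table": "product", "action": "insert",
     "row": {"id": 1, "name": "keyboard"}},
    {"table": "product_price", "action": "insert",
     "row": {"product_id": 1, "price": 49}},
    {"table": "product", "action": "update",
     "row": {"id": 1, "name": "mechanical keyboard"}},
]

wide_docs = {}  # stand-in for the Elasticsearch index
for event in events:
    apply_change(wide_docs, event)
```

In a real client the events would come from the binlog subscription rather than a list, and `wide_docs` would be replaced by writes to Elasticsearch.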

Popular Binlog Subscription Frameworks

Canal – Open‑source by Alibaba, Java‑based, supports Kafka/RocketMQ, multiple client languages.

Maxwell – Open‑source by Zendesk, Java‑based, outputs JSON, supports Kafka/RabbitMQ/Redis.

Python‑MySQL‑Replication – Community project, Python‑based, customizable message format.

These frameworks share the same underlying principle and can be adapted to other target stores such as HBase.
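Because each framework ultimately emits a stream of row changes, the target-specific part is a small translation step. As a sketch, the function below turns one JSON change message into lines for the Elasticsearch bulk API. The message shape is assumed to follow Maxwell's JSON output (`type` and `data` fields) and should be checked against the version in use; the index name `products`, the `id` primary-key column, and the function name are hypothetical.

```python
import json


def to_bulk_lines(message, index="products", id_column="id"):
    """Translate one change message into Elasticsearch bulk-API lines."""
    event = json.loads(message)
    kind = event.get("type")
    if kind == "delete":
        doc_id = str(event["data"][id_column])
        return [json.dumps({"delete": {"_index": index, "_id": doc_id}})]
    if kind in ("insert", "update"):
        doc_id = str(event["data"][id_column])
        return [
            json.dumps({"index": {"_index": index, "_id": doc_id}}),
            json.dumps(event["data"]),  # full row becomes the document source
        ]
    return []  # ignore message types this sketch does not handle


sample = json.dumps({
    "database": "shop", "table": "product", "type": "insert",
    "data": {"id": 1, "name": "keyboard"},
})
lines = to_bulk_lines(sample)
```

Swapping the target store would mean replacing only this translation function, for example emitting HBase puts instead of bulk lines.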

