Four Ways to Sync MySQL Data to Elasticsearch – Pros, Cons, and Tools
This article compares four common approaches for synchronizing MySQL data to Elasticsearch: synchronous dual write, asynchronous dual write via a message queue, scheduled tasks, and binlog‑based data subscription. For each approach it covers the advantages, the drawbacks, and how the approach works, then closes with tool choices for binlog subscription such as Canal, Maxwell, and Python‑MySQL‑Replication.
1. Synchronous Dual Write
The most straightforward method: whenever the application writes a record to MySQL, it also writes the same record to Elasticsearch in the same code path.
Pros: Simple to implement.
Cons: Business coupling (synchronization code is embedded in product logic); performance impact (each request waits on two writes, which increases latency); limited extensibility when search requirements become complex.
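As a minimal sketch of the pattern, the snippet below uses an in-memory `FakeStore` class as a stand-in for both MySQL and Elasticsearch. `FakeStore`, `create_product`, and the sample product are hypothetical names for illustration, not part of any library. It also shows a consequence the list above does not spell out: with no transaction spanning the two stores, a failed second write leaves them out of step.

```python
class FakeStore:
    """In-memory stand-in for MySQL or Elasticsearch (hypothetical)."""

    def __init__(self, fail=False):
        self.rows = {}
        self.fail = fail

    def write(self, key, doc):
        if self.fail:
            raise ConnectionError("store unavailable")
        self.rows[key] = dict(doc)


def create_product(mysql, es, product):
    """Business code that performs both writes in the same request path."""
    mysql.write(product["id"], product)  # source of truth
    es.write(product["id"], product)     # second write; the caller waits for it too


mysql, es = FakeStore(), FakeStore(fail=True)  # simulate Elasticsearch being down
try:
    create_product(mysql, es, {"id": 1, "name": "keyboard"})
except ConnectionError:
    pass  # MySQL already holds the row; Elasticsearch never received it
```

Any real implementation would need to decide what to do in the `except` branch (retry, compensate, or log for later repair), which is part of the coupling cost described above.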
2. Asynchronous Dual Write
When a product is created, the product service publishes a message to a message queue (MQ). A dedicated search service subscribes to these messages and writes to Elasticsearch, so the product service no longer contains any indexing logic.
Aggregating data into a wide table before indexing improves query efficiency. When the aggregation spans multiple relational tables, however, the search service may still need to query the database for the missing fields (a “back‑lookup”).
Pros: Decouples services, near‑real‑time synchronization (seconds) under normal conditions.
Cons: Introduces additional components and complexity.
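The flow can be sketched with Python's standard `queue.Queue` standing in for a real message queue. Everything here is an assumption made for illustration: the `PRODUCTS` and `BRANDS` dictionaries play the role of MySQL tables, `SEARCH_INDEX` plays the role of the Elasticsearch index, and the event shape is invented.

```python
import queue
import threading

# Hypothetical in-memory stand-ins for MySQL tables and the ES index.
PRODUCTS = {}
BRANDS = {10: {"name": "Acme"}}
SEARCH_INDEX = {}

events = queue.Queue()  # stands in for the message queue


def create_product(product):
    """Product service: save the row, publish an event, return immediately."""
    PRODUCTS[product["id"]] = product
    events.put({"type": "product_created", "id": product["id"]})


def search_worker():
    """Search service: consume events and index one wide document per product."""
    while True:
        event = events.get()
        if event is None:  # sentinel used to stop the worker in this sketch
            events.task_done()
            break
        product = PRODUCTS[event["id"]]      # back-lookup: the event carries only an id
        brand = BRANDS[product["brand_id"]]  # join in data from a second table
        SEARCH_INDEX[product["id"]] = {
            "name": product["name"],
            "brand_name": brand["name"],
        }
        events.task_done()


worker = threading.Thread(target=search_worker)
worker.start()
create_product({"id": 1, "name": "keyboard", "brand_id": 10})
events.put(None)
events.join()   # wait until every queued event has been processed
worker.join()
```

The product service only knows about the queue; the aggregation and indexing logic lives entirely in the consumer, which is the decoupling the section describes.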
3. Scheduled Tasks
A periodic job copies data from MySQL to Elasticsearch. Frequency selection is critical: high frequency can cause resource spikes, low frequency reduces freshness.
Pros: Easy to implement.
Cons: Weak real‑time guarantees (data is only as fresh as the last run); each run can impose heavy load on storage.
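One common way to keep each run small is to copy only rows changed since the previous run. The sketch below assumes an `updated_at` column and a stored watermark, neither of which the article specifies. It uses an in-memory SQLite database as a stand-in for MySQL and a dictionary as a stand-in for the Elasticsearch index.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for MySQL in this sketch
db.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
db.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "keyboard", 100), (2, "mouse", 200)],
)

search_index = {}  # stand-in for the Elasticsearch index


def sync_once(conn, index, watermark):
    """Copy rows changed after `watermark` and return the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM product "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        index[row_id] = {"name": name}
        watermark = max(watermark, updated_at)
    return watermark


# A scheduler (cron or similar) would call sync_once at the chosen interval.
watermark = sync_once(db, search_index, 0)
db.execute("UPDATE product SET name = 'wireless mouse', updated_at = 300 WHERE id = 2")
watermark = sync_once(db, search_index, watermark)
```

Note that a polling query like this cannot see rows that were physically deleted, so deletions would need separate handling such as a soft-delete flag.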
4. Data Subscription (Binlog)
Changes recorded in the MySQL binlog can be subscribed to for change data capture. Frameworks such as Canal present themselves to MySQL as a replica in order to receive the binlog stream, and expose adapters (including an Elasticsearch adapter) that push changes to ES with little custom code.
When complex aggregation or back‑lookup is required, a custom Canal client may be needed to listen, aggregate, and write to ES.
Pros: Minimal intrusion into business code, better real‑time characteristics.
Cons: May still require custom client development for aggregation.
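To illustrate what such a custom client has to do, here is a rough sketch of the listen, aggregate, and write steps. It does not use the Canal client API. The event dictionaries, the `PRODUCT` and `BRAND` tables, and all function names are hypothetical, and in-memory dictionaries stand in for both MySQL and Elasticsearch.

```python
# Hypothetical in-memory tables standing in for MySQL.
PRODUCT = {1: {"id": 1, "name": "keyboard", "brand_id": 10}}
BRAND = {10: {"id": 10, "name": "Acme"}}
search_index = {}  # stand-in for the Elasticsearch index


def build_document(product_id):
    """Back-lookup: rebuild the wide document from the current table contents."""
    product = PRODUCT[product_id]
    brand = BRAND.get(product["brand_id"], {})
    return {"name": product["name"], "brand_name": brand.get("name")}


def affected_product_ids(event):
    """Map a row change in any source table to the products it touches."""
    row = event["row"]
    if event["table"] == "product":
        return [row["id"]]
    if event["table"] == "brand":
        return [pid for pid, p in PRODUCT.items() if p["brand_id"] == row["id"]]
    return []


def handle(event):
    """Apply one row-change event to the search index."""
    for product_id in affected_product_ids(event):
        if event["table"] == "product" and event["type"] == "delete":
            search_index.pop(product_id, None)
        else:
            search_index[product_id] = build_document(product_id)


handle({"table": "product", "type": "insert", "row": PRODUCT[1]})
BRAND[10]["name"] = "Acme Corp"  # simulate the change already committed in MySQL
handle({"table": "brand", "type": "update", "row": BRAND[10]})
```

The second event shows why a plain one-table-to-one-index adapter is not enough here: a change to a brand row has to fan out to every product document that embeds the brand name.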
Popular Binlog Subscription Frameworks
Canal – Open‑source by Alibaba, Java‑based, supports Kafka/RocketMQ, multiple client languages.
Maxwell – Open‑source by Zendesk, Java‑based, outputs JSON, supports Kafka/RabbitMQ/Redis.
Python‑MySQL‑Replication – Community project, Python‑based, customizable message format.
These frameworks share the same underlying principle and can be adapted to other target stores such as HBase.
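Because each framework ultimately emits row-level change events, the step that targets a particular store is mostly a translation. As a hedged example, the sketch below converts a JSON message shaped like Maxwell's output (`database`, `table`, `type`, `data`, and `old` fields) into lines for the Elasticsearch bulk API. Using the table name as the index name and `id` as the document id are assumptions made for illustration, as is the `to_bulk_lines` helper itself.

```python
import json


def to_bulk_lines(message, id_field="id"):
    """Translate one Maxwell-style change message into ES bulk API lines."""
    event = json.loads(message)
    # Assumption: index name = table name, document id = primary key column.
    meta = {"_index": event["table"], "_id": str(event["data"][id_field])}
    if event["type"] in ("insert", "update"):
        # "index" creates or replaces the whole document with the new row.
        return [json.dumps({"index": meta}), json.dumps(event["data"])]
    if event["type"] == "delete":
        return [json.dumps({"delete": meta})]
    return []  # other message types are ignored in this sketch


message = json.dumps({
    "database": "shop",
    "table": "product",
    "type": "update",
    "data": {"id": 1, "name": "wireless mouse"},
    "old": {"name": "mouse"},
})
lines = to_bulk_lines(message)
body = "\n".join(lines) + "\n"  # bulk bodies are newline-delimited and end with a newline
```

Swapping the target store would mean replacing only this translation step, for example by emitting HBase puts and deletes instead of bulk lines.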
References:
https://www.infoq.cn/article/1afyz3b6hnhprrg12833
https://www.iamle.com/archives/2900.html
https://blog.51cto.com/lianghecai/4755693
https://qinyuanpei.github.io/posts/1333693167/
https://github.com/alibaba/canal/wiki/ClientAdapter
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
