How to Keep MySQL and Elasticsearch in Sync: 4 Practical Strategies
This article compares four common approaches for synchronizing product data from MySQL to Elasticsearch: synchronous dual write, asynchronous dual write via a message queue, scheduled batch jobs, and binlog-based data subscription. For each approach it covers how the synchronization works, its advantages, its drawbacks, and practical considerations.
Synchronous Dual Write
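The simplest approach writes a record to MySQL and then, in the same request path, indexes the same document into Elasticsearch. The two writes do not share a transaction, because Elasticsearch cannot take part in a MySQL transaction, so a failure of the second write leaves the two stores inconsistent unless the code handles it explicitly.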
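The flow can be illustrated with a minimal sketch. It is not a production implementation: an in-memory SQLite database stands in for MySQL, and `InMemoryIndex` is a hypothetical stand-in for an Elasticsearch client, not a real API. The function name `create_product` and the `product` table are likewise assumptions made for the example.

```python
import sqlite3


class InMemoryIndex:
    """Hypothetical stand-in for an Elasticsearch index (not a real client)."""

    def __init__(self):
        self.docs = {}
        self.fail_next = False  # lets the example simulate an outage

    def index(self, doc_id, doc):
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("search cluster unavailable")
        self.docs[doc_id] = doc


def create_product(conn, index, name, price):
    """Write the row to the database, then index the same data."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO product (name, price) VALUES (?, ?)", (name, price)
        )
        product_id = cur.lastrowid
    # Second write: runs after the database commit and is NOT covered by it.
    index.index(product_id, {"name": name, "price": price})
    return product_id


conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
index = InMemoryIndex()

first_id = create_product(conn, index, "kettle", 19.9)

index.fail_next = True
try:
    create_product(conn, index, "toaster", 29.9)
except ConnectionError:
    pass  # the row is already committed, but the document was never indexed

row_count = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
```

After the simulated outage the database holds two rows while the index holds one document, which is the kind of divergence a dual-write code path has to detect and repair on its own.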
Advantages
Implementation is straightforward; only a single code path is needed.
Disadvantages
Business logic becomes tightly coupled with synchronization code, making future changes risky.
Each request incurs two write operations, increasing latency and CPU usage.
Extending the model for complex search features (e.g., aggregating data from multiple tables) requires additional code and is difficult to maintain.
Asynchronous Dual Write with Message Queue
When a product is created, the service publishes a message to a message queue (e.g., Kafka, RabbitMQ). A dedicated search service consumes the messages and writes the corresponding documents to Elasticsearch, decoupling the product service from the search index.
When an index document needs data from multiple relational tables (e.g., SPU and SKU), the search service may run a look-back query against the database to enrich the document before indexing it.
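A small sketch of this publish, consume, and look-back flow follows. It makes several assumptions for the sake of a runnable example: `queue.Queue` stands in for the message broker, an in-memory SQLite database stands in for MySQL, a plain dict stands in for the Elasticsearch index, and the table layout, message shape, and function names are hypothetical.

```python
import json
import queue
import sqlite3

broker = queue.Queue()  # stand-in for a Kafka topic or RabbitMQ queue
search_index = {}       # stand-in for an Elasticsearch index

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.executescript("""
    CREATE TABLE spu (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE sku (id INTEGER PRIMARY KEY, spu_id INTEGER, price REAL);
""")


def create_product(title, prices):
    """Product service: write to the database, then publish an event."""
    with conn:
        spu_id = conn.execute(
            "INSERT INTO spu (title) VALUES (?)", (title,)
        ).lastrowid
        conn.executemany(
            "INSERT INTO sku (spu_id, price) VALUES (?, ?)",
            [(spu_id, p) for p in prices],
        )
    # The message carries only the key; the consumer looks up the rest.
    broker.put(json.dumps({"event": "product_created", "spu_id": spu_id}))
    return spu_id


def consume_one():
    """Search service: read one message, look back into the database, index."""
    message = json.loads(broker.get_nowait())
    spu_id = message["spu_id"]
    title = conn.execute(
        "SELECT title FROM spu WHERE id = ?", (spu_id,)
    ).fetchone()[0]
    prices = [
        row[0]
        for row in conn.execute(
            "SELECT price FROM sku WHERE spu_id = ? ORDER BY price", (spu_id,)
        )
    ]
    search_index[spu_id] = {"title": title, "sku_prices": prices}


spu_id = create_product("kettle", [24.9, 19.9])
indexed_before_consume = spu_id in search_index  # the write is asynchronous
consume_one()
```

Sending only the key in the message keeps the product service unaware of the index layout; the cost is one extra database read per message in the consumer.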
Advantages
Loose coupling: the product service does not contain synchronization logic.
Near-real-time propagation: typically a few seconds from publish to index.
Disadvantages
Additional infrastructure (message broker, consumer service) increases operational complexity.
Scheduled Batch Synchronization
A periodic job (cron, Airflow, etc.) reads changed rows from MySQL and pushes them to Elasticsearch. The job frequency determines the trade‑off between freshness and system load.
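One way such a job might find changed rows is a watermark on a modification timestamp. The sketch below assumes a hypothetical `updated_at` column (the article does not say how changes are detected), uses an in-memory SQLite database in place of MySQL, and uses a dict in place of the Elasticsearch index. Each call to `sync_changed_rows` represents one scheduled run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
search_index = {}  # stand-in for an Elasticsearch index


def sync_changed_rows(watermark):
    """One batch run: index rows changed since `watermark`, return the new one."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM product "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    for product_id, name, updated_at in rows:
        search_index[product_id] = {"name": name}
        watermark = max(watermark, updated_at)
    return watermark


with conn:
    conn.execute("INSERT INTO product VALUES (1, 'kettle', 100)")
    conn.execute("INSERT INTO product VALUES (2, 'toaster', 105)")
watermark = sync_changed_rows(0)  # first run picks up both rows

with conn:
    conn.execute(
        "UPDATE product SET name = 'steel kettle', updated_at = 120 WHERE id = 1"
    )
stale_name = search_index[1]["name"]      # index is stale until the next run
watermark = sync_changed_rows(watermark)  # second run picks up only row 1
```

Between the update and the second run the index still serves the old name, which is the staleness window the job frequency controls. A timestamp query of this kind also cannot see hard-deleted rows, so deletions would need separate handling.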
Advantages
Simple to implement; no additional runtime components are required.
Disadvantages
Stale data: real‑time guarantees are weak; the index may lag behind the source.
Batch runs can cause CPU, memory, and I/O spikes on both MySQL and Elasticsearch.
Data Subscription via MySQL Binlog (Canal + Adapter)
With binary logging enabled, MySQL records every data change in its binary log (binlog). Tools such as Alibaba's canal pose as a MySQL replica, subscribe to the binlog, and convert its events into a stream that downstream consumers can process.
With canal-adapter, an Elasticsearch adapter can be configured declaratively (via JSON/YAML) to map table rows directly to ES documents, which gives zero-code synchronization for simple one-to-one mappings.
For complex scenarios, such as building a wide index that aggregates fields from the SPU, SKU, and category tables, custom client logic is still required. The client must listen to binlog events, perform the necessary joins or lookups, and write the enriched document to Elasticsearch.
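The custom client logic described above might look roughly like the following sketch. The event dictionaries are a simplified, hypothetical shape and not canal's actual message format; SQLite stands in for MySQL, a dict stands in for the Elasticsearch index, and all table and function names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE spu (id INTEGER PRIMARY KEY, title TEXT, category_id INTEGER);
    CREATE TABLE sku (id INTEGER PRIMARY KEY, spu_id INTEGER, price REAL);
    INSERT INTO category VALUES (1, 'kitchen');
    INSERT INTO spu VALUES (10, 'kettle', 1);
    INSERT INTO sku VALUES (100, 10, 19.9);
    INSERT INTO sku VALUES (101, 10, 24.9);
""")
search_index = {}  # stand-in for an Elasticsearch index


def affected_spu_ids(event):
    """Map a row-change event to the SPU documents it touches."""
    row = event["row"]
    if event["table"] == "spu":
        return [row["id"]]
    if event["table"] == "sku":
        return [row["spu_id"]]
    if event["table"] == "category":
        found = conn.execute(
            "SELECT id FROM spu WHERE category_id = ?", (row["id"],)
        )
        return [r[0] for r in found]
    return []


def build_wide_document(spu_id):
    """Look back into the database and join SPU, SKU, and category fields."""
    title, category = conn.execute(
        "SELECT s.title, c.name FROM spu s "
        "JOIN category c ON c.id = s.category_id WHERE s.id = ?",
        (spu_id,),
    ).fetchone()
    prices = [
        r[0]
        for r in conn.execute(
            "SELECT price FROM sku WHERE spu_id = ? ORDER BY price", (spu_id,)
        )
    ]
    return {"title": title, "category": category, "sku_prices": prices}


def handle_event(event):
    """Rebuild every wide document touched by one row-change event."""
    for spu_id in affected_spu_ids(event):
        search_index[spu_id] = build_wide_document(spu_id)


# A price change on one SKU arrives as a row-level event.
conn.execute("UPDATE sku SET price = 17.9 WHERE id = 100")
handle_event(
    {"table": "sku", "type": "UPDATE",
     "row": {"id": 100, "spu_id": 10, "price": 17.9}}
)
```

Rebuilding the whole document from the current database state, instead of patching individual fields from the event, keeps the handler simple and makes it safe to process the same event more than once.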
Advantages
Minimal intrusion into existing business code; synchronization runs independently.
Real‑time propagation: changes appear in ES within seconds of the commit.
Disadvantages
When data aggregation is needed, developers must write custom code to handle joins and look-back queries against the database.
dbaplus Community