How We Achieve Real‑Time MySQL‑to‑Elasticsearch Sync with Binlog and Kafka
This article explains how a large e‑commerce platform replaced a MySQL‑centric intermediate table with a binlog‑driven pipeline that streams changes through Kafka into Elasticsearch, ensuring ordered, complete, and low‑latency data synchronization while addressing schema evolution and operational monitoring.
Part.1 Existing Solution and Problems
Initially the team used a MySQL intermediate table to collect business data for Elasticsearch indexing. Incremental rows were identified by a timestamp column and written to ES by a cron script. This approach required dual writes, made adding columns to the ever-growing table slow and disruptive, and added development overhead for every mapping change.
Part.2 Binlog‑Based Data Synchronization
To meet stricter real‑time requirements, the team adopted MySQL binlog as the source of change events. Binlog records are published to Kafka, where authorized consumer groups read them in order. By hashing the primary key to Kafka partitions, each record’s events stay ordered, and a single consumer per partition guarantees sequential updates to Elasticsearch.
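The ordering guarantee is worth making concrete: as long as the partition for a row is a pure function of its primary key, every event for that row lands in the same partition and is consumed sequentially. Below is a minimal sketch of the idea in Go; the article does not specify the hash function used, so FNV-1a here is an illustrative stand-in:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a row's primary key to a fixed Kafka partition.
// Because the mapping is deterministic, all binlog events for the same
// row land in one partition and are consumed in order.
func partitionFor(primaryKey string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(primaryKey))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	for _, pk := range []string{"orders:1001", "orders:1001", "orders:1002"} {
		fmt.Printf("%s -> partition %d\n", pk, partitionFor(pk, 12))
	}
}
```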
Data completeness is ensured by committing Kafka offsets only after the corresponding bulk request succeeds in Elasticsearch, preventing loss during consumer rebalance or crashes.
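A sketch of that commit discipline in Go, assuming the segmentio/kafka-go client (the article does not name the consumer library): FetchMessage hands a record to the caller without committing, and CommitMessages runs only after Elasticsearch has acknowledged the write.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/segmentio/kafka-go"
)

// bulkIndex stands in for the Elasticsearch writer; it must return nil
// only once ES has acknowledged the _bulk request.
func bulkIndex(doc []byte) error { return nil }

func main() {
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka:9092"},
		GroupID: "es-sync",
		Topic:   "binlog-events",
	})
	defer r.Close()
	ctx := context.Background()

	for {
		// FetchMessage, unlike ReadMessage, does not auto-commit offsets.
		m, err := r.FetchMessage(ctx)
		if err != nil {
			log.Fatal(err)
		}
		// Retry until ES accepts the write; nothing is committed before
		// that, so a crash or rebalance replays the event rather than
		// losing it.
		for bulkIndex(m.Value) != nil {
			time.Sleep(time.Second)
		}
		if err := r.CommitMessages(ctx, m); err != nil {
			log.Printf("offset commit failed: %v", err)
		}
	}
}
```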
Part.3 Technical Implementation
The architecture consists of five key modules:
Configuration Parsing: Reads TOML/JSON files or configuration-center entries to obtain Kafka, Elasticsearch, and MySQL-to-ES mapping settings.
Rule Engine: Determines the target index, document ID, field mappings, and optional where-clauses for filtering.
Kafka Connector: A Golang Kafka consumer handling SASL authentication and offset management.
Binlog Parsing: Consumes JSON-encoded binlog events (produced by Canal) and converts them into key-value maps.
Elasticsearch Writer: Buffers the maps and issues _bulk requests every 200 ms or whenever a size threshold is reached (a sketch follows this list).
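To make the writer's flush policy concrete, here is a minimal sketch in Go. Only the 200 ms interval and the size-threshold trigger come from the article; the channel-based shape, buffer size, and flush callback are assumptions:

```go
package main

import (
	"fmt"
	"time"
)

// Doc is one parsed binlog row destined for Elasticsearch.
type Doc map[string]interface{}

// runWriter buffers incoming docs and flushes a _bulk request either
// every 200 ms or as soon as the buffer reaches maxDocs, whichever
// comes first.
func runWriter(in <-chan Doc, maxDocs int, flush func([]Doc) error) {
	buf := make([]Doc, 0, maxDocs)
	ticker := time.NewTicker(200 * time.Millisecond)
	defer ticker.Stop()

	doFlush := func() {
		if len(buf) == 0 {
			return
		}
		if err := flush(buf); err != nil {
			fmt.Println("bulk flush failed:", err) // real code would retry
			return
		}
		buf = buf[:0]
	}

	for {
		select {
		case d, ok := <-in:
			if !ok { // input closed: flush the remainder and stop
				doFlush()
				return
			}
			buf = append(buf, d)
			if len(buf) >= maxDocs { // size threshold reached
				doFlush()
			}
		case <-ticker.C: // 200 ms elapsed
			doFlush()
		}
	}
}

func main() {
	in := make(chan Doc, 2)
	in <- Doc{"order_id": 1, "status": "paid"}
	in <- Doc{"order_id": 2, "status": "shipped"}
	close(in)
	runWriter(in, 1000, func(docs []Doc) error {
		fmt.Printf("flushing %d docs via _bulk\n", len(docs))
		return nil
	})
}
```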
Field-type conversion examples include:
tinyint through bigint → int64
float/double → float32
char/varchar/text → string
datetime/timestamp → string (Y-m-d H:i:s)
date → string (Y-m-d)
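A plausible shape for that conversion step, sketched in Go. Canal delivers column values as strings keyed by column name; the function below and its signature are illustrative, not the team's actual code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// convert maps a raw string value from a Canal JSON event to a Go value
// according to the MySQL column type, following the table above.
func convert(mysqlType, raw string) (interface{}, error) {
	t := strings.ToLower(mysqlType)
	switch {
	case strings.HasPrefix(t, "tinyint"), strings.HasPrefix(t, "smallint"),
		strings.HasPrefix(t, "mediumint"), strings.HasPrefix(t, "int"),
		strings.HasPrefix(t, "bigint"):
		return strconv.ParseInt(raw, 10, 64) // integer family -> int64
	case strings.HasPrefix(t, "float"), strings.HasPrefix(t, "double"):
		f, err := strconv.ParseFloat(raw, 32)
		return float32(f), err // -> float32
	default:
		// char/varchar/text, datetime/timestamp ("Y-m-d H:i:s"), and
		// date ("Y-m-d") all pass through as strings.
		return raw, nil
	}
}

func main() {
	v, _ := convert("bigint(20)", "42")
	fmt.Printf("%T %v\n", v, v)
}
```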
Customizations added:
Upsert support: Include {"doc_as_upsert": true} in bulk payloads so that out-of-order updates still create the document when it does not exist yet (see the sketch after this list).
Filtering: The rule engine can filter events with where-style conditions, e.g., select * from sometable where type in (1,2).
Fast incremental catch-up: Pre-commit the target offsets for the relevant partitions before starting consumption, so the consumer begins from a known position instead of replaying the full backlog.
Microservice & configuration-center integration: The pipeline is deployable as a microservice with centralized configuration management.
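The upsert customization referenced above corresponds to the Elasticsearch _bulk update action carrying doc_as_upsert: if the target document does not exist, the partial doc becomes the new document. A sketch of building that NDJSON action pair in Go (index name and ID are example values):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkUpsertLines renders one update action for the _bulk API with
// doc_as_upsert, so an UPDATE event arriving before its INSERT still
// creates the document instead of failing.
func bulkUpsertLines(index, id string, doc map[string]interface{}) (string, error) {
	action, err := json.Marshal(map[string]interface{}{
		"update": map[string]string{"_index": index, "_id": id},
	})
	if err != nil {
		return "", err
	}
	body, err := json.Marshal(map[string]interface{}{
		"doc":           doc,
		"doc_as_upsert": true,
	})
	if err != nil {
		return "", err
	}
	return string(action) + "\n" + string(body) + "\n", nil
}

func main() {
	lines, _ := bulkUpsertLines("orders", "1001",
		map[string]interface{}{"order_id": 1001, "status": "paid"})
	fmt.Print(lines)
}
```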
Part.4 Logging and Monitoring
Each binlog event, Kafka message, bulk request payload, and bulk response is logged to an ELK stack. Two key metrics are monitored: synchronization latency (averaging about 1 s for order data) and heartbeat health checks that verify the end-to-end pipeline.
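One plausible shape for the heartbeat check, assuming the common pattern of writing a timestamp row to MySQL and reading it back from Elasticsearch after it traverses the pipeline (all names below are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// checkPipeline reads the heartbeat document's timestamp back from ES
// and alerts when it goes stale. A scheduled job writes the current
// time into a MySQL heartbeat row, and the binlog -> Kafka -> ES
// pipeline carries it like any other change. fetchHeartbeat is a stub
// for a GET against the ES document endpoint.
func checkPipeline(fetchHeartbeat func() (time.Time, error), maxLag time.Duration) error {
	ts, err := fetchHeartbeat()
	if err != nil {
		return fmt.Errorf("heartbeat read failed: %w", err)
	}
	if lag := time.Since(ts); lag > maxLag {
		return fmt.Errorf("pipeline unhealthy: heartbeat lag %s exceeds %s", lag, maxLag)
	}
	return nil
}

func main() {
	err := checkPipeline(func() (time.Time, error) {
		// Stub: pretend ES holds a heartbeat ~800 ms old.
		return time.Now().Add(-800 * time.Millisecond), nil
	}, 5*time.Second)
	fmt.Println("health:", err)
}
```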
Part.5 Conclusion
The binlog-driven pipeline now powers the e-commerce order index with stable latency of about one second, offering a reusable pattern for any scenario requiring reliable MySQL-to-Elasticsearch synchronization.
Mafengwo Technology
External communication platform of the Mafengwo Technology team, regularly sharing articles on advanced tech practices, tech exchange events, and recruitment.