How Xiaomi Achieved Real‑Time MySQL‑to‑Kudu Sync with Binlog and Talos
Facing MySQL performance bottlenecks at massive scale, Xiaomi built the LCSBinlog service that captures MySQL binlog events, streams them through the Talos platform, and writes to Kudu for real‑time BI, detailing architecture, job scheduling, consistency guarantees, use cases, and troubleshooting lessons.
Background
When MySQL tables grow to tens of millions or billions of rows, query latency becomes unacceptable for real‑time BI. Traditional batch sync with Sqoop is coarse‑grained, adds load to the source, and Hive does not handle updates well.
Binlog‑Based Change Data Capture
MySQL binary log (binlog) records every DDL/DML operation with timestamps. Replication works as follows:
The master writes changes to the binlog before committing a transaction.
A slave connects to the master, requests the binlog from a given position.
The master spawns a dump thread that streams the binlog.
The slave’s I/O thread reads the stream and writes it to a relay log.
A SQL thread replays events from the relay log to apply the changes.
The slave also records its own binlog.
By masquerading as a MySQL slave, a custom service can consume these events in real time.
LCSBinlog Service Architecture
LCSBinlog is a CDC service built on the Talos streaming platform (a self‑developed message queue). The data flow is: MySQL → Talos (topic) → Kudu → SparkSQL (ad‑hoc BI) Talos provides partitioned ordering and stream processing; Kudu stores the data for low‑latency analytical queries.
Service Roles
Master : schedules jobs, maintains high‑availability via Zookeeper EPHEMERAL nodes.
Worker : executes data‑sync tasks.
Job Types
BinlogSyncJob : for each MySQL instance, reads the full binlog and writes events to a dedicated Talos topic.
MysqlSyncJob : consumes the binlog, filters configured databases/tables, and streams the filtered events to user‑defined topics.
Zookeeper coordinates service state and records job progress; progress offsets are persisted in Kudu tables.
Data Correctness Guarantees
Ordering
Each BinlogSource is bound to a Talos topic. The service hashes (database, table) identifiers to a specific partition, ensuring that all events for a given table are processed in the same partition and thus in order.
Consistency
The service records the last committed offset for each job; after a restart it resumes from that offset.
Binlog ordering guarantees that re‑consumed events are applied in the original sequence.
Downstream stores (Kudu, Elasticsearch, Redis) use primary‑key upserts, making duplicate replay idempotent and eventually consistent.
Typical Application Scenarios
Real‑time cache refresh : binlog events are streamed to Redis, keeping caches up‑to‑date and reducing MySQL read load.
Asynchronous processing & system decoupling : downstream systems consume events after the MySQL transaction commits, avoiding latency impact on the source transaction.
Ad‑hoc BI queries : synchronized data lands in Kudu, where SparkSQL provides fast analytical queries without affecting the production MySQL.
Over 3,000 jobs run stably in production, supporting operational and analytical reporting.
Using Binlog Data
MySQL must enable binlog in ROW format to capture before‑and‑after column values. LCSBinlog reads these events, converts them into a unified JSON‑like structure, and writes them to Talos topics. Downstream SparkStreaming (and future Flink) jobs consume the topics and persist data to Kudu.
Troubleshooting
1. Duplicate rows caused by primary‑key updates
When a unique index (including the primary key) is updated, Kudu would treat the CDC UPDATE as an INSERT, creating duplicate rows. LCSBinlog resolves this by converting such an UPDATE into a DELETE followed by an INSERT.
2. Full‑dump timeout and OOM
Using the MySQL JDBC driver with default settings loads the entire result set into client memory. To stream rows safely, set the statement properties as follows:
Connection conn = DriverManager.getConnection(url, user, password);
PreparedStatement stmt = conn.prepareStatement(sql);
stmt.setFetchSize(Integer.MIN_VALUE); // enables streaming mode
stmt.setFetchDirection(ResultSet.FETCH_FORWARD);
ResultSet rs = stmt.executeQuery();
// Process rows one by oneThis forces row‑by‑row retrieval, prevents OOM, and reduces load on the MySQL source.
Conclusion
By leveraging MySQL binlog as a CDC source and the Talos streaming platform, LCSBinlog provides reliable, ordered, and eventually consistent real‑time replication from MySQL to downstream analytical stores such as Kudu, Elasticsearch, and Redis. The architecture, job model, ordering/consistency mechanisms, and practical lessons (handling unique‑index updates and streaming full dumps) can guide the design of similar CDC pipelines.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
