How UDB Supercharges MySQL Replication with Deep Kernel Optimizations
This article details UDB's high‑availability architecture and four kernel‑level optimizations—binlog replication, relay‑log recording, master.info handling, and relay‑log locking—that together improve MySQL semi‑synchronous replication performance and reliability.
UDB High‑Availability Architecture
UDB provides a dual‑node HA setup built on MySQL Community Server 5.7.16. A virtual IP is served by two HAProxy instances, each fronting a single‑node MySQL instance. Only one HAProxy is active at a time, so write conflicts are avoided. If the active HAProxy fails, the virtual IP floats to the standby instance without client reconfiguration, so service continues with no change on the client side.
Replication Workflow
Step 1. After a SQL statement commits, the MySQL server writes the corresponding binlog entry.
Step 2. The Dump thread reads the binlog and streams it to the replica's I/O thread.
Step 3. The I/O thread appends the received events to the relay log and updates master.info with the receive position.
Step 4. The SQL thread reads the relay log, replays the events, and records its progress in relay‑log.info.
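The four steps above can be sketched as a minimal simulation. The class and attribute names (Master, Replica, master_info, etc.) are illustrative stand-ins for the real MySQL threads and metadata files, not actual internals:

```python
class Master:
    def __init__(self):
        self.binlog = []          # committed events, in order

    def commit(self, event):
        # Step 1: after a statement commits, append it to the binlog
        self.binlog.append(event)

    def dump_from(self, pos):
        # Step 2: the Dump thread streams events from the given position
        return self.binlog[pos:]

class Replica:
    def __init__(self):
        self.relay_log = []       # events received by the I/O thread
        self.master_info = 0      # receive position (master.info)
        self.relay_log_info = 0   # replay position (relay-log.info)
        self.data = []            # applied events

    def io_thread(self, events):
        # Step 3: append received events to the relay log, record position
        self.relay_log.extend(events)
        self.master_info += len(events)

    def sql_thread(self):
        # Step 4: replay relay-log events and record replay progress
        while self.relay_log_info < len(self.relay_log):
            self.data.append(self.relay_log[self.relay_log_info])
            self.relay_log_info += 1

master, replica = Master(), Replica()
master.commit("INSERT 1")
master.commit("INSERT 2")
replica.io_thread(master.dump_from(replica.master_info))
replica.sql_thread()
print(replica.data)  # ['INSERT 1', 'INSERT 2']
```

The two position counters mirror the roles of master.info (how far the replica has *received*) and relay‑log.info (how far it has *applied*), which the optimizations below manipulate.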
1) Binlog Replication Optimization (dual‑channel model)
Problem: Native semi‑synchronous replication can fall back to asynchronous mode when network jitter or replica lag occurs, losing the ability to detect consistency and making failover unsafe.
Solution: Add a second replication channel that runs in parallel with the original semi‑sync channel. The new channel uses the semi‑sync protocol but transmits only the master's execution progress (binlog file name and position), never row data. Because it carries no event payload, it cannot lag behind; on timeout it reconnects automatically, so the replica always knows the master's latest position even if the primary channel stalls.
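A rough sketch of the position-only channel, using an in-process queue in place of a network connection. ProgressChannel and its methods are hypothetical names for illustration; the real channel speaks the semi-sync wire protocol:

```python
import queue

class ProgressChannel:
    """Second channel: carries only (binlog_file, position), never data."""

    def __init__(self, timeout=1.0):
        self.q = queue.Queue()
        self.timeout = timeout
        self.last_known = None    # replica's view of master progress

    def send_progress(self, binlog_file, pos):
        # Master side: publish execution progress only (no event payload)
        self.q.put((binlog_file, pos))

    def receive(self):
        # Replica side: wait for progress; on timeout, reconnect and retry
        try:
            self.last_known = self.q.get(timeout=self.timeout)
            return True           # acknowledged, semi-sync style
        except queue.Empty:
            self.reconnect()
            return False

    def reconnect(self):
        pass  # the real channel re-establishes the connection here

ch = ProgressChannel(timeout=0.1)
ch.send_progress("binlog.000003", 4096)
ch.receive()
print(ch.last_known)  # ('binlog.000003', 4096)
```

Because each message is a few bytes, the channel's queue can never build a meaningful backlog, which is the property the article relies on to keep the replica's view of the master current.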
2) Relay‑Log File Recording Optimization
Problem: With ROW‑format binlog and GTID enabled, a single DML statement generates multiple events (GTID_EVENT, QUERY_EVENT, TABLE_MAP_EVENT, WRITE_ROWS_EVENT, XID_EVENT). The I/O thread writes each event individually to the relay log, causing many small I/O operations and poor file‑cache utilization.
Solution: Modify the I/O thread to batch writes at the transaction level. All events belonging to one transaction are merged into a single larger write to the relay log, reducing I/O overhead and improving throughput.
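The batching idea can be sketched as follows: buffer every event of a transaction and issue one physical write when the terminating XID_EVENT arrives. RelayLog here is an illustrative stand-in, and treating XID_EVENT as the flush trigger is an assumption for single-statement transactions like the one above:

```python
import io

class RelayLog:
    def __init__(self):
        self.file = io.BytesIO()
        self.writes = 0           # count of physical write calls
        self.buffer = bytearray() # events of the in-flight transaction

    def queue_event(self, event_type, payload):
        self.buffer += payload
        if event_type == "XID_EVENT":   # transaction boundary: flush once
            self.file.write(bytes(self.buffer))
            self.buffer.clear()
            self.writes += 1

# One DML under ROW format + GTID expands to five events
txn = [("GTID_EVENT", b"g"), ("QUERY_EVENT", b"q"),
       ("TABLE_MAP_EVENT", b"t"), ("WRITE_ROWS_EVENT", b"w"),
       ("XID_EVENT", b"x")]

log = RelayLog()
for etype, data in txn:
    log.queue_event(etype, data)

print(log.writes)           # 1 write instead of 5
print(log.file.getvalue())  # b'gqtwx'
```

The relay-log contents are byte-identical either way; only the number of write syscalls changes, which is where the I/O savings come from.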
3) Master.info File Recording Optimization
Problem: When GTID‑based replication (auto_position=1) is used, the binlog file name and position stored in master.info are no longer required for replication, yet the file is updated after every relay‑log write.
Solution: Before updating master.info, check whether GTID replication is enabled. If so, skip writing the binlog file name and position fields while preserving other metadata needed for operations such as CHANGE MASTER and server shutdown.
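A minimal sketch of the conditional update. MasterInfo and its field names are illustrative, not the actual C++ structure; the point is the branch on GTID auto-positioning before persisting the coordinates:

```python
class MasterInfo:
    def __init__(self, auto_position):
        self.auto_position = auto_position  # True when auto_position=1
        self.stored = {}                    # what gets flushed to disk

    def flush(self, host, user, binlog_file, binlog_pos):
        # Always keep metadata needed for CHANGE MASTER / shutdown
        record = {"host": host, "user": user}
        if not self.auto_position:
            # Only classic file/position replication needs the coordinates
            record["binlog_file"] = binlog_file
            record["binlog_pos"] = binlog_pos
        self.stored = record

mi = MasterInfo(auto_position=True)
mi.flush("10.0.0.1", "repl", "binlog.000007", 120)
print("binlog_pos" in mi.stored)  # False: coordinates skipped under GTID
```

Since GTID auto-positioning resumes replication from the GTID sets rather than a file offset, dropping the two fields loses nothing while shrinking every per-event master.info update.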
4) Relay‑Log Lock Optimization
Problem: The I/O thread and SQL thread share the same file cache for the relay log. Concurrent reads/writes cause lock contention, degrading performance.
Solution: Separate the caches for the I/O and SQL threads, eliminating the shared lock. Although the SQL thread incurs an extra read I/O, the overall replication throughput improves significantly.
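The cache split can be sketched like this: the I/O thread appends under its own short-lived lock, while the SQL thread copies a private batch and then replays lock-free. SplitRelayCache and its method names are hypothetical; the extra copy in sql_refill models the extra read I/O the article mentions:

```python
import threading

class SplitRelayCache:
    def __init__(self):
        self.io_cache = []        # owned by the I/O thread
        self.io_lock = threading.Lock()
        self.sql_cache = []       # private to the SQL thread

    def io_append(self, event):
        # Writer holds only its own brief lock; the SQL thread's
        # replay loop never contends for it
        with self.io_lock:
            self.io_cache.append(event)

    def sql_refill(self):
        # SQL thread pays one extra read/copy to take a private batch...
        with self.io_lock:
            batch, self.io_cache = self.io_cache, []
        # ...then replays from sql_cache with no locking at all
        self.sql_cache.extend(batch)
        return len(batch)

cache = SplitRelayCache()
cache.io_append("ev1")
cache.io_append("ev2")
cache.sql_refill()
print(cache.sql_cache)  # ['ev1', 'ev2']
```

The trade-off is exactly the one described: one additional read per batch on the SQL-thread side in exchange for removing the shared lock from the hot write path.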
Resulting Replication Flow
The combined optimizations reduce replication latency, prevent degradation to asynchronous mode, and guarantee that the replica always knows its synchronization state relative to the master.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.