How UDB Achieves High Availability: Deep MySQL Replication Optimizations
This article explains UDB's high‑availability architecture, a dual‑node design built on a virtual IP and HAProxy. It also describes the kernel‑level optimizations UDB applies to MySQL's native replication to improve stability and performance: semi‑synchronous replication, relay log handling, master.info management, and relay log lock contention.
UDB High Availability Database Architecture
UDB provides a dual‑node high‑availability setup built on a virtual IP, HAProxy, and single‑node UDB instances.
Both nodes store full data copies, ensuring redundancy and availability.
HAProxy connects to only one UDB node at a time, avoiding write conflicts.
Dual HAProxy instances guarantee proxy availability.
If the active HAProxy instance fails, the virtual IP floats to the standby HAProxy instance, so users do not need to change the IP address they connect to.
Because the replica's data must stay consistent with the primary's, UDB relies on semi‑synchronous replication and optimizes it heavily at the kernel level.
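The sketch below models stock MySQL 5.7 semi‑synchronous behaviour, not UDB's kernel changes, which this article does not detail. With rpl_semi_sync_master_wait_point=AFTER_SYNC (the 5.7 default), the primary first syncs a transaction to the binlog. It then waits for at least one replica to acknowledge receipt before it commits in the storage engine and returns to the client. If no acknowledgement arrives within rpl_semi_sync_master_timeout, the primary falls back to asynchronous replication. The class and method names below are hypothetical. A threading.Event stands in for the replica's acknowledgement.

```python
import threading


class SemiSyncPrimary:
    """Toy model of a semi-synchronous commit with timeout fallback."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # plays the role of rpl_semi_sync_master_timeout
        self.semi_sync_on = True

    def commit(self, replica_ack):
        # Assume the transaction is already written and synced to the binlog
        # (AFTER_SYNC); now wait for a replica to confirm receipt.
        if self.semi_sync_on and replica_ack.wait(self.timeout_s):
            return "committed, replica acked"
        # No ack in time (or already degraded): continue asynchronously.
        self.semi_sync_on = False
        return "committed, async"


primary = SemiSyncPrimary(timeout_s=0.5)
ack = threading.Event()
threading.Timer(0.01, ack.set).start()    # a replica acks after about 10 ms
print(primary.commit(ack))                # committed, replica acked
print(primary.commit(threading.Event()))  # committed, async (no ack ever arrives)
```

Real MySQL switches back to semi‑synchronous mode once a replica catches up. The toy model deliberately omits that step.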
UDB Database Deep Optimizations
UDB is based on MySQL Community Server 5.7.16 and adds kernel‑level enhancements for high availability.
The replication workflow includes:
After SQL executes successfully, the primary MySQL server records the change in its binary log (binlog).
A binlog dump thread on the primary reads the binlog and sends the events to the replica's I/O thread.
The I/O thread writes the received binlog to a relay log and updates master.info with its progress.
The SQL thread replays the relay log and records its progress in relay‑log.info.
UDB improves several steps of this native process to increase stability.
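The four native replication steps can be traced in a small single‑process simulation. This Python sketch is illustrative only. The class and function names and the binlog file name are hypothetical. Positions are event counts rather than the byte offsets MySQL actually records, and each "event" is simply a (key, value) write.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    relay_log: list = field(default_factory=list)
    master_info: dict = field(default_factory=lambda: {"file": "", "pos": 0})
    relay_log_info: dict = field(default_factory=lambda: {"applied": 0})
    data: dict = field(default_factory=dict)

    def io_thread_receive(self, binlog_file, pos, event):
        # Step 3: append the event to the relay log, then record progress in master.info.
        self.relay_log.append(event)
        self.master_info.update(file=binlog_file, pos=pos)

    def sql_thread_apply(self):
        # Step 4: replay events not yet applied, recording progress in relay-log.info.
        while self.relay_log_info["applied"] < len(self.relay_log):
            key, value = self.relay_log[self.relay_log_info["applied"]]
            self.data[key] = value
            self.relay_log_info["applied"] += 1


def dump_thread(binlog, replica, binlog_file="mysql-bin.000001"):
    # Step 2: stream every binlog event, with its position, to the replica's I/O thread.
    for pos, event in enumerate(binlog, start=1):
        replica.io_thread_receive(binlog_file, pos, event)


# Step 1: the primary records successful writes in its binlog.
binlog = [("k1", "v1"), ("k2", "v2")]
replica = Replica()
dump_thread(binlog, replica)
replica.sql_thread_apply()
print(replica.data)         # {'k1': 'v1', 'k2': 'v2'}
print(replica.master_info)  # {'file': 'mysql-bin.000001', 'pos': 2}
```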
1. Relay Log File Recording Optimization
Problem: In native MySQL, the I/O thread writes each binlog event to the relay log individually as it arrives. These many small writes are inefficient and do not make full use of MySQL's file cache.
Solution: Change the I/O thread to write the relay log one transaction at a time instead of one event at a time, merging many small I/O operations into fewer, larger ones.
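The difference between per‑event and per‑transaction writes shows up clearly when you count write calls. The Python sketch below is a simplified model, not UDB's code. Events are byte strings, and each transaction is assumed to end with a commit marker. In real MySQL binlogs, an InnoDB transaction typically ends with an Xid event. CountingFile is a hypothetical stand‑in for the relay log file.

```python
import io


class CountingFile(io.BytesIO):
    """In-memory file that counts write() calls (a stand-in for relay-log I/O)."""

    def __init__(self):
        super().__init__()
        self.writes = 0

    def write(self, data):
        self.writes += 1
        return super().write(data)


def write_per_event(events, f):
    # Native behaviour: one write per received event.
    for ev in events:
        f.write(ev)


def write_per_transaction(events, f, is_commit):
    # Buffer events until the transaction's final event, then write them at once.
    pending = []
    for ev in events:
        pending.append(ev)
        if is_commit(ev):
            f.write(b"".join(pending))
            pending.clear()
    # An incomplete trailing transaction stays buffered until its commit arrives.
    return pending


events = [b"BEGIN;", b"INSERT 1;", b"INSERT 2;", b"COMMIT;"] * 3
by_event, by_trx = CountingFile(), CountingFile()
write_per_event(events, by_event)
write_per_transaction(events, by_trx, lambda ev: ev == b"COMMIT;")
print(by_event.writes, by_trx.writes)            # 12 3
print(by_event.getvalue() == by_trx.getvalue())  # True
```

Both strategies produce the same bytes, but the second issues a quarter as many write calls here. In this sketch the trade‑off is that an event reaches the file only after its whole transaction has arrived.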
2. Master.info File Recording Optimization
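Problem: master.info stores the primary's IP address and port, along with the binlog file and position the I/O thread has reached. When GTID‑based replication with auto‑positioning is used, the binlog file and position are no longer needed to resume replication, yet the I/O thread keeps updating them.
Solution: After the I/O thread successfully writes to the relay log, add a check. If GTID is enabled and auto‑positioning is in use (auto_position=1), skip updating the binlog file and position in master.info. All other operations on master.info are unchanged.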
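The added check is a single branch in the I/O thread's post‑write path. The Python sketch below is illustrative only. MasterInfo, after_relay_log_write and the gtid_mode_on/auto_position flags are hypothetical names modelled on the fields described above, not MySQL's internal API.

```python
from dataclasses import dataclass


@dataclass
class MasterInfo:
    host: str
    port: int
    binlog_file: str = ""
    binlog_pos: int = 0
    position_updates: int = 0  # how often the file/position fields were rewritten


def after_relay_log_write(mi, gtid_mode_on, auto_position, binlog_file, binlog_pos):
    """Runs after the I/O thread has successfully written to the relay log."""
    if gtid_mode_on and auto_position:
        # GTID auto-positioning resumes from the replica's GTID set,
        # so the binlog file/position fields are skipped entirely.
        return
    # File/position replication still needs the coordinates to resume.
    mi.binlog_file = binlog_file
    mi.binlog_pos = binlog_pos
    mi.position_updates += 1


gtid_mi = MasterInfo("10.0.0.1", 3306)
after_relay_log_write(gtid_mi, True, True, "mysql-bin.000003", 154)
print(gtid_mi.position_updates)  # 0

legacy_mi = MasterInfo("10.0.0.1", 3306)
after_relay_log_write(legacy_mi, True, False, "mysql-bin.000003", 154)
print(legacy_mi.binlog_file, legacy_mi.binlog_pos)  # mysql-bin.000003 154
```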
3. Relay Log Lock Optimization
Problem: The I/O thread (writing) and the SQL thread (reading) share the same file cache for the relay log, so they contend for the same lock, which degrades performance.
Solution: Separate the I/O and SQL threads' access to the relay log so that they no longer share a cache or compete for its lock. The SQL thread now needs an extra read I/O, but removing the contention significantly improves overall performance.
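The separation can be pictured as a writer and a reader that each own a file handle. The Python sketch below uses hypothetical names and is not UDB's implementation. The I/O‑thread side appends and flushes, then publishes how many bytes are safely in the file. The SQL‑thread side opens the file separately, unbuffered, and reads only up to that mark. The only shared state is one integer guarded by a short lock, and the cost is the extra read described in the solution. The sketch assumes the relay log file starts empty.

```python
import os
import tempfile
import threading


class RelayLogWriter:
    """I/O-thread side: owns its own file handle and publishes the committed end."""

    def __init__(self, path):
        self._f = open(path, "ab")
        self._end = 0                      # bytes flushed so far (file assumed empty)
        self._end_lock = threading.Lock()  # guards only the integer, not the file

    def append(self, data):
        self._f.write(data)
        self._f.flush()                    # push bytes to the OS before publishing
        with self._end_lock:
            self._end += len(data)

    def committed_end(self):
        with self._end_lock:
            return self._end

    def close(self):
        self._f.close()


class RelayLogReader:
    """SQL-thread side: a separate handle, so it never touches the writer's buffer."""

    def __init__(self, path, writer):
        self._f = open(path, "rb", buffering=0)
        self._writer = writer
        self._pos = 0

    def read_available(self):
        end = self._writer.committed_end()
        self._f.seek(self._pos)
        chunk = self._f.read(end - self._pos)  # the extra read I/O
        self._pos += len(chunk)
        return chunk

    def close(self):
        self._f.close()


path = os.path.join(tempfile.mkdtemp(), "relay-bin.000001")
writer = RelayLogWriter(path)
reader = RelayLogReader(path, writer)


def io_thread():
    for i in range(100):
        writer.append(b"event%03d;" % i)


t = threading.Thread(target=io_thread)
t.start()
received = bytearray()
while t.is_alive():
    received += reader.read_available()  # SQL-thread side polls concurrently
t.join()
received += reader.read_available()      # pick up anything published at the end
writer.close()
reader.close()
print(len(received))                     # 900
```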
Conclusion
The optimized replication flow diagram illustrates these improvements. UDB writes the relay log per transaction, skips redundant master.info position updates under GTID auto‑positioning, and removes relay log lock contention between the I/O and SQL threads. With these changes, UDB's high‑availability database achieves markedly better reliability and efficiency.
UCloud Tech
UCloud is a leading neutral cloud provider in China, developing its own IaaS, PaaS, AI service platform, and big data exchange platform, and delivering comprehensive industry solutions for public, private, hybrid, and dedicated clouds.
