
Design and Implementation of MySQL High Availability Using Orchestrator and DBProxy

This article presents a comprehensive design and implementation for achieving MySQL high availability by replacing the single‑master architecture with Orchestrator‑driven automatic failover, integrating DBProxy for transparent routing, and addressing topology changes and data compensation to ensure continuous, reliable service.

Beijing SF i-TECH City Technology Team

The current system accesses the MySQL cluster through SFNS and a DBProxy layer in a one-master, multi-slave topology: reads go to the slaves and writes to the single master, which makes the master a single point of failure for writes.

To overcome this, the team evaluated high‑availability solutions, comparing the widely used MHA with Orchestrator, and chose Orchestrator because its Raft‑based management nodes avoid a management‑node single point of failure and provide superior failover detection and promotion logic.

Orchestrator offers several advanced features: automatic discovery of MySQL replication topology, support for topology modifications, command‑line, HTTP API, and web UI management, and it is written in Go for easy extensibility.

Key supporting component DBProxy provides read/write splitting, load balancing, security authentication, automatic reconnection, connection pooling, and hot‑config reload, simplifying application code and improving reliability.

Orchestrator polls each MySQL instance every five seconds and stores its status; it declares a master failure only when the master itself is unreachable and its slaves also report that they cannot reach it, which guards against false positives from transient network issues.
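The poll interval and recovery behavior are driven by Orchestrator's configuration file. An illustrative fragment (key names are real Orchestrator options; the values shown are examples, not the team's actual settings) might look like:

```json
{
  "InstancePollSeconds": 5,
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoverMasterClusterFilters": ["*"]
}
```

`InstancePollSeconds` sets the five-second probe cadence described above, while the block/period settings throttle repeated recoveries on the same cluster.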

When a failure is detected, Orchestrator sorts candidate replicas based on Binlog execution position, data‑center proximity, and user‑defined promotion rules (must, prefer, neutral, prefer_not, must_not), then selects a new master considering version, Binlog format, and promotion restrictions.

After selecting a new master, replicas are classified as ahead, equal, or later relative to the new master’s Binlog position; topology reconstruction resets replication relationships for equal and later replicas and isolates unavailable or ahead replicas.

A second promotion step may replace the initially chosen master with a more ideal one based on promotion rules, data‑center, and physical environment preferences, ensuring the new master aligns with operational policies.

Once the new topology is ready, the new master is set to read‑only, the old master is forced read‑only, and the new master’s write capability is restored after data compensation, preventing dirty writes.

The article then discusses practical challenges: ensuring DBProxy topology updates after Orchestrator failover, avoiding simultaneous writes to old and new masters, and implementing data compensation for missing transactions caused by replication lag or post‑failover write conflicts.

Design solutions include setting the new master to read‑only before DBProxy changes, performing DBProxy configuration updates via DBProxyAdmin (upload, modify, distribute, and restart), and executing data‑compensation by retrieving differential binlogs from the failed master and applying them to the new master.

Hooks are integrated into Orchestrator’s PostFailoverProcesses to trigger DBProxy modifications and data‑compensation logic, with custom parameters passed to the Go binary handling these tasks.
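Such a hook registration might look like the fragment below. `PostFailoverProcesses` and the `{failedHost}`/`{successorHost}`-style placeholders are Orchestrator's own; the binary path and its flags are hypothetical stand-ins for the team's Go program:

```json
{
  "PostFailoverProcesses": [
    "/usr/local/orchestrator/failover-hook --failed-host={failedHost} --failed-port={failedPort} --new-host={successorHost} --new-port={successorPort}"
  ]
}
```

Orchestrator substitutes the placeholders at failover time, so the hook binary receives the old and new master endpoints it needs to update DBProxy and run data compensation.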

Deployment of Orchestrator in production achieved full coverage of MySQL clusters, multiple successful failover drills, average switch time of 5 seconds (max 20 seconds), and significantly improved availability and reliability.

High Availability · MySQL · Database Replication · Failover · Data Compensation · DBProxy · Orchestrator
Written by

Beijing SF i-TECH City Technology Team

Official tech channel of Beijing SF i-TECH City. A publishing platform for technology innovation, practical implementation, and frontier tech exploration.
