How PhxSQL Achieves Strong Consistency and High Availability for MySQL
This article explains the design and implementation of PhxSQL, a MySQL‑compatible high‑availability solution. PhxSQL combines Paxos‑based reliable log storage, proxy request forwarding, and automatic master election to overcome the flaws of native MySQL replication and to provide strong data consistency and fault tolerance.
Design Background
Internet applications, especially account and financial systems, require strong consistency and high availability. Traditional MySQL master‑slave setups cannot guarantee both when machines fail, networks partition, or manual/automatic failover occurs. PhxSQL builds a MySQL cluster on top of a robust Paxos‑based log, ensuring data consistency across MySQL instances and overall cluster high availability.
Native MySQL Disaster‑Recovery Defects
MySQL Replication Schemes
MySQL provides asynchronous and semi‑synchronous replication. In asynchronous mode, the master commits locally and replicates to the slave later, so a transaction already acknowledged to the client can be lost if the master fails before that transaction reaches the slave (see Figure 1). In semi‑synchronous mode, the master waits for a slave to acknowledge receipt of the binlog before returning success to the client, which improves consistency (see Figure 2) but still has shortcomings during master restarts and failovers.
Master Restart Issues
When a master restarts, pending binlog entries (written to the binlog file but not yet replicated) may be committed directly, causing divergence between old and new masters (Figure 3). This can produce data inconsistency, phantom reads for clients, and split‑brain scenarios (Figures 4‑6). MySQL also lacks an automatic master election mechanism (Figure 7).
PhxSQL Design Idea
Reliable Log Storage
PhxSQL introduces a reliable log storage cluster (BinlogSvr) based on Paxos. The master sends its binlog to BinlogSvr; slaves pull binlog from BinlogSvr for replication. During master restart, BinlogSvr is consulted to decide whether a pending binlog should be kept or discarded, guaranteeing consistency (Figure 8).
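The restart check described above can be sketched as follows. This is a minimal illustration, not PhxSQL's actual code: the function name `reconcile_on_restart` and the use of plain lists of entry ids are assumptions made for the example, and the Paxos‑backed BinlogSvr is represented only by the ordered list of entries it has agreed on.

```python
from typing import List


def reconcile_on_restart(local_binlog: List[str], reliable_log: List[str]) -> List[str]:
    """Return the local binlog entries that may be kept after a restart.

    local_binlog: entry ids found in the restarted master's binlog file.
    reliable_log: entry ids agreed on by the reliable log store (hypothetical
                  stand-in for BinlogSvr).

    An entry is kept only if the reliable log holds the same entry at the
    same position; the first mismatch and everything after it is discarded.
    """
    kept = []
    for local_id, agreed_id in zip(local_binlog, reliable_log):
        if local_id != agreed_id:
            break
        kept.append(local_id)
    return kept


# "t3" was written locally but never reached the reliable log, so it is dropped.
surviving = reconcile_on_restart(["t1", "t2", "t3"], ["t1", "t2"])
```

In this model a pending entry survives only when the reliable log already contains it, which is the property that keeps the old and new masters from diverging.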
Request Forwarding
A proxy layer (PhxSQLProxy) sits between clients and MySQL. It forwards client requests to the current master, preventing client split‑brain during master switches. Two forwarding modes are supported: read/write port forwarding and read‑only port forwarding (Figure 12). The proxy uses a coroutine model (Libco) for high performance and maintains a 1:1 connection model to preserve MySQL transaction semantics (Figure 13). It also forwards the real client IP via a reserved MySQL protocol field to keep permission checks correct (Figure 14).
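A rough sketch of the routing decision follows. The port numbers, the `Backend` type, and the `pick_backend` function are hypothetical names chosen for illustration. The sketch also assumes that the read‑only port is served by the MySQL instance on the same node, which the text above does not state explicitly.

```python
from dataclasses import dataclass

# Hypothetical listening ports for the two forwarding modes.
RW_PORT = 54321
RO_PORT = 54322


@dataclass(frozen=True)
class Backend:
    host: str
    port: int = 3306


def pick_backend(listen_port: int, local: Backend, current_master: Backend) -> Backend:
    """Choose the MySQL instance a new client connection is forwarded to."""
    if listen_port == RW_PORT:
        # Read/write traffic always goes to whichever node is master right now.
        return current_master
    if listen_port == RO_PORT:
        # Assumption: read-only traffic is answered by the local instance.
        return local
    raise ValueError(f"unexpected listening port: {listen_port}")


local_node = Backend("10.0.0.2")
master_node = Backend("10.0.0.1")
rw_target = pick_backend(RW_PORT, local_node, master_node)
ro_target = pick_backend(RO_PORT, local_node, master_node)
```

Because each client connection maps to exactly one backend connection, the target is chosen once per connection in this sketch rather than once per statement.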
Automatic Master Election
Each node runs an Agent that monitors MySQL health. Healthy masters periodically renew a lease in the reliable store; non‑masters check the lease and, if expired, initiate a Paxos‑based election to become the new master (Figure 10).
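The lease logic can be modelled in a few lines. This is a simplified sketch under stated assumptions: `LeaseStore` is an in‑memory stand‑in for the Paxos‑replicated store, its version check stands in for the agreement that Paxos would provide, and the names `MasterInfo`, `try_set`, and `agent_tick` as well as the 10‑second lease length are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MasterInfo:
    master: Optional[str] = None
    expire_at: float = 0.0
    version: int = 0


class LeaseStore:
    """In-memory stand-in for the replicated master-info store (assumption)."""

    def __init__(self) -> None:
        self.info = MasterInfo()

    def try_set(self, node: str, now: float, lease_seconds: float,
                expected_version: int) -> bool:
        # Only one proposal per version can win; later ones see a newer version.
        if expected_version != self.info.version:
            return False
        self.info = MasterInfo(node, now + lease_seconds, expected_version + 1)
        return True


def agent_tick(store: LeaseStore, node: str, mysql_healthy: bool,
               now: float, lease_seconds: float = 10.0) -> bool:
    """One monitoring round; True if `node` renewed or acquired the lease."""
    info = store.info
    if not mysql_healthy:
        return False  # an unhealthy node neither renews nor runs for master
    lease_valid = info.master is not None and now < info.expire_at
    if lease_valid and info.master != node:
        return False  # another node holds a valid lease
    # Either we are the master (renew) or the lease has lapsed (run for master).
    return store.try_set(node, now, lease_seconds, info.version)


store = LeaseStore()
first = agent_tick(store, "A", True, now=0.0)        # A becomes master
blocked = agent_tick(store, "B", True, now=5.0)      # lease still valid
renewed = agent_tick(store, "A", True, now=5.0)      # A extends to t=15
unhealthy = agent_tick(store, "A", False, now=12.0)  # A stops renewing
takeover = agent_tick(store, "B", True, now=16.0)    # lease expired, B wins
```

The old master loses its role simply by failing to renew, so no node needs to detect the failure directly.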
PhxSQL Architecture and Implementation
Each node hosts three components: PhxSQLProxy, MySQL, and PhxBinlogSvr. Together, the PhxBinlogSvr instances form the reliable log and master‑info store, and each instance also acts as the Agent on its node. PhxSync, analogous to MySQL’s semi‑sync plugin, commits binlog entries to BinlogSvr and calibrates binlog state on restart (Figure 9).
PhxBinlogSvr
BinlogSvr stores binlog data and master information, achieving consensus via the open‑source PhxPaxos library. It supports MySQL’s native replication protocol, rejects writes from non‑master nodes, and uses optimistic locking to prevent erroneous master submissions (Figures 15‑16). It also provides automatic master election through Paxos (Figure 17).
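The two write checks can be illustrated with a small model. It is a sketch under assumptions, not the PhxBinlogSvr interface: the class `BinlogStore`, its `append` signature, and the choice of the previous entry id as the optimistic‑lock token are all hypothetical, and a single in‑memory list replaces the replicated log.

```python
from typing import List, Optional, Tuple


class BinlogStore:
    """Toy model of a log store that accepts writes only from the master."""

    def __init__(self, master: str) -> None:
        self.master = master
        self.entries: List[Tuple[str, bytes]] = []  # (entry_id, payload)

    def last_id(self) -> Optional[str]:
        return self.entries[-1][0] if self.entries else None

    def append(self, sender: str, prev_id: Optional[str],
               entry_id: str, payload: bytes) -> bool:
        # Check 1: reject writes from any node that is not the current master.
        if sender != self.master:
            return False
        # Check 2 (optimistic lock): the writer must name the entry it believes
        # is last; a stale view means its binlog has fallen out of step.
        if prev_id != self.last_id():
            return False
        self.entries.append((entry_id, payload))
        return True


log = BinlogStore(master="A")
ok_first = log.append("A", None, "t1", b"insert ...")
from_slave = log.append("B", "t1", "t2", b"update ...")  # not the master
stale_view = log.append("A", None, "t2", b"update ...")  # wrong prev_id
ok_second = log.append("A", "t1", "t2", b"update ...")
```

In this model a node that was demoted while a write was in flight fails the first check, and a master whose local binlog has drifted from the agreed log fails the second.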
PhxSQL Effects
Data Consistency
Comparisons of binlog, Paxos state, and BinlogSvr data across three nodes show full consistency (Figure 18).
Master Automatic Switch
During a master failure, traffic shifts smoothly to the new master, confirming successful failover (Figure 19).
Performance
Benchmarks using sysbench on Percona 5.6.31‑77.0 demonstrate that PhxSQL’s write performance exceeds MySQL semi‑sync, while read performance is slightly lower due to the proxy layer. Overall, PhxSQL delivers strong consistency, high availability, and competitive performance (Figure 20).
WeChat Backend Team
Official account of the WeChat backend development team, sharing their experience in large-scale distributed system development.
