How Meituan Dianping Built a Reliable MySQL Group Replication HA Architecture
This article details Meituan Dianping's practical experience deploying MySQL Group Replication (MGR) for CMDB high availability, covering background, MGR fundamentals, configuration limits, parameter tuning, architecture design, deployment timeline, typical issues, a custom Python client, and daily operational practices.
Background
MySQL Group Replication (MGR) became generally available in MySQL 5.7.17, offering built‑in high‑availability without external components. Meituan Dianping needed a self‑contained HA solution for its CMDB, which could not rely on the existing MHA‑based architecture.
About MGR
MGR is implemented as a MySQL plugin that handles conflict detection and Paxos‑based communication. It synchronises nodes via the binary log (ROW format) and therefore feels familiar to DBAs accustomed to traditional master‑slave setups. Although it resembles Percona XtraDB Cluster, MGR still relies on binlog replication.
Solution Design
The team selected the single‑primary mode of MGR to avoid known multi‑primary issues. In this mode, when the primary fails the cluster automatically elects a new primary and applications redirect writes accordingly.
Key Considerations
MGR limitations (InnoDB‑only, mandatory primary key, ROW binlog format, no SAVEPOINT, etc.)
Required pre‑deployment checks: eliminate MyISAM tables, add missing primary keys, and remove SAVEPOINT usage.
Operational concerns such as network jitter, backup tool compatibility, and Online DDL support.
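The pre-deployment checks above can be sketched in Python. This is a minimal illustration, not the team's actual tooling: the two SQL strings query standard information_schema tables, while the function name find_violations and the dict layout it expects are assumptions introduced here. SAVEPOINT usage lives in application code and has to be audited separately.

```python
# SQL a DBA might run before migration. The excluded schema list is illustrative.
NON_INNODB_SQL = """
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND engine <> 'InnoDB'
  AND table_schema NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema')
"""

NO_PRIMARY_KEY_SQL = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
  ON c.table_schema = t.table_schema
 AND c.table_name = t.table_name
 AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
  AND t.table_schema NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema')
"""


def find_violations(tables):
    """Return (table_name, reason) pairs for tables that would block MGR.

    `tables` is a hypothetical, already-fetched list of dicts with the keys
    'name', 'engine' and 'has_primary_key'.
    """
    problems = []
    for table in tables:
        if table["engine"].lower() != "innodb":
            problems.append((table["name"], "non-InnoDB engine: " + table["engine"]))
        if not table["has_primary_key"]:
            problems.append((table["name"], "missing primary key"))
    return problems
```

An empty result from find_violations means the table-level checks pass.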
Parameter Tuning
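Important parameters were adjusted based on testing:
group_replication_unreachable_majority_timeout: the default of 0 (no timeout) is unsafe, because a member that loses contact with the majority then waits indefinitely. A finite timeout makes such a member give up after the configured number of seconds and move to the ERROR state, so nodes do not stay UNREACHABLE indefinitely.
group_replication_compression_threshold: lowering the threshold, so that smaller messages are also compressed, reduced the UNREACHABLE states caused by large transactions.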
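As a rough sketch, the two parameters could be applied with SET GLOBAL statements generated as follows. The article does not state the values the team chose, so the numbers in EXAMPLE_SETTINGS are purely hypothetical, as is the helper name build_set_statements.

```python
# Hypothetical values for illustration only; the article gives no concrete numbers.
EXAMPLE_SETTINGS = {
    "group_replication_unreachable_majority_timeout": 30,  # seconds
    "group_replication_compression_threshold": 100000,     # bytes
}


def build_set_statements(settings):
    """Render SET GLOBAL statements for non-negative integer MGR settings."""
    statements = []
    for name, value in sorted(settings.items()):
        if not name.startswith("group_replication_"):
            raise ValueError("not a Group Replication variable: " + name)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError("expected a non-negative integer for " + name)
        statements.append("SET GLOBAL {} = {};".format(name, value))
    return statements
```

The same settings would also need to go into the server configuration file to survive a restart.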
Final Architecture
A three‑data‑center, three‑node MGR cluster was deployed as the core HA layer, with an additional master‑slave cluster for disaster recovery. This hybrid design balances MGR’s novelty with proven fallback mechanisms.
Deployment Timeline
Since 2018, three critical systems (process, reporting, and CMDB) have been migrated to MGR.
Typical Problems and Resolutions
1. Large Transactions
During the reporting system rollout, nodes intermittently entered the UNREACHABLE state. Analysis linked the issue to large transactions exceeding the network transmission window. Reducing group_replication_compression_threshold and limiting transaction size eliminated the problem.
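One common way to limit transaction size on the application side is to split a bulk write into fixed-size batches, each committed as its own transaction. The sketch below assumes this approach; the article does not say how the team enforced the limit. The names chunked and write_in_batches, and the caller-supplied write_batch callable, are hypothetical.

```python
def chunked(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def write_in_batches(rows, batch_size, write_batch):
    """Call write_batch once per batch and return the number of batches.

    write_batch is expected to run one transaction and commit it, so no
    single transaction carries more than batch_size rows.
    """
    count = 0
    for batch in chunked(rows, batch_size):
        write_batch(batch)
        count += 1
    return count
```

Batching trades atomicity of the whole load for smaller replication messages, so it only fits workloads that tolerate partially applied bulk writes.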
2. HANG During START/STOP
During a node‑down drill, Nginx hung because queries to performance_schema.replication_group_members blocked for ~10 seconds while MGR was stopping. Adding a timeout to these queries resolved the issue.
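The article does not describe how the timeout was implemented. One generic option, sketched here, is to run the status query in a worker thread and stop waiting after a deadline. The helper call_with_timeout is a hypothetical name, and simulated_blocked_query merely stands in for a query against performance_schema.replication_group_members that blocks while MGR is stopping.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout_s, default=None):
    """Run fn() in a worker thread; return default if it exceeds timeout_s."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return default
    finally:
        # Do not wait for a blocked worker; the caller moves on immediately.
        executor.shutdown(wait=False)


def simulated_blocked_query():
    """Stand-in for a membership query that blocks (illustrative only)."""
    time.sleep(0.3)
    return [("ONLINE",)]
```

Note that the worker thread keeps running until the underlying call returns, so a timeout configured at the driver or connection level, where available, releases resources more cleanly.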
3. Data‑Center Failure
A bandwidth reduction in one data center triggered simultaneous MGR and master‑slave failovers. The root cause was DNS‑based connection logic that delayed address updates. The team migrated all CMDB access to an internal “Smart Client”.
Smart Client
Meituan Dianping developed an internal Python library built on MySQLdb that automatically identifies the current MGR primary and splits reads and writes, simplifying client‑side integration.
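The Smart Client itself is internal and its code is not shown in the article, so the following is only a sketch of how such routing might work. In MySQL 5.7, the primary's server UUID can be read from the group_replication_primary_member status variable and matched against performance_schema.replication_group_members. The function pick_endpoint and the dict layout for members are assumptions made for this example.

```python
import random

# Queries a client could run on any reachable member (MySQL 5.7).
PRIMARY_UUID_SQL = (
    "SELECT VARIABLE_VALUE FROM performance_schema.global_status "
    "WHERE VARIABLE_NAME = 'group_replication_primary_member'"
)
MEMBERS_SQL = (
    "SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE "
    "FROM performance_schema.replication_group_members"
)


def pick_endpoint(members, primary_uuid, for_write, rng=random):
    """Choose a (host, port) pair: the primary for writes, a secondary for reads.

    `members` is a hypothetical list of dicts with keys 'id', 'host', 'port'
    and 'state', built from the rows returned by MEMBERS_SQL.
    """
    online = [m for m in members if m["state"] == "ONLINE"]
    primary = [m for m in online if m["id"] == primary_uuid]
    if for_write:
        if not primary:
            raise RuntimeError("no ONLINE primary available")
        return (primary[0]["host"], primary[0]["port"])
    secondaries = [m for m in online if m["id"] != primary_uuid]
    # Fall back to the primary when no secondary is ONLINE.
    pool = secondaries or primary
    if not pool:
        raise RuntimeError("no ONLINE member available")
    chosen = rng.choice(pool)
    return (chosen["host"], chosen["port"])
```

Because the client re-reads group membership rather than waiting for a DNS record to change, it can follow a failover as soon as the new primary is visible in the membership table.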
Daily Operations
Operational monitoring focuses on node state (any non‑ONLINE state is flagged) and on the number of pending transactions, which serves as a proxy for replication “lag”. GTID set differences between the primary and the replicas are also tracked.
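A monitoring check along these lines could look like the sketch below. The queue depth comes from COUNT_TRANSACTIONS_IN_QUEUE in performance_schema.replication_group_member_stats, and GTID set differences can be computed with the built-in GTID_SUBTRACT function. The function evaluate_health and the max_queue threshold of 1000 are hypothetical; the article does not give the team's alert thresholds.

```python
# Queries a monitor could run on each member.
QUEUE_SQL = (
    "SELECT MEMBER_ID, COUNT_TRANSACTIONS_IN_QUEUE "
    "FROM performance_schema.replication_group_member_stats"
)
# Parameters: the primary's gtid_executed, then the replica's gtid_executed.
GTID_DIFF_SQL = "SELECT GTID_SUBTRACT(%s, %s)"


def evaluate_health(member_states, queue_depths, max_queue=1000):
    """Return alert strings for non-ONLINE members and deep queues.

    member_states maps member name to MEMBER_STATE; queue_depths maps member
    name to its pending transaction count. max_queue is an assumed threshold.
    """
    alerts = []
    for member, state in sorted(member_states.items()):
        if state != "ONLINE":
            alerts.append("{}: state is {}".format(member, state))
    for member, depth in sorted(queue_depths.items()):
        if depth > max_queue:
            alerts.append("{}: {} transactions pending".format(member, depth))
    return alerts
```

An empty list means every member is ONLINE and no queue exceeds the threshold.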
Conclusion
Extensive production testing shows that MGR can serve as a stable, reliable HA solution for DBAs, though it is less suited for write‑intensive workloads. The experience provides valuable insights for designing MySQL high‑availability architectures.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.