Using Orchestrator for Automatic MySQL Cluster Failover: Configuration and Test Cases
This article demonstrates how to configure the open-source Orchestrator tool for automatic MySQL cluster failover, explains key parameters, and presents three test cases covering normal failover, lag‑induced prevention, and the effect of disabling global recoveries.
Parameter Description:
Reference: https://github.com/openark/orchestrator/blob/master/go/config/config.go
Purpose:
Use Orchestrator to configure automatic failover for a MySQL cluster.
Managed database instances (1 master 1 slave architecture):
10.186.65.5:3307
10.186.65.11:3307
Orchestrator-related parameters:
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": ["*"],
"RecoverIntermediateMasterClusterFilters": ["*"],
"ReplicationLagQuery": "show slave status",
"ApplyMySQLPromotionAfterMasterFailover": true,
"FailMasterPromotionOnLagMinutes": 1,
Test scenarios are executed on the raft-leader node.
Case 1:
Scenario:
Shut down the master and verify failover when the replication lag is less than FailMasterPromotionOnLagMinutes.
Operation:
# Confirm existing clusters
orchestrator-client -c clusters
# View topology, cluster is 10.186.65.11:3307
orchestrator-client -c topology -i 10.186.65.11:3307
# Stop master node
ssh [email protected] "service mysqld_3307 stop"
# Confirm clusters again; original cluster splits into two
orchestrator-client -c clusters
# View topology, now cluster is 10.186.65.5:3307
orchestrator-client -c topology -i 10.186.65.5:3307
Conclusion:
Failover succeeded.
The new master has read_only and super_read_only disabled, allowing read‑write operations.
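One way to confirm the second point is to query the promoted instance directly. The sketch below is not part of the original test: is_writable is a hypothetical helper name, and it assumes the tab-separated output that the mysql client prints with -N -B for the query shown in the comment (credentials are omitted and would have to be supplied).

```shell
# Hypothetical helper: reads one line "read_only<TAB>super_read_only" on stdin
# and succeeds only when both flags are 0 (instance accepts writes).
is_writable() {
    read -r read_only super_read_only || return 2   # no input at all
    [ "$read_only" = "0" ] && [ "$super_read_only" = "0" ]
}

# Assumed usage against the new master (connection options are placeholders):
#   mysql -h 10.186.65.5 -P 3307 -N -B \
#     -e 'SELECT @@global.read_only, @@global.super_read_only' | is_writable \
#     && echo "new master is writable"
```

Keeping the parsing separate from the mysql call means the check can be exercised with canned input before it is pointed at a real instance.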
Case 2:
Scenario:
Shut down the master and verify failover when the replication lag exceeds FailMasterPromotionOnLagMinutes (configured as 1 minute).
Operation:
# Show current FailMasterPromotionOnLagMinutes value
orchestrator -c dump-config --ignore-raft-setup | jq .FailMasterPromotionOnLagMinutes
# Confirm existing clusters
orchestrator-client -c clusters
# View topology, cluster is 10.186.65.11:3307
orchestrator-client -c topology -i 10.186.65.11:3307
# Create a delayed slave (e.g., 120 s). Either run this SQL on the slave (10.186.65.5:3307):
stop slave;
change master to master_delay=120;
start slave;
# or set the delay through Orchestrator:
orchestrator-client -c delay-replication -i 10.186.65.5:3307 -S 120
# Wait 120 s
sleep 120
# View topology, cluster remains 10.186.65.11:3307
orchestrator-client -c topology -i 10.186.65.11:3307
# Stop master node
ssh [email protected] "service mysqld_3307 stop"
# Confirm clusters and topology again
orchestrator-client -c clusters
orchestrator-client -c topology -i 10.186.65.11:3307
Conclusion:
No failover occurred.
When the slave's lag exceeds FailMasterPromotionOnLagMinutes, Orchestrator does not promote it, so the failover is prevented.
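The difference between Case 1 and Case 2 comes down to a single comparison. The sketch below assumes that Orchestrator compares the candidate's lag in seconds with the threshold converted from minutes, treats a lag equal to the threshold as too high, and skips the check when the threshold is 0. promotion_blocked_by_lag is a hypothetical helper written for illustration, not an Orchestrator command.

```shell
# Hypothetical helper: exit 0 when promotion would be refused under the
# assumed rule, exit 1 when the candidate could be promoted.
#   $1 = replication lag of the candidate, in seconds
#   $2 = FailMasterPromotionOnLagMinutes
promotion_blocked_by_lag() {
    lag_seconds=$1
    threshold_minutes=$2
    # Assumption: a threshold of 0 disables the lag check entirely.
    [ "$threshold_minutes" -gt 0 ] || return 1
    [ "$lag_seconds" -ge $((threshold_minutes * 60)) ]
}

# Case 2 values: 120 s of delay against a 1-minute threshold.
if promotion_blocked_by_lag 120 1; then
    echo "promotion refused: lag is at or above the threshold"
fi
```

With the Case 1 values (lag close to 0 s, threshold 1 minute) the same function returns a non-zero status, which matches the successful failover observed there.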
Case 3:
Scenario:
Disable global recovery and shut down the master while lag is less than FailMasterPromotionOnLagMinutes.
Operation:
# Disable global recoveries
orchestrator-client -c disable-global-recoveries
orchestrator-client -c check-global-recoveries
# Confirm clusters and topology
orchestrator-client -c clusters
orchestrator-client -c topology -i 10.186.65.11:3307
# Stop master node
ssh [email protected] "service mysqld_3307 stop"
# Confirm clusters and topology again
orchestrator-client -c clusters
orchestrator-client -c topology -i 10.186.65.11:3307
Conclusion:
No failover occurred.
Disabling global recoveries prevents automatic failover.
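Global recoveries stay disabled until they are switched back on, so a test like this is easier to repeat if the setting is restored afterwards. A minimal sketch, assuming the enable-global-recoveries command of orchestrator-client; with_recoveries_disabled and the ORC_CLIENT variable are hypothetical names introduced here.

```shell
# Assumption: ORC_CLIENT names the client binary; override it for dry runs.
ORC_CLIENT="${ORC_CLIENT:-orchestrator-client}"

# Hypothetical wrapper: disable global recoveries, run the given command,
# then re-enable recoveries and return the command's exit status.
with_recoveries_disabled() {
    "$ORC_CLIENT" -c disable-global-recoveries || return 1
    rc=0
    "$@" || rc=$?
    "$ORC_CLIENT" -c enable-global-recoveries
    return "$rc"
}

# Assumed usage, where run_case3_checks is a placeholder for the test steps:
#   with_recoveries_disabled run_case3_checks
```

Running check-global-recoveries after the wrapper returns is a simple way to confirm that the setting was restored.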
Summary:
With Orchestrator configured, automatic failover is governed by the filter parameters RecoveryIgnoreHostnameFilters, RecoverMasterClusterFilters, and RecoverIntermediateMasterClusterFilters, and by the lag check defined through FailMasterPromotionOnLagMinutes and ReplicationLagQuery. In these tests, failover did not occur when the slave's lag exceeded the configured number of minutes or when global recoveries were disabled.
These test scenarios are limited; other situations may need additional testing.
Aikesheng Open Source Community