
Using Orchestrator for Automatic MySQL Cluster Failover: Configuration and Test Cases

This article shows how to configure the open-source Orchestrator tool for automatic MySQL master failover, lists the relevant parameters, and presents three test cases: a normal failover, a failover blocked by replication lag, and the effect of disabling global recoveries.

Aikesheng Open Source Community

Parameter descriptions:

See the comments on each field in Orchestrator's configuration source: https://github.com/openark/orchestrator/blob/master/go/config/config.go

Purpose:

Use Orchestrator to configure automatic failover for a MySQL cluster.

Managed database instances (one master, one slave):

10.186.65.5:3307 (slave)
10.186.65.11:3307 (initial master)

Orchestrator-related parameters (excerpt from the Orchestrator JSON configuration file):

  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": ["*"],
  "RecoverIntermediateMasterClusterFilters": ["*"],
  "ReplicationLagQuery": "show slave status",
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "FailMasterPromotionOnLagMinutes": 1,

All test commands are run on the Orchestrator raft leader node.

Case 1:

Scenario:

Shut down the master and verify that failover happens when the replication lag is below FailMasterPromotionOnLagMinutes.

Operation:

# Confirm existing clusters
  orchestrator-client -c clusters
# View topology, cluster is 10.186.65.11:3307
  orchestrator-client -c topology -i 10.186.65.11:3307
# Stop MySQL on the master node 10.186.65.11 (replace <user> with your SSH user)
  ssh <user>@10.186.65.11 "service mysqld_3307 stop"
# List clusters again; the original cluster has split into two
  orchestrator-client -c clusters
# View topology, now cluster is 10.186.65.5:3307
  orchestrator-client -c topology -i 10.186.65.5:3307
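Instead of re-running `clusters` by hand, the wait can be scripted. The following is a minimal POSIX shell sketch, assuming that `orchestrator-client -c clusters` prints one cluster alias per line and that this Orchestrator instance manages only this one cluster; `wait_for_new_cluster` and the `ORCH` override variable are hypothetical helpers, not part of Orchestrator. It polls until an alias other than the old one shows up, which in this case is the promoted slave.

```sh
#!/bin/sh
# Hypothetical helper: poll Orchestrator until a cluster alias other than the
# old one appears (in Case 1, the promoted slave 10.186.65.5:3307).
# ORCH is an assumed override hook; it defaults to the real client.
ORCH="${ORCH:-orchestrator-client}"

# Usage: wait_for_new_cluster OLD_ALIAS [TIMEOUT_SECONDS]
# Prints the first non-empty alias that differs from OLD_ALIAS and returns 0,
# or returns 1 if none appears before the timeout (default 60 s).
wait_for_new_cluster() {
  old_alias="$1"
  timeout="${2:-60}"
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Assumes one alias per line in the "clusters" output; skips blank lines.
    new_alias=$("$ORCH" -c clusters | awk -v old="$old_alias" 'NF && $0 != old { print; exit }')
    if [ -n "$new_alias" ]; then
      echo "$new_alias"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

For example, `wait_for_new_cluster 10.186.65.11:3307 120` right after stopping the master should print `10.186.65.5:3307` once the failover has completed. With several clusters under management, the "new alias" test would need to be narrowed.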

Conclusion:

Failover succeeded.

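The new master, 10.186.65.5:3307, has read_only and super_read_only disabled and accepts writes. With ApplyMySQLPromotionAfterMasterFailover set to true, Orchestrator applies this promotion step itself.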
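To check this from the command line, a sketch like the following could be used. It assumes the `mysql` client can log in to the new master without extra options (for example through `~/.my.cnf`) and that the server is MySQL 5.7 or later, where `super_read_only` exists; `master_is_writable` and the `MYSQL` override are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if both read_only and super_read_only are OFF.
# Assumes credentials come from an option file such as ~/.my.cnf.
MYSQL="${MYSQL:-mysql}"

# Usage: master_is_writable HOST PORT
master_is_writable() {
  # -N drops the column header, -B prints tab-separated output such as "0<TAB>0".
  flags=$("$MYSQL" -h "$1" -P "$2" -N -B \
    -e 'SELECT @@global.read_only, @@global.super_read_only') || return 1
  # Word splitting on the tab puts the two values in $1 and $2.
  set -- $flags
  [ "$1" = 0 ] && [ "$2" = 0 ]
}
```

Usage: `master_is_writable 10.186.65.5 3307 && echo "new master is writable"`.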

Case 2:

Scenario:

Shut down the master and check whether failover happens when the replication lag exceeds FailMasterPromotionOnLagMinutes (configured as 1 minute).

Operation:

# Show current FailMasterPromotionOnLagMinutes value
  orchestrator -c dump-config --ignore-raft-setup | jq .FailMasterPromotionOnLagMinutes
# Confirm existing clusters
  orchestrator-client -c clusters
# View topology, cluster is 10.186.65.11:3307
  orchestrator-client -c topology -i 10.186.65.11:3307
# Turn the slave 10.186.65.5:3307 into a delayed slave (120 s), either in the mysql client on that instance:
  stop slave;
  change master to master_delay=120;
  start slave;
# or with orchestrator-client:
  orchestrator-client -c delay-replication -i 10.186.65.5:3307 -S 120
# Wait 120 s
  sleep 120
# View topology, cluster remains 10.186.65.11:3307
  orchestrator-client -c topology -i 10.186.65.11:3307
# Stop MySQL on the master node 10.186.65.11 (replace <user> with your SSH user)
  ssh <user>@10.186.65.11 "service mysqld_3307 stop"
# Confirm clusters and topology again
  orchestrator-client -c clusters
  orchestrator-client -c topology -i 10.186.65.11:3307
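The manual delay step in this case could be wrapped in a small function so that the same delay can be set, and later removed, with one call. This is a sketch assuming the same login setup as above (credentials in an option file) and the `STOP SLAVE` / `CHANGE MASTER TO` syntax used in this article, which newer MySQL 8.0 releases deprecate in favor of REPLICA/SOURCE forms; `set_replica_delay` and `MYSQL` are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper; MYSQL is an assumed override hook for the mysql client.
MYSQL="${MYSQL:-mysql}"

# Usage: set_replica_delay HOST PORT SECONDS
# Runs the same three statements as the manual step, in a single mysql call.
set_replica_delay() {
  case "$3" in
    ''|*[!0-9]*) echo "delay must be a non-negative integer" >&2; return 2 ;;
  esac
  "$MYSQL" -h "$1" -P "$2" \
    -e "STOP SLAVE; CHANGE MASTER TO MASTER_DELAY=$3; START SLAVE;"
}
```

`set_replica_delay 10.186.65.5 3307 120` reproduces the setup above; passing `0` afterwards removes the delay again.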

Conclusion:

No failover occurred.

When the slave's lag exceeds FailMasterPromotionOnLagMinutes, Orchestrator does not promote it, so failover is prevented.
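Before stopping the master, one way to predict this outcome is to read the slave's lag and compare it with the threshold. The sketch below takes the lag from `Seconds_Behind_Master`, which is an assumption for illustration: Orchestrator itself measures lag through ReplicationLagQuery, and the two values may differ. The comparison (lag of at least `minutes * 60` seconds blocks promotion, and a setting of 0 disables the check) follows our reading of the FailMasterPromotionOnLagMinutes comment in config.go; the function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers; MYSQL is an assumed override hook for the mysql client.
MYSQL="${MYSQL:-mysql}"

# Usage: replica_lag_seconds HOST PORT
# Prints Seconds_Behind_Master (a number, or NULL if the SQL thread is not running).
replica_lag_seconds() {
  "$MYSQL" -h "$1" -P "$2" -e 'SHOW SLAVE STATUS\G' |
    awk -F': ' '/Seconds_Behind_Master/ { print $2 }'
}

# Usage: lag_blocks_promotion LAG_SECONDS LIMIT_MINUTES
# Returns 0 if promotion would be blocked, 1 if not, 2 if the lag is unknown.
lag_blocks_promotion() {
  case "$1" in
    ''|*[!0-9]*) return 2 ;;  # empty or NULL: lag cannot be judged
  esac
  [ "$2" -gt 0 ] && [ "$1" -ge $(($2 * 60)) ]
}
```

Usage: `lag=$(replica_lag_seconds 10.186.65.5 3307); lag_blocks_promotion "$lag" 1 && echo "promotion would likely be blocked"`.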

Case 3:

Scenario:

Disable global recoveries, then shut down the master while the replication lag is below FailMasterPromotionOnLagMinutes.

Operation:

# Disable global recoveries
  orchestrator-client -c disable-global-recoveries
  orchestrator-client -c check-global-recoveries
# Confirm clusters and topology
  orchestrator-client -c clusters
  orchestrator-client -c topology -i 10.186.65.11:3307
# Stop MySQL on the master node 10.186.65.11 (replace <user> with your SSH user)
  ssh <user>@10.186.65.11 "service mysqld_3307 stop"
# Confirm clusters and topology again
  orchestrator-client -c clusters
  orchestrator-client -c topology -i 10.186.65.11:3307

Conclusion:

No failover occurred.

Disabling global recoveries prevents automatic failover.
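Global recoveries remain disabled until they are explicitly re-enabled, so a test like this one is safer to run through a wrapper that always turns them back on. The sketch below assumes the `disable-global-recoveries` and `enable-global-recoveries` commands of orchestrator-client; `with_recoveries_disabled` and the `ORCH` override are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper; ORCH is an assumed override hook for orchestrator-client.
ORCH="${ORCH:-orchestrator-client}"

# Usage: with_recoveries_disabled COMMAND [ARGS...]
# Disables global recoveries, runs COMMAND, re-enables recoveries whatever
# COMMAND's result was, and returns COMMAND's exit status.
with_recoveries_disabled() {
  "$ORCH" -c disable-global-recoveries || return 1
  if "$@"; then
    cmd_status=0
  else
    cmd_status=$?
  fi
  "$ORCH" -c enable-global-recoveries
  return "$cmd_status"
}
```

The command passed in should cover the whole test, including starting the old master again, because re-enabling recoveries while the master is still down may let Orchestrator begin a failover at that point.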

Summary:

With Orchestrator configured as above, automatic failover is governed by the recovery filters (RecoveryIgnoreHostnameFilters, RecoverMasterClusterFilters, RecoverIntermediateMasterClusterFilters) and by the lag settings (FailMasterPromotionOnLagMinutes, ReplicationLagQuery). Failover does not occur when the slave's lag exceeds the configured number of minutes or when global recoveries are disabled.

These test scenarios are limited; specific environments may need further testing.
