Investigation of MySQL Group Replication Failover Triggered by System Time Anomaly and VM Suspension
The article analyzes a MySQL MGR cluster failover incident caused by a VM pause and system time jump, explains the underlying detection mechanisms, presents detailed source code excerpts, and demonstrates through debugging that time changes alone do not trigger MGR failover.
1. Problem Description
On September 15, an alert indicated that the MySQL MGR cluster in the test environment was abnormal.
Log excerpt:
// 10.x.y.97 node MySQL error log shows node 10.x.y.95 unreachable:
2022-09-15T19:26:14.320181+08:00 0 "Warning" "MY-011493" "Repl" Plugin group_replication reported: "Member with address 10.x.y.95:3306 has become unreachable."
// MGR probe logs show a switch:
"2022-09-15 19:26:16" : Exception: Invalid primary_member_ip: or secondary_node_list:"10.x.y.96", "10.x.y.97" .
"2022-09-15 19:26:16" : Exception: MGR is likely to be switching, Sleep 1 sec and continue .
"2022-09-15 19:27:01" : Exception: MGR running with WARN. ONLINE nodes Pri:10.x.y.96 Sec:10.x.y.97 diff from conf_node:10.x.y.95,10.x.y.96,10.x.y.97 .
The alert indicates that the MySQL MGR cluster switched from a three‑node configuration to a two‑node configuration (Pri: 10.x.y.96, Sec: 10.x.y.97).
2. Preliminary Analysis
After receiving the alert, we examined MySQL error logs, OS logs, and monitoring data, discovering that the operating system time had changed unexpectedly ("Time has been changed" anomaly).
According to MySQL documentation, failure detection is time‑based:
If a member does not receive a message from another member within 5 seconds, it marks the member as UNREACHABLE in the replication_group_members table.
If the suspicion persists for more than 10 seconds, the member propagates the suspicion to the rest of the group. ( Reference )
We initially suspected that a sudden OS time jump triggered the MGR failover and reported the anomaly to the system administration team for further verification.
3. Root Cause
Deeper analysis by the system experts confirmed that the underlying cause was a VM pause around 19:27:13. While paused, the virtual machine could not run anything, including monitoring scripts, clock updates, and MGR heartbeats.
Because the paused node sent no heartbeats, the remaining two nodes could not communicate with it and expelled it from the group, which generated the alert. The jump in system time seen after the VM resumed was a side effect of the pause.
4. Impact of Time on MGR
We asked whether a pure time jump (without a VM pause) would also trigger a failover.
Testing in the lab showed that when a node's clock was suddenly shifted forward or backward by one hour, the MGR cluster remained synchronized and no errors appeared in the error log; the time anomaly alone does not cause a failover.
5. Source Code Insight: Node Detection Functions (alive_task / detector_task)
The relevant functions and their call hierarchy are:
alive_task
    task_now
    may_be_dead
        task_now
detector_task
    check_global_node_set
        DETECT
            task_now
    check_local_node_set
        DETECT
            task_now
// alive_task implementation (simplified; declarations of site, ep and
// alive_synode are omitted)
int alive_task(task_arg arg) {
  while (!xcom_shutdown) {
    // broadcast i_am_alive if this node has sent nothing for more than 0.5s
    if (server_active(site, get_nodeno(site)) < task_now() - 0.5) {
      replace_pax_msg(&ep->i_p, pax_msg_new(alive_synode, site));
      ep->i_p->op = i_am_alive_op;
      send_to_all_site(site, ep->i_p, "alive_task");
    }
    // if no heartbeat >4s, ask if node is alive
    double sec = task_now();
    // the check runs once per peer node
    for (node_no i = 0; i < get_maxnodes(site); i++) {
      if (i != get_nodeno(site) && may_be_dead(site->detected, i, sec)) {
        replace_pax_msg(&ep->you_p, pax_msg_new(alive_synode, site));
        ep->you_p->op = are_you_alive_op;
        ep->you_p->a = new_app_data();
        ep->you_p->a->app_key.group_id = ep->you_p->a->group_id = get_group_id(site);
        ep->you_p->a->body.c_t = xcom_boot_type;
        init_node_list(1, &site->nodes.node_list_val[i], &ep->you_p->a->body.app_u_u.nodes);
        send_server_msg(site, i, ep->you_p);
      }
    }
    TASK_DELAY(1.0);
  }
}
#define DETECTOR_LIVE_TIMEOUT 5.0
#define DETECT(site, i)       \
  (i == get_nodeno(site)) ||  \
      (site->detected[i] + DETECTOR_LIVE_TIMEOUT > task_now())
static void check_global_node_set(site_def *site, int *notify) {
  u_int i;
  u_int nodes = get_maxnodes(site);
  site->global_node_count = 0;
  for (i = 0; i < nodes && i < site->global_node_set.node_set_len; i++) {
    int detect = DETECT(site, i);
    if (site->global_node_set.node_set_val[i]) site->global_node_count++;
    if (site->global_node_set.node_set_val[i] != detect) {
      *notify = 1;
    }
    DBGOHK(FN; NDBG(i, u); NDBG(*notify, d));
  }
}
static void check_local_node_set(site_def *site, int *notify) {
  u_int i;
  u_int nodes = get_maxnodes(site);
  for (i = 0; i < nodes && i < site->global_node_set.node_set_len; i++) {
    int detect = DETECT(site, i);
    if (site->local_node_set.node_set_val[i] != detect) {
      site->local_node_set.node_set_val[i] = detect;
      *notify = 1;
    }
    DBGOHK(FN; NDBG(i, u); NDBG(*notify, d));
  }
}
int detector_task(task_arg arg) {
  while (!xcom_shutdown) {
    site_def *x_site = get_executor_site_rw();
    if (x_site && get_nodeno(x_site) != VOID_NODE_NO) {
      if (x_site != last_x_site) {
        reset_disjunct_servers(last_x_site, x_site);
      }
      update_detected(x_site);
      if (x_site != last_x_site) {
        last_x_site = x_site;
        ep->notify = 1;
        ep->local_notify = 1;
      }
      check_global_node_set(x_site, &ep->notify);
      update_global_count(x_site);
      if (ep->notify && iamtheleader(x_site) && enough_live_nodes(x_site)) {
        ep->notify = 0;
        send_my_view(x_site);  // send view change to expel dead node
      }
    }
    if (x_site && get_nodeno(x_site) != VOID_NODE_NO) {
      update_global_count(x_site);
      check_local_node_set(x_site, &ep->local_notify);
      if (ep->local_notify) {
        ep->local_notify = 0;
        deliver_view_msg(x_site);  // expel dead node
      }
    }
    TIMED_TASK_WAIT(&detector_wait, 1.0);
  }
}
Key Points
Both alive_task and detector_task obtain the current time via task_now(), whose value is a monotonic clock reading plus a fixed offset calculated once at initialization. Because the monotonic clock does not follow adjustments to the system clock, the intervals MGR uses for heartbeats and failure detection are unaffected by OS time changes.
6. Debug Verification Records
Debugger output shows the task_timer structure and its fields (real_start, monotonic_start, offset, now), confirming that the offset remains constant while the monotonic counter advances regardless of system clock adjustments.
// Example of task_timer after time change:
1: task_timer = {real_start = 1664049812.9187768, monotonic_start = 50635.304078640998, offset = 1663999177.6146982, now = 1664055893.4010971, done = 1}
After the system clock was adjusted manually, the now value is still computed from the unchanged offset, demonstrating that MGR continues to operate correctly.
Conclusion: The failover was caused by a VM pause, not by a time jump. MySQL MGR uses a monotonic time source, so ordinary system time changes do not trigger failover.