
Investigation of MySQL Group Replication Failover Triggered by System Time Anomaly and VM Suspension

The article analyzes a MySQL MGR cluster failover incident caused by a VM pause and system time jump, explains the underlying detection mechanisms, presents detailed source code excerpts, and demonstrates through debugging that time changes alone do not trigger MGR failover.

Aikesheng Open Source Community

Oct 10, 2022

1. Problem Description

On September 15, an alert indicated that the MySQL MGR cluster in the test environment was abnormal.

Log excerpt:

// 10.x.y.97 node MySQL error log shows node 10.x.y.95 unreachable:
2022-09-15T19:26:14.320181+08:00 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address 10.x.y.95:3306 has become unreachable.'
// MGR probe logs show a switch:
"2022-09-15 19:26:16" : Exception: Invalid primary_member_ip: or secondary_node_list:"10.x.y.96", "10.x.y.97" .
"2022-09-15 19:26:16" : Exception: MGR is likely to be switching, Sleep 1 sec and continue .
"2022-09-15 19:27:01" : Exception: MGR running with WARN. ONLINE nodes Pri:10.x.y.96 Sec:10.x.y.97 diff from conf_node:10.x.y.95,10.x.y.96,10.x.y.97 .

The alert indicates that the MySQL MGR cluster switched from a three‑node configuration to a two‑node configuration (Pri:10.x.y.96, Sec:10.x.y.97).

2. Preliminary Analysis

After receiving the alert, we examined MySQL error logs, OS logs, and monitoring data, discovering that the operating system time had changed unexpectedly ("Time has been changed" anomaly).

According to MySQL documentation, failure detection is time‑based:

If a member does not receive a message from another member within 5 seconds, it marks the member as UNREACHABLE in the replication_group_members table.

If the suspicion persists for more than 10 seconds, the member propagates the suspicion to the rest of the group. ( Reference )

We initially suspected that a sudden OS time jump triggered the MGR failover and reported the anomaly to the system administration team for further verification.

3. Root Cause

Deep analysis by system experts confirmed that the underlying cause was a VM pause around 19:27:13, during which the virtual machine could not perform any operation, including monitoring scripts, time updates, or MGR heartbeats.

Because the paused node sent no heartbeats, the remaining two nodes lost contact with it and expelled it from the group, which generated the alert.

4. Impact of Time on MGR

We asked whether a pure time jump (without a VM pause) would also trigger a failover.

Testing in the lab showed that when a node's clock was suddenly shifted forward or backward by one hour, the MGR cluster remained synchronized and no errors appeared in the error log; the time anomaly alone does not cause a failover.

5. Source Code Insight: Node Detection Functions (alive_task / detector_task)

The relevant functions and their call hierarchy are:

alive_task
  task_now
  may_be_dead
    task_now

detector_task
  check_global_node_set
    DETECT
      task_now
  check_local_node_set
    DETECT
      task_now

// alive_task implementation (simplified; the declarations of ep, site and alive_synode are elided)
int alive_task(task_arg arg) {
  while (!xcom_shutdown) {
    // broadcast i_am_alive if this node has sent nothing for more than 0.5 s
    if (server_active(site, get_nodeno(site)) < task_now() - 0.5) {
      replace_pax_msg(&ep->i_p, pax_msg_new(alive_synode, site));
      ep->i_p->op = i_am_alive_op;
      send_to_all_site(site, ep->i_p, "alive_task");
    }
    // ping every other node that has been silent for more than 4 s
    double sec = task_now();
    for (node_no i = 0; i < get_maxnodes(site); i++) {
      if (i != get_nodeno(site) && may_be_dead(site->detected, i, sec)) {
        replace_pax_msg(&ep->you_p, pax_msg_new(alive_synode, site));
        ep->you_p->op = are_you_alive_op;
        ep->you_p->a = new_app_data();
        ep->you_p->a->app_key.group_id = ep->you_p->a->group_id = get_group_id(site);
        ep->you_p->a->body.c_t = xcom_boot_type;
        init_node_list(1, &site->nodes.node_list_val[i], &ep->you_p->a->body.app_u_u.nodes);
        send_server_msg(site, i, ep->you_p);
      }
    }
    // run the check once per second
    TASK_DELAY(1.0);
  }
}

#define DETECTOR_LIVE_TIMEOUT 5.0
#define DETECT(site, i) \
  (i == get_nodeno(site)) || \
  (site->detected[i] + DETECTOR_LIVE_TIMEOUT > task_now())

static void check_global_node_set(site_def *site, int *notify) {
  u_int i;
  u_int nodes = get_maxnodes(site);
  site->global_node_count = 0;
  for (i = 0; i < nodes && i < site->global_node_set.node_set_len; i++) {
    int detect = DETECT(site, i);
    if (site->global_node_set.node_set_val[i]) site->global_node_count++;
    if (site->global_node_set.node_set_val[i] != detect) {
      *notify = 1;
    }
    DBGOHK(FN; NDBG(i, u); NDBG(*notify, d));
  }
}

static void check_local_node_set(site_def *site, int *notify) {
  u_int i;
  u_int nodes = get_maxnodes(site);
  for (i = 0; i < nodes && i < site->global_node_set.node_set_len; i++) {
    int detect = DETECT(site, i);
    if (site->local_node_set.node_set_val[i] != detect) {
      site->local_node_set.node_set_val[i] = detect;
      *notify = 1;
    }
    DBGOHK(FN; NDBG(i, u); NDBG(*notify, d));
  }
}

int detector_task(task_arg arg) {
  while (!xcom_shutdown) {
    site_def *x_site = get_executor_site_rw();
    if (x_site && get_nodeno(x_site) != VOID_NODE_NO) {
      if (x_site != last_x_site) {
        reset_disjunct_servers(last_x_site, x_site);
      }
      update_detected(x_site);
      if (x_site != last_x_site) {
        last_x_site = x_site;
        ep->notify = 1;
        ep->local_notify = 1;
      }
      check_global_node_set(x_site, &ep->notify);
      update_global_count(x_site);
      if (ep->notify && iamtheleader(x_site) && enough_live_nodes(x_site)) {
        ep->notify = 0;
        send_my_view(x_site); // send view change to expel dead node
      }
    }
    if (x_site && get_nodeno(x_site) != VOID_NODE_NO) {
      update_global_count(x_site);
      check_local_node_set(x_site, &ep->local_notify);
      if (ep->local_notify) {
        ep->local_notify = 0;
        deliver_view_msg(x_site); // expel dead node
      }
    }
    TIMED_TASK_WAIT(&detector_wait, 1.0);
  }
}
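
The two thresholds in the excerpts above (a ping after 4 s of silence, and DETECTOR_LIVE_TIMEOUT at 5 s) can be tried out in isolation. The following is a minimal sketch, not the XCom code: the function names and PING_AFTER_SECONDS are hypothetical, and timestamps are passed in explicitly instead of being read from task_now().

```c
#include <assert.h>
#include <stdbool.h>

/* PING_AFTER_SECONDS is a hypothetical name for the 4 s used by may_be_dead. */
#define PING_AFTER_SECONDS 4.0
/* Same value as in the excerpt above. */
#define DETECTOR_LIVE_TIMEOUT 5.0

/* Hypothetical stand-in for may_be_dead(): the peer has been silent for
   more than 4 s, so alive_task would send it are_you_alive_op. */
bool sketch_may_be_dead(double last_seen, double now) {
  assert(now >= last_seen); /* the sketch assumes a clock that never runs backwards */
  return last_seen + PING_AFTER_SECONDS < now;
}

/* Hypothetical stand-in for DETECT(): the peer still counts as alive
   while its last message is less than 5 s old. */
bool sketch_is_detected(double last_seen, double now) {
  assert(now >= last_seen);
  return last_seen + DETECTOR_LIVE_TIMEOUT > now;
}
```

With last_seen = 100.0, a reading of 104.5 makes sketch_may_be_dead return true while sketch_is_detected is still true, and a reading of 106.0 makes sketch_is_detected return false. A VM pause longer than 5 s therefore flips the paused node's state on its peers, since their clocks keep advancing while it stays silent.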

Key Points

Both alive_task and detector_task obtain the current time via task_now(), which uses a monotonic clock plus a fixed offset calculated at initialization. This means MGR heartbeats are based on a time source that is immune to OS time changes.
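
The idea of "monotonic clock plus a fixed offset" can be sketched with the POSIX clock_gettime API. This is an illustration only, not the XCom implementation: sketch_timer, sketch_read_clock, sketch_timer_init and sketch_task_now are hypothetical names, and the struct simply mirrors the fields visible in the debugger output in the next section.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical structure modeled on the task_timer fields shown by the debugger. */
typedef struct {
  double real_start;      /* wall-clock time at initialization */
  double monotonic_start; /* monotonic time at initialization */
  double offset;          /* real_start - monotonic_start, fixed afterwards */
  int done;               /* 1 once the offset has been computed */
} sketch_timer;

/* Read the given POSIX clock as seconds in a double. */
double sketch_read_clock(clockid_t id) {
  struct timespec ts;
  int rc = clock_gettime(id, &ts);
  assert(rc == 0); /* sketch only: production code should handle the error */
  (void)rc;
  return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Compute the offset once; later wall-clock changes do not affect it. */
void sketch_timer_init(sketch_timer *t) {
  t->monotonic_start = sketch_read_clock(CLOCK_MONOTONIC);
  t->real_start = sketch_read_clock(CLOCK_REALTIME);
  t->offset = t->real_start - t->monotonic_start;
  t->done = 1;
}

/* Current time: the monotonic reading shifted by the fixed offset. */
double sketch_task_now(sketch_timer *t) {
  if (!t->done) sketch_timer_init(t);
  return sketch_read_clock(CLOCK_MONOTONIC) + t->offset;
}
```

A caller would zero-initialize a sketch_timer and call sketch_task_now on it. Because only CLOCK_MONOTONIC is read after initialization, setting the system clock forward or backward does not change the differences between successive readings, which is what the timeout comparisons depend on.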

6. Debug Verification Records

Debugger output shows the task_timer structure and its fields (real_start, monotonic_start, offset, now), confirming that the offset remains constant while the monotonic counter advances regardless of system clock adjustments.

// Example of task_timer after time change:
1: task_timer = {real_start = 1664049812.9187768, monotonic_start = 50635.304078640998, offset = 1663999177.6146982, now = 1664055893.4010971, done = 1}

After manually adjusting the system clock, the now value updates based on the unchanged offset, demonstrating that MGR continues to operate correctly.
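
The relationship between the fields can be checked by hand from the values printed above. The snippet below is a small arithmetic sketch with hypothetical helper names. It assumes, as the field names suggest, that offset equals real_start minus monotonic_start and that now equals the monotonic reading plus offset.

```c
#include <assert.h>

/* Values copied from the task_timer debugger output above. */
static const double REAL_START = 1664049812.9187768;
static const double MONOTONIC_START = 50635.304078640998;
static const double OFFSET = 1663999177.6146982;
static const double NOW = 1664055893.4010971;

/* Hypothetical helper: true when a and b differ by less than tol. */
int sketch_close(double a, double b, double tol) {
  double d = a - b;
  return d < tol && d > -tol;
}

/* Seconds the monotonic clock has advanced since the timer was initialized. */
double sketch_monotonic_elapsed(double now, double offset, double monotonic_start) {
  return (now - offset) - monotonic_start;
}

/* Self-check of the two relationships assumed in the lead-in. */
void sketch_check_task_timer(void) {
  /* offset == real_start - monotonic_start */
  assert(sketch_close(REAL_START - MONOTONIC_START, OFFSET, 1e-5));
  /* now - real_start equals the monotonic time elapsed since start */
  assert(sketch_close(sketch_monotonic_elapsed(NOW, OFFSET, MONOTONIC_START),
                      NOW - REAL_START, 1e-5));
}
```

For these values the elapsed monotonic time works out to about 6080.48 seconds. In other words, now is the startup wall-clock time plus the time that has actually passed, not the wall-clock time after the manual adjustment.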

Conclusion: The failover was caused by a VM pause, not by a time jump. MySQL MGR uses a monotonic time source, so ordinary system time changes do not trigger failover.
