Automatic Member Rejoin in MySQL Group Replication (MGR): Features, Configuration, and Monitoring

Starting with MySQL 8.0.16, Group Replication can automatically re‑join expelled or disconnected members without manual intervention. The feature is configured through the group_replication_autorejoin_tries variable, can be monitored through Performance Schema, and comes with trade‑offs compared to the expel timeout.

Aikesheng Open Source Community

Author: Ricardo Ferreira
Translator: Guan Changlong
Tags: Group Replication, High Availability

With the release of MySQL 8.0.16, new functionality was added to MySQL Group Replication (MGR) to improve high‑availability by automatically re‑joining members that have left the group, eliminating the need for manual intervention in certain scenarios.

Introduction

MGR enables MySQL users to manage highly available groups, providing features such as fault tolerance and failure detection. One core guarantee is that the group appears as an indivisible whole to the user; any member join or leave is immediately visible to all other members. A new member must first catch up on the transactions it is missing by receiving them from an existing member (the donor), and it is declared ONLINE only after this distributed recovery succeeds.
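As a rough illustration of the membership view described above, each member's state can be read from the performance_schema.replication_group_members table. This is a minimal sketch; the choice of columns is just one reasonable selection.

```sql
-- List every member the local server currently sees in the group,
-- together with its state (for example ONLINE, RECOVERING, or ERROR).
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
FROM performance_schema.replication_group_members;
```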

The group communication layer (GCS) detects failed or suspect members and removes them after a configurable suspicion timeout.

Problem with Re‑joining Expelled Members

In a three‑node group, a member may experience packet loss, disconnection, or other unrecoverable errors. If the member is expelled, it cannot re‑join automatically and requires manual intervention. When the expel timeout (group_replication_member_expel_timeout) is non‑zero, the group waits that long after suspecting the member before expelling it; once expelled, the member still has to be re‑joined manually.
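The expel timeout is exposed as the system variable group_replication_member_expel_timeout, expressed in seconds. A minimal sketch of setting it follows; the value 30 is an arbitrary example, not a recommendation.

```sql
-- Assumed value for illustration: give a suspected member 30 extra seconds
-- before the group expels it.
SET GLOBAL group_replication_member_expel_timeout = 30;

-- Confirm the value now in effect.
SELECT @@GLOBAL.group_replication_member_expel_timeout;
```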

MySQL 8.0.16 introduces an automatic re‑join feature: once a member is expelled, it repeatedly attempts to re‑join the group until the configured number of tries is reached, waiting five minutes between attempts.

How to Enable Automatic Re‑join

Set the system variable group_replication_autorejoin_tries to the number of re‑join attempts you want. In MySQL 8.0.16 the default value is 0 (feature disabled); from MySQL 8.0.21 onward the default is 3.

SET GLOBAL group_replication_autorejoin_tries = 3;
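A value set with SET GLOBAL does not survive a server restart. One possible variant, assuming the setting should be kept across restarts, uses SET PERSIST (available in MySQL 8.0) and then reads the value back. The value 3 is only an example.

```sql
-- Apply the setting now and persist it so it survives a restart.
SET PERSIST group_replication_autorejoin_tries = 3;

-- Read the value back to confirm it took effect.
SELECT @@GLOBAL.group_replication_autorejoin_tries;
```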

How to Verify Automatic Re‑join

The process is observable through the Performance Schema. When the auto‑rejoin procedure starts, a stage event named "stage/group_rpl/Undergoing auto-rejoin procedure" is registered in performance_schema.events_stages_current and the related stage tables. Each such event exposes the following columns:

Thread ID (THREAD_ID)

Event name (EVENT_NAME)

Start/End timestamps and total duration (TIMER_START, TIMER_END, TIMER_WAIT)

Work units completed and estimated (WORK_COMPLETED, WORK_ESTIMATED)
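The columns listed above can be pulled in a single query. This is a sketch that reuses the same LIKE filter as the other queries in this article.

```sql
-- Show the listed columns for the auto-rejoin stage, if it is running.
-- TIMER_* values are reported in picoseconds.
SELECT THREAD_ID, EVENT_NAME, TIMER_START, TIMER_END, TIMER_WAIT,
       WORK_COMPLETED, WORK_ESTIMATED
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE '%auto-rejoin%';
```

An empty result means no auto‑rejoin stage is currently recorded on this server.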

To check whether the procedure is running:

SELECT COUNT(*)
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE '%auto-rejoin%';

The query returns a count greater than zero if the auto‑rejoin process is active.
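A count of zero can also mean that the stage is simply not being recorded. As a hedged sketch, assuming the relevant instrument or consumer might be disabled on your server, the Performance Schema setup tables can be inspected and, if needed, updated at runtime:

```sql
-- Check whether the Group Replication stage instruments are enabled and timed.
SELECT NAME, ENABLED, TIMED
FROM performance_schema.setup_instruments
WHERE NAME LIKE 'stage/group_rpl/%';

-- Check whether stage events are being collected at all.
SELECT NAME, ENABLED
FROM performance_schema.setup_consumers
WHERE NAME LIKE 'events_stages%';

-- Only if the checks above show 'NO': turn them on.
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'stage/group_rpl/%';

UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME = 'events_stages_current';
```

Changes made this way are not kept across a restart, so a permanent setup would need the equivalent startup options.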

To see how many retries have occurred so far:

SELECT WORK_COMPLETED
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE '%auto-rejoin%';
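The same row can also give a rough sense of progress. The sketch below assumes that WORK_ESTIMATED holds the configured number of tries, which this article does not state explicitly; the column aliases are made up for readability.

```sql
-- Assumption: WORK_ESTIMATED equals group_replication_autorejoin_tries.
-- The casts avoid an out-of-range error on unsigned subtraction.
SELECT WORK_COMPLETED AS attempts_so_far,
       WORK_ESTIMATED AS attempts_planned,
       CAST(WORK_ESTIMATED AS SIGNED) - CAST(WORK_COMPLETED AS SIGNED) AS attempts_left
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE '%auto-rejoin%';
```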

To estimate the remaining time until the next retry (each retry is preceded by a 5‑minute sleep), you can compute:

-- Approximate seconds until the next attempt. TIMER_WAIT is in picoseconds;
-- WORK_COMPLETED - 1 is taken as the number of 5-minute waits already finished.
-- Time spent inside the attempts themselves is not subtracted.
SELECT 300.0 - ((TIMER_WAIT * 1e-12)
       - 300.0 * (CAST(WORK_COMPLETED AS SIGNED) - 1)) AS time_remaining
FROM performance_schema.events_stages_current
WHERE EVENT_NAME LIKE '%auto-rejoin%';

In the original example, the query reported 30 seconds remaining before the next attempt.
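The 30-second figure can be reproduced by hand. The inputs below are hypothetical, chosen only to make the arithmetic visible: 570 seconds elapsed since the stage started, with one full 5-minute wait already completed.

```sql
-- Hypothetical inputs, not taken from a real server.
SET @elapsed_seconds = 570.0;
SET @waits_completed = 1;

-- 300 - (570 - 300 * 1) = 30 seconds until the next attempt.
SELECT 300.0 - (@elapsed_seconds - 300.0 * @waits_completed) AS time_remaining;
```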

Trade‑offs Between Automatic Re‑join and Expel Timeout

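Automatic re‑join: the retry count is configurable and the procedure is fully monitorable via Performance Schema. Members can be added or removed and the primary can be changed while retries are in progress. The downside is a higher chance of stale reads on the member that is trying to re‑join, since it still accepts reads while it is out of the group.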

Expel timeout: keeps the suspect member in the group, which is useful for short network glitches. However, it prevents membership changes while a member is under suspicion, and the waiting period is not monitorable.

Conclusion

Enabling automatic re‑join reduces the need for manual intervention when MySQL instances experience brief network failures, while still providing high‑availability guarantees. The feature is disabled by default and can be tuned via group_replication_autorejoin_tries to suit the reliability requirements of your deployment.

Summary

A new system variable group_replication_autorejoin_tries allows users to set how many times an MGR member will attempt to re‑join after being expelled or losing contact with the majority of the group. By default the feature is off; enabling it helps avoid manual intervention during transient network issues.
