Understanding MySQL Group Replication Failure Detection: A Hands‑On Case Study
This article walks through a practical MySQL Group Replication case study, demonstrating how failure detection works, how network partitions affect node states, the role of XCom Cache, and the impact of configuration parameters on cluster availability and performance.
Case Study Setup
The test cluster has three nodes in a multi-primary topology: node1 (192.168.244.10), node2 (192.168.244.20), and node3 (192.168.244.30).
The test consists of two steps:
Simulate a network partition and observe its impact on each node.
Restore the network connection and observe node behavior.
1. Simulating Network Partition
On node3 run the following iptables commands to drop traffic to node1 (192.168.244.10) and node2 (192.168.244.20):
# iptables -A INPUT -p tcp -s 192.168.244.10 -j DROP
# iptables -A OUTPUT -p tcp -d 192.168.244.10 -j DROP
# iptables -A INPUT -p tcp -s 192.168.244.20 -j DROP
# iptables -A OUTPUT -p tcp -d 192.168.244.20 -j DROP
# date "+%Y-%m-%d %H:%M:%S"
2022-07-31 13:03:01
After the fixed DETECTOR_LIVE_TIMEOUT of 5 seconds, the remaining nodes mark node3 as UNREACHABLE while node1 and node2 stay ONLINE:
mysql> SELECT member_id, member_host, member_port, member_state, member_role FROM performance_schema.replication_group_members;
+--------------------------------------+----------------+-------------+--------------+-------------+
| member_id | member_host | member_port | member_state | member_role |
+--------------------------------------+----------------+-------------+--------------+-------------+
| 207db264-0192-11ed-92c9-02001700754e | 192.168.244.10 | 3306 | ONLINE | PRIMARY |
| 2cee229d-0192-11ed-8eff-02001700f110 | 192.168.244.20 | 3306 | ONLINE | PRIMARY |
| 4cbfdc79-0192-11ed-8b01-02001701bd0a | 192.168.244.30 | 3306 | UNREACHABLE | PRIMARY |
+--------------------------------------+----------------+-------------+--------------+-------------+
On node3 the view is reversed: node1 and node2 appear UNREACHABLE and only node3 itself remains ONLINE. Having lost the majority, node3 blocks write operations:
mysql> DELETE FROM slowtech.t1 WHERE id=1;
-- blocked ...
After roughly 16 seconds, node1 and node2 expel node3 (expulsion is governed by group_replication_member_expel_timeout), leaving a two-node group.
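The majority rule can be checked directly from the membership table. The query below is a minimal sketch, not part of the original test: it counts the members that the local node still considers reachable (assumed here to be those in ONLINE or RECOVERING state) and compares that count with half the group size. The column aliases are illustrative.

```sql
-- Run on any member; the result reflects that member's own view of the group.
SELECT
    COUNT(*)                                        AS total_members,
    SUM(member_state IN ('ONLINE', 'RECOVERING'))   AS reachable_members,
    SUM(member_state = 'UNREACHABLE')               AS unreachable_members,
    -- A majority needs strictly more than half of the listed members.
    SUM(member_state IN ('ONLINE', 'RECOVERING')) > COUNT(*) / 2 AS has_majority
FROM performance_schema.replication_group_members;
```

Against the three-row view shown above (two ONLINE, one UNREACHABLE), has_majority evaluates to 1. On a member that sees only itself ONLINE out of three, it evaluates to 0.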
2. Restoring the Network
Flush the iptables rules to restore connectivity:
# iptables -F
# date "+%Y-%m-%d %H:%M:%S"
2022-07-31 13:07:30
The logs on node3 show that node1 and node2 become reachable again. A few seconds later node3 learns that it was expelled, transitions to ERROR, and sets super_read_only=ON:
2022-07-31T13:07:30.464179-00:00 0 [Warning] Plugin group_replication reported: 'Member with address 192.168.244.10:3306 is reachable again.'
2022-07-31T13:07:30.464226-00:00 0 [Warning] Plugin group_replication reported: 'Member with address 192.168.244.20:3306 is reachable again.'
2022-07-31T13:07:30.464239-00:00 0 [Warning] Plugin group_replication reported: 'The member has resumed contact with a majority of the members in the group. Regular operation is restored and transactions are unblocked.'
2022-07-31T13:07:37.458761-00:00 0 [Error] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
2022-07-31T13:07:37.459037-00:00 0 [Error] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'
2022-07-31T13:07:40.653028-00:00 0 [System] Plugin group_replication reported: 'Setting super_read_only=ON.'
Querying the membership table on node3 confirms the ERROR state:
mysql> SELECT member_id, member_host, member_port, member_state, member_role FROM performance_schema.replication_group_members;
+--------------------------------------+----------------+-------------+--------------+-------------+
| member_id | member_host | member_port | member_state | member_role |
+--------------------------------------+----------------+-------------+--------------+-------------+
| 4cbfdc79-0192-11ed-8b01-02001701bd0a | 192.168.244.30 | 3306 | ERROR | |
+--------------------------------------+----------------+-------------+--------------+-------------+
If group_replication_autorejoin_tries is non-zero (the default is 3 as of MySQL 8.0.21), the node automatically attempts to rejoin. Otherwise, or if every attempt fails, it stays in ERROR and performs the action defined by group_replication_exit_state_action (READ_ONLY, OFFLINE_MODE, or ABORT_SERVER).
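To see which of these paths an expelled node will take, the two settings can be read on that node. This is a small sketch rather than a step from the original test. The manual rejoin at the end is the usual way to bring a member out of ERROR by hand and assumes the network problem has already been fixed.

```sql
-- Inspect the settings that decide what happens after expulsion.
SELECT @@GLOBAL.group_replication_autorejoin_tries  AS autorejoin_tries,
       @@GLOBAL.group_replication_exit_state_action AS exit_state_action;

-- Manual rejoin for a member stuck in ERROR (run on that member).
STOP GROUP_REPLICATION;
START GROUP_REPLICATION;
```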
Failure Detection Process
Each node sends a heartbeat to every other node once per second. If no heartbeat is received within 5 seconds, the target node is marked SUSPECT and its state becomes UNREACHABLE. If half or more of the nodes are UNREACHABLE, the cluster stops accepting writes.
If the suspect node recovers before group_replication_member_expel_timeout (default 5 s, max 3600 s), cached XCom messages are applied. The cache size is controlled by group_replication_message_cache_size (default 1 GB).
If the timeout expires without recovery, the node is expelled.
Minority nodes stay in their current state until the network recovers or group_replication_unreachable_majority_timeout expires (default 0, meaning wait indefinitely; counted from the 5 s detection point).
When that timeout expires, or when an expelled node reconnects and learns of its expulsion, the node moves from ONLINE to ERROR, rolls back its blocked writes, and sets super_read_only=ON.
If group_replication_autorejoin_tries > 0, the node automatically retries to re‑join (default 3 attempts, 5 min interval).
If auto‑rejoin fails or is disabled, the action defined by group_replication_exit_state_action is executed (READ_ONLY, OFFLINE_MODE, or ABORT_SERVER).
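The sequence above is driven by a handful of system variables, and it can help to look at them together before running a partition test. The following is a sketch that assumes the Group Replication plugin is installed, so that the variables exist. The value 30 in the SET statement is purely illustrative and is not a recommendation from the original test.

```sql
-- List the variables that shape failure detection and its aftermath.
SELECT variable_name, variable_value
FROM performance_schema.global_variables
WHERE variable_name IN (
    'group_replication_member_expel_timeout',
    'group_replication_unreachable_majority_timeout',
    'group_replication_autorejoin_tries',
    'group_replication_exit_state_action',
    'group_replication_message_cache_size'
)
ORDER BY variable_name;

-- Illustrative only: make a minority member give up after 30 seconds
-- instead of waiting indefinitely (the default of 0).
SET GLOBAL group_replication_unreachable_majority_timeout = 30;
```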
XCom Cache
XCom Cache stores messages exchanged between group members as part of the consensus protocol. When the cache is too small, needed messages may be evicted, causing nodes to be expelled.
To test cache exhaustion, the test reduces group_replication_message_cache_size to 128 MB (134217728 bytes), raises group_replication_member_expel_timeout to 3600 s, and runs large transactions while the partition persists:
mysql> SET GLOBAL group_replication_message_cache_size=134217728;
mysql> STOP GROUP_REPLICATION;
mysql> START GROUP_REPLICATION;
mysql> SET GLOBAL group_replication_member_expel_timeout=3600;
# (iptables commands as above)
mysql> INSERT INTO slowtech.t1(c1) SELECT c1 FROM slowtech.t1 LIMIT 1000000;
Relevant warning in the error log:
[Warning] [MY-011735] Plugin group_replication reported: '[GCS] Messages that are needed to recover node 192.168.244.30:33061 have been evicted from the message cache. Consider resizing the maximum size of the cache.'
Memory usage can be inspected via performance_schema.memory_summary_global_by_event_name:
mysql> SELECT * FROM performance_schema.memory_summary_global_by_event_name WHERE event_name LIKE '%GCS_XCom::xcom_cache%'\G
EVENT_NAME: memory/group_rpl/GCS_XCom::xcom_cache
COUNT_ALLOC: 23678
COUNT_FREE: 22754
CURRENT_COUNT_USED: 924
CURRENT_NUMBER_OF_BYTES_USED: 126271905
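HIGH_NUMBER_OF_BYTES_USED: 146137294
A rising COUNT_FREE shows that cache entries are being freed. While a member is unreachable, the freed entries may include messages that it needs in order to recover.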
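The raw byte counts are easier to interpret next to the configured limit. The query below is a sketch that assumes the memory/group_rpl/GCS_XCom::xcom_cache instrument is enabled. The column aliases and the percentage calculation are illustrative additions, not output from the original test.

```sql
-- Compare current and peak XCom cache usage with the configured limit.
SELECT
    current_number_of_bytes_used                  AS cache_bytes_used,
    high_number_of_bytes_used                     AS cache_bytes_high,
    @@GLOBAL.group_replication_message_cache_size AS cache_limit_bytes,
    ROUND(100 * current_number_of_bytes_used
              / @@GLOBAL.group_replication_message_cache_size, 1) AS pct_used
FROM performance_schema.memory_summary_global_by_event_name
WHERE event_name = 'memory/group_rpl/GCS_XCom::xcom_cache';
```

With the figures shown above (126271905 bytes used against a 134217728-byte limit), the cache is about 94% full.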
Practical Considerations
Any UNREACHABLE node prevents topology changes (adding or removing nodes).
In single-primary mode, an unreachable primary is not replaced automatically until it is expelled.
With consistency level AFTER or BEFORE_AND_AFTER, writes wait until the unreachable node is ONLINE again or is expelled.
Cluster throughput drops; in single‑primary mode group_replication_paxos_single_leader (available from MySQL 8.0.27) can mitigate the impact.
Therefore group_replication_member_expel_timeout should not be set excessively high in production.
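After a test like the one in this article, the expel timeout and the consistency level are worth checking on every member. This sketch resets the timeout to 5 seconds, the default cited earlier. Whether that value suits a given deployment is an assumption to validate against its own network behaviour.

```sql
-- Check the settings that determine how long an UNREACHABLE member can stall the group.
SELECT @@GLOBAL.group_replication_member_expel_timeout AS expel_timeout_s,
       @@GLOBAL.group_replication_consistency          AS consistency_level;

-- Restore the expel timeout to 5 seconds and keep it across restarts.
SET PERSIST group_replication_member_expel_timeout = 5;
```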
References
Extending replication instrumentation: account for memory used in XCom – https://dev.mysql.com/blog-archive/extending-replication-instrumentation-account-for-memory-used-in-xcom/
MySQL Group Replication – Default response to network partitions has changed – https://dev.mysql.com/blog-archive/mysql-group-replication-default-response-to-network-partitions-has-changed/
No Ping Will Tear Us Apart – Enabling member auto‑rejoin in Group Replication – https://dev.mysql.com/blog-archive/no-ping-will-tear-us-apart-enabling-member-auto-rejoin-in-group-replication/
dbaplus Community
