MySQL Cascade Replication Failure Caused by Duplicate server_id and Its Resolution
The article analyzes a MySQL cascade replication issue where the new slave failed to sync due to duplicate server_id values, explains the root cause stemming from automated server_id assignment, and provides step‑by‑step remediation to ensure unique server_id settings and reliable replication.
1 Business Scenario
In a production environment the original MySQL cluster ran on traditional HDDs with limited capacity and performance. To handle increased traffic a new high‑performance cluster with SSDs was built, preserving the existing master‑slave architecture and adding a cascade replication chain: old master → new master → new slave.
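The intended configuration of the three nodes can be sketched as my.cnf fragments. Hostnames and the concrete server_id values here are illustrative, not taken from the original environment:

```ini
# old master (my.cnf)
[mysqld]
server_id = 1
log_bin   = mysql-bin

# new master (my.cnf) -- the middle node of the chain
[mysqld]
server_id = 2
log_bin   = mysql-bin
log_slave_updates = 1   # required so replicated events are written to this node's own binlog

# new slave (my.cnf)
[mysqld]
server_id = 3           # must differ from every other node in the chain
```

Without `log_slave_updates` on the middle node, events received from the old master would never reach the new slave at all, which is why it is checked first in the analysis below.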
2 Problem Discovery
After the new cluster was deployed, replication appeared normal, but fifteen days later a data comparison revealed that the new master was correctly receiving updates from the old master while the new slave was not receiving any updates from the new master.
2.1 Problem Analysis
Verified that binary logging was enabled on the new master and that log_slave_updates was turned on; both checked out.
Created a new database and inserted rows on the old master to observe the binlog position changes.
Observed that after the inserts the binlog positions on the old master and the new master advanced, but the binlog position on the new slave remained unchanged, indicating a synchronization failure.
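The position check in the steps above uses the stock MySQL statements, run on each node; no custom tooling is needed:

```sql
-- On the old master and the new master: current binlog file and position.
-- Insert a row upstream and re-run this to see whether the position advances.
SHOW MASTER STATUS;

-- On the new master and the new slave: replication status, including
-- Master_Log_File / Exec_Master_Log_Pos and the IO/SQL thread states.
SHOW SLAVE STATUS\G
```

In this incident both replication threads on the new slave reported no error, which is what made the silent position stall so misleading.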
Compared the my.cnf files of the three servers and discovered a critical configuration conflict: the old master had server_id = 1, the new master had server_id = 2, and the new slave also had server_id = 1, duplicating the old master's ID. MySQL's loop-prevention rule makes a replica silently skip any binlog event whose originating server_id equals its own, so the new slave discarded every event that originated on the old master, and no error was ever reported, not even during the initial setup.
Reproduced the scenario in a local environment by deliberately setting duplicate server_id values; the test confirmed that a replica whose server_id duplicates an upstream node's silently drops that node's events, breaking data synchronization in a cascade replication topology.
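The failure mode can be illustrated with a toy model of the loop-prevention rule: a replica's SQL thread skips any event whose originating server_id equals its own. This is a simplified sketch for intuition only, not MySQL's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class BinlogEvent:
    origin_server_id: int   # server_id of the node where the write originated
    statement: str

def apply_events(own_server_id, events):
    """Return the statements a replica would actually apply.

    Mirrors MySQL's default loop-prevention rule: events whose
    originating server_id matches the replica's own server_id
    are silently skipped -- no error is raised.
    """
    return [e.statement for e in events
            if e.origin_server_id != own_server_id]

# Events originating on the old master (server_id = 1), relayed through
# the new master; log_slave_updates preserves the originating id of 1.
events = [BinlogEvent(1, "INSERT INTO t VALUES (1)"),
          BinlogEvent(1, "INSERT INTO t VALUES (2)")]

# New slave misconfigured with server_id = 1: every event is dropped.
print(apply_events(1, events))   # []

# With a unique server_id = 3: all events are applied.
print(apply_events(3, events))
```

This also shows why the middle node (server_id = 2) kept replicating correctly: its own id never collided with the originating id carried by the events.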
2.2 Root Cause
The automated installation script assigned server_id values randomly (1 or 2). Consequently, the new slave received the same server_id as the old master, creating the conflict.
3 Remediation Steps
Since the binlog retained only 14 days of history while the problem went unnoticed for 15, the missing updates could no longer be replayed from the binlog. The immediate actions were:
Change the server_id on the new slave to a unique value that does not clash with any other node.
Back up the old master, restore the backup onto the new cluster, and rebuild the cascade replication with the corrected server_id settings.
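A sketch of the rebuild. Hostnames, credentials, and the binlog coordinates are placeholders to adjust for the real environment; the `CHANGE MASTER TO` syntax shown applies to older MySQL releases (newer versions use `CHANGE REPLICATION SOURCE TO`):

```shell
# 1. Fix the new slave's server_id in my.cnf (e.g. server_id = 3),
#    then restart mysqld on that node.

# 2. Take a consistent dump of the old master, recording its
#    binlog coordinates in the dump file.
mysqldump -h old-master -u root -p --all-databases \
          --single-transaction --master-data=2 > full_backup.sql

# 3. Restore the dump onto the new master.
mysql -h new-master -u root -p < full_backup.sql

# 4. On the new master, re-point replication at the old master using the
#    coordinates embedded in the dump, then restart the chain.
mysql -h new-master -u root -p -e \
  "CHANGE MASTER TO MASTER_HOST='old-master',
     MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=4;
   START SLAVE;"
```

The new slave is then re-pointed at the new master in the same way; with `log_slave_updates` on the middle node and three distinct server_id values, events flow through the whole chain.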
4 Lessons Learned
All MySQL instances participating in cascade replication must have distinct server_id values.
Ensure that binary logging and related parameters (e.g., log_slave_updates) are enabled on every node in the chain.
Because cascade replication can be fragile, minimize its use in production environments whenever possible.
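The checklist above can be audited quickly on each node with stock MySQL system variables; server_id must come back unique across all nodes:

```sql
-- Run on every node in the replication chain.
SELECT @@server_id, @@log_bin, @@log_slave_updates;
```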
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade open‑source MySQL tools and services, releasing a premium open‑source component each year on October 24 (Programmer's Day, "1024") and continuously operating and maintaining it.