MySQL High‑Availability Incident Review and Resolution in a Dual‑Master Setup with Keepalived

This article recounts a MySQL high‑availability incident in a dual‑master environment, explains how missing binary‑log index files caused replication failures, and details step‑by‑step troubleshooting, directory recreation, binlog position correction, and configuration improvements to restore reliable database operation.

IT Services Circle

The author recounts a recent MySQL high‑availability incident in a dual‑master setup using Keepalived.

In the test environment at 10:30 am, the testing group reported that MySQL containers were not running, prompting an investigation.

Two MySQL nodes (node55 and node56) run as a master‑master pair, and a Keepalived instance on each node monitors the local MySQL container. (The layout diagrams from the original are omitted here.)

Running docker ps listed no MySQL containers. Keepalived was still running, but the logs showed that the MySQL process exited repeatedly because the binary‑log index file mysql-bin.index was missing.
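The checks described above can be sketched in shell roughly as follows. The container name mysql is an assumption (the source does not name it), and binlog_index_errors is a hypothetical helper that only filters log text for the mysql-bin.index file name mentioned above.

```shell
#!/usr/bin/env bash
# Assumption: the MySQL container is named "mysql"; override via CONTAINER.
CONTAINER="${CONTAINER:-mysql}"

# Hypothetical helper: print the lines from stdin that mention the
# binary-log index file; returns non-zero when no line matches.
binlog_index_errors() {
  grep -F 'mysql-bin.index'
}

# List the container even if it has exited, then scan its recent logs.
diagnose_mysql() {
  docker ps -a --filter "name=${CONTAINER}"
  docker logs --tail 100 "${CONTAINER}" 2>&1 | binlog_index_errors
}
```

Calling diagnose_mysql on either node shows whether the container exists at all and whether its last log lines point at the missing index file.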

Solution part 1 – recreate log directory and permissions:

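# Recreate the binlog directory that was removed
mkdir -p log
# World-writable permissions are tolerable only in a test environment
chmod -R 777 log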

Solution part 2 – fix replication position: On the master (node55), obtain the current binlog file and position while holding a read lock:

FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
UNLOCK TABLES;

Then, on the slave (node56), execute:

# Stop slave
STOP SLAVE;

# Set master info
CHANGE MASTER TO MASTER_HOST='10.2.1.55',
MASTER_PORT=3306,
MASTER_USER='vagrant',
MASTER_PASSWORD='vagrant',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=117748;

# Start slave
START SLAVE;
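The file name and position in this statement come from SHOW MASTER STATUS on node55, and copying them by hand is easy to get wrong. A minimal sketch of deriving them in shell is shown here; change_master_sql is a hypothetical helper, and it emits only the log file and position, which leaves the replica's existing host, port, and credentials unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one tab-separated SHOW MASTER STATUS row
# (File, Position, ...) on stdin and print a CHANGE MASTER TO statement
# that sets only the binlog coordinates.
change_master_sql() {
  local file pos _rest
  IFS=$'\t' read -r file pos _rest || [ -n "$file" ]
  # Refuse to emit SQL unless the position is a plain integer.
  case "$pos" in
    ''|*[!0-9]*) echo "unexpected SHOW MASTER STATUS row" >&2; return 1 ;;
  esac
  printf "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
    "$file" "$pos"
}

# Example (container name and credentials are assumptions):
# docker exec mysql mysql -uroot -p"$PW" -N -B -e 'SHOW MASTER STATUS' | change_master_sql
```

The printed statement still has to be run on node56 between STOP SLAVE and START SLAVE.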

After applying the above, the I/O thread resumed and replication became healthy.
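One way to confirm that replication is healthy is to check that both replication threads report Yes in SHOW SLAVE STATUS. The sketch below assumes that output is piped in as text; replication_healthy is a hypothetical helper, not something from the original write-up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when both replication threads report
# "Yes" in the `SHOW SLAVE STATUS\G` text supplied on stdin.
replication_healthy() {
  local status io sql
  status=$(cat)
  # The trailing colon keeps Slave_SQL_Running_State from matching.
  io=$(printf '%s\n' "$status" | awk -F': ' '/^ *Slave_IO_Running:/ {print $2}')
  sql=$(printf '%s\n' "$status" | awk -F': ' '/^ *Slave_SQL_Running:/ {print $2}')
  [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]
}

# Example (container name and credentials are assumptions):
# docker exec mysql mysql -uroot -p"$PW" -e 'SHOW SLAVE STATUS\G' \
#   | replication_healthy && echo "replication OK"
```

In a dual‑master setup the same check applies on both nodes, since each one is also a replica of the other.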

The log directory had been removed during a previous migration, causing the missing mysql-bin.index file and subsequent replication errors.

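Two improvements follow from this. First, keep the binary logs in a directory separate from the data directory (e.g., datadir=/var/lib/mysql/data, with binary logs under /var/lib/mysql/log). Note that log_bin takes a file base name rather than a directory, so a value such as log_bin=/var/lib/mysql/log/mysql-bin is what places mysql-bin.index inside that directory. Second, add proactive alerting (email from Keepalived, or log monitoring) so that a MySQL failure is detected before users report it.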
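As an illustration of the alerting idea, a health check that Keepalived or cron could run periodically might look like the sketch below. The container name mysql, the syslog tag, and the alert hook are all assumptions; the hook would be swapped for mail or a webhook in practice.

```shell
#!/usr/bin/env bash
# Assumption: the MySQL container is named "mysql"; override via CONTAINER.
CONTAINER="${CONTAINER:-mysql}"

# Hypothetical alert hook: writes to syslog; replace with mail or a webhook.
alert() {
  logger -t mysql-check "$1"
}

# Return 0 when Docker reports the container as running, 1 otherwise.
check_mysql_container() {
  local running
  running=$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null || true)
  if [ "$running" = "true" ]; then
    return 0
  fi
  alert "MySQL container '$1' is not running"
  return 1
}

# When saved as a standalone script, end it with:
# check_mysql_container "$CONTAINER"
```

Keepalived's vrrp_script mechanism uses a script's exit status in this way, so a non-zero result can both raise the alert and mark the node as faulty.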

By recreating the log directory, correcting the binlog position, and adjusting configuration, the dual‑master MySQL cluster returned to normal operation.
