Why a Missing mysql-bin.index Crashed MySQL HA and How to Recover It
A detailed post‑mortem of a MySQL dual‑master cluster failure caused by a missing mysql-bin.index file, showing step‑by‑step diagnostics, container logs, replication position fixes, and preventive measures for high‑availability deployments.
Incident Overview
The author describes an incident in a test environment where a MySQL dual‑master cluster (two MySQL nodes with Keepalived for HA) stopped serving traffic. Neither MySQL container appeared in the docker ps output, prompting a systematic investigation.
Root Cause Investigation
Checked Keepalived status; it was running and repeatedly restarting MySQL.
Used docker ps -a to see that MySQL containers exited after a restart.
Examined container logs with docker logs <container_id>, which reported that mysql-bin.index was missing.
The missing index file prevented the slave from locating the correct binary log, causing replication errors.
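The diagnostic steps above can be sketched as a small shell helper. This is only an illustration: the function name has_binlog_index_error and the container name in the usage comment are hypothetical, and because the write-up does not quote the exact error text, the check matches only the file name.

```shell
# Hypothetical helper: reads container log text on stdin and succeeds if any
# line mentions mysql-bin.index. The exact MySQL error wording is not quoted
# in the write-up, so only the file name is matched.
has_binlog_index_error() {
    grep -q 'mysql-bin\.index'
}

# Usage (the container name "mysql-node" is a placeholder):
#   docker ps -a --filter status=exited
#   docker logs mysql-node 2>&1 | has_binlog_index_error \
#       && echo "binlog index problem reported in container log"
```

The 2>&1 in the usage line matters because mysqld typically writes its errors to stderr, which docker logs passes through on a separate stream.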
Fixing the Missing mysql-bin.index
Created the required log directory and set permissions:
mkdir log
chmod -R 777 log
After the directory existed, Keepalived could restart MySQL successfully.
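The directory fix can be made repeatable, for example as a pre-start check. The sketch below assumes a hypothetical helper named ensure_binlog_dir; the default mode of 777 only mirrors the commands in the write-up, and a tighter mode owned by the MySQL user is usually preferable when the container's user ID is known.

```shell
# Hypothetical helper: create the binary log directory if it is missing.
# $1 = directory path
# $2 = permission mode (defaults to 777 only to mirror the write-up;
#      this default is an assumption, not a recommendation)
ensure_binlog_dir() {
    ebd_dir="$1"
    ebd_mode="${2:-777}"
    if [ -d "$ebd_dir" ]; then
        echo "exists: $ebd_dir"
    else
        mkdir -p "$ebd_dir" || return 1
        echo "created: $ebd_dir"
    fi
    # Apply the requested mode recursively, as the original chmod did.
    chmod -R "$ebd_mode" "$ebd_dir"
}

# Usage (path is a placeholder for the directory MySQL expects):
#   ensure_binlog_dir ./log
```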
Adjusting Replication Position
On the master (node55) the current binlog file and position were obtained:
FLUSH TABLES WITH READ LOCK;
SHOW MASTER STATUS;
UNLOCK TABLES;
On the slave (node56), replication was reconfigured to the correct file and offset:
# Stop slave
STOP SLAVE;
# Set correct master info
CHANGE MASTER TO MASTER_HOST='10.2.1.55',
MASTER_PORT=3306,
MASTER_USER='vagrant',
MASTER_PASSWORD='vagrant',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=117748;
# Start slave
START SLAVE;
After applying these commands, the I/O thread resumed and replication became healthy.
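Copying the file name and position by hand is error-prone, so the two steps can be scripted. The following is a minimal sketch, not the author's procedure: build_change_master and replication_healthy are hypothetical names, the first assumes tab-separated input as produced by mysql -N -B -e "SHOW MASTER STATUS", and the second assumes the Slave_IO_Running and Slave_SQL_Running fields of SHOW SLAVE STATUS\G output.

```shell
# Hypothetical helper: read one tab-separated SHOW MASTER STATUS row on stdin
# (File, Position, ...) and print the matching CHANGE MASTER TO statement.
# $1 = master host, $2 = port, $3 = user, $4 = password
build_change_master() {
    bcm_line=$(head -n 1)
    bcm_file=$(printf '%s\n' "$bcm_line" | cut -f1)
    bcm_pos=$(printf '%s\n' "$bcm_line" | cut -f2)
    # Refuse to emit SQL if the position is empty or not a number.
    case "$bcm_pos" in
        ''|*[!0-9]*) echo "unexpected master status: $bcm_line" >&2; return 1 ;;
    esac
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%s, MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$1" "$2" "$3" "$4" "$bcm_file" "$bcm_pos"
}

# Hypothetical helper: read SHOW SLAVE STATUS\G output on stdin and succeed
# only if both the I/O thread and the SQL thread report "Yes".
replication_healthy() {
    rh_status=$(cat)
    printf '%s\n' "$rh_status" | grep -Eq '^[[:space:]]*Slave_IO_Running:[[:space:]]*Yes' &&
    printf '%s\n' "$rh_status" | grep -Eq '^[[:space:]]*Slave_SQL_Running:[[:space:]]*Yes'
}

# Usage (hosts and credentials taken from the example above):
#   mysql -h 10.2.1.55 -N -B -e "SHOW MASTER STATUS" \
#       | build_change_master 10.2.1.55 3306 vagrant vagrant
#   mysql -e "SHOW SLAVE STATUS\G" | replication_healthy && echo "replication OK"
```

Checking both thread fields, rather than only the I/O thread, guards against the case where the slave connects but cannot apply events.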
Lessons and Improvements
The root cause was an accidental deletion of the /var/lib/mysql/log directory (the "log" database) during a previous migration, which also removed the mysql-bin.index file. To avoid similar issues, the author suggests:
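Separate the binary log directory from the data directory. In this incident log_bin pointed to /var/lib/mysql/log, a subdirectory of the data directory, which is why it could be mistaken for a database named "log".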
Use single‑database sync instead of full‑cluster overwrite to prevent accidental data loss.
Implement proactive alerting for MySQL failures, such as Keepalived email notifications or log‑based monitoring.
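As one possible form of log‑based monitoring, a scheduled check could verify that the index file and every binary log it lists still exist. This is a sketch under assumptions: check_binlog_index is a hypothetical name, and relative entries in the index are assumed to resolve against the directory passed as the second argument (normally the data directory).

```shell
# Hypothetical helper: verify the binlog index and the files it lists.
# $1 = path to mysql-bin.index
# $2 = directory used to resolve relative entries (assumed: the data directory)
# Prints one line per missing file and returns non-zero if anything is missing.
check_binlog_index() {
    cbi_index="$1"
    cbi_base="$2"
    if [ ! -f "$cbi_index" ]; then
        echo "MISSING index: $cbi_index"
        return 1
    fi
    cbi_rc=0
    while IFS= read -r cbi_entry; do
        # Skip blank lines in the index file.
        [ -n "$cbi_entry" ] || continue
        case "$cbi_entry" in
            /*) cbi_path="$cbi_entry" ;;            # absolute entry
            *)  cbi_path="$cbi_base/$cbi_entry" ;;  # relative entry
        esac
        if [ ! -f "$cbi_path" ]; then
            echo "MISSING binlog: $cbi_path"
            cbi_rc=1
        fi
    done < "$cbi_index"
    return "$cbi_rc"
}

# Usage (paths from the incident; run from cron or another scheduler and
# forward any output to whatever alerting channel is in place):
#   check_binlog_index /var/lib/mysql/log/mysql-bin.index /var/lib/mysql
```

A check like this would have flagged the deleted directory before Keepalived began its restart loop, since the index file disappeared at migration time rather than at the next MySQL start.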
After the directory and replication settings were fixed, both nodes operated normally, and the recovery was confirmed by connecting with Navicat.
ITPUB
