How I Recovered a Production Server After Accidentally Deleting All Files with rm -rf
This article recounts a production‑server disaster caused by an unchecked rm -rf command, details the step‑by‑step data‑recovery attempts using ext3grep, extundelete and MySQL binlogs, and shares hard‑earned lessons to prevent similar incidents.
Accident Background
A junior colleague was tasked with installing Oracle on a production server. While trying to uninstall it, she ran the command rm -rf $ORACLE_BASE/*. Because the ORACLE_BASE variable was unset, the shell expanded the command to rm -rf /*, wiping the entire disk, including Tomcat, MySQL and other applications.
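One common guard against this failure mode is to make the shell stop when the variable is unset or empty, before rm ever sees the path. The sketch below is only an illustration under stated assumptions: clean_dir is a hypothetical helper, not something used in the incident, and it relies on the standard ${parameter:?message} expansion.

```shell
# Hypothetical helper: delete the contents of a directory, refusing unsafe input.
# The body runs in a subshell "( ... )", so a failed check ends only the helper,
# not the calling script.
clean_dir() (
    # ${1:?msg} aborts here if the argument is missing or empty.
    dir=${1:?"refusing to run: directory argument is empty or unset"}
    # Refuse the filesystem root explicitly.
    [ "$dir" != "/" ] || { echo "refusing to run on /" >&2; exit 1; }
    # Only proceed if the path is an existing directory.
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; exit 1; }
    # "${dir:?}"/* cannot collapse to "/*"; dotfiles are left alone by the * glob.
    rm -rf -- "${dir:?}"/*
)
```

Called as clean_dir "$ORACLE_BASE", the helper returns a non-zero status and deletes nothing when ORACLE_BASE is unset. Running scripts with set -u gives a similar safety net for every variable, not just this one.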
The server hosted a customer’s production system that had been running for many months, and the only available backup was a 1 KB file containing a few mysqldump comments; the most recent full backup dated back to December 2013.
Rescue Attempt with ext3grep
After unmounting the affected volume to prevent further writes, the team downloaded and compiled ext3grep (the disk was formatted with ext3). The first step was to list all deleted files:
<code>ext3grep /dev/vgdata/LogVol00 --dump-names</code>
Because the tool cannot restore by directory, the command to restore everything was:
<code>ext3grep /dev/vgdata/LogVol00 --restore-all</code>
Insufficient free space halted the full restore, so specific files were targeted, e.g.:
<code>ext3grep /dev/vgdata/LogVol00 --restore-file var/lib/mysql/aqsh/tb_b_attench.MYD</code>
All deleted filenames were dumped to a text file and a shell script was written to restore each MySQL file automatically:
<code>while IFS= read -r LINE; do
    echo "begin to restore file $LINE"
    # Quote the path so names with spaces or glob characters are passed intact
    if ! ext3grep /dev/vgdata/LogVol00 --restore-file "$LINE"; then
        echo "restore failed, continuing: $LINE"
        # exit 1
    fi
done < ./mysqltbname.txt</code>
The script ran for about 20 minutes and recovered roughly 40 files, far short of the ~300 files needed for the 100 MySQL tables (each MyISAM table is stored as three files: .frm, .MYD and .MYI).
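The list file that drives the restore loop, and the shortfall left after it runs, can both be handled with small helpers. The following is a minimal sketch, not the team's actual script. It assumes the saved list of deleted names holds one path per line, and that recovered files land under a RESTORED_FILES directory in the working directory; filter_db_files and list_missing are hypothetical names.

```shell
# Keep only the paths that start with the given database directory.
filter_db_files() {
    # $1 = file with one deleted path per line (assumed format)
    # $2 = directory prefix, e.g. var/lib/mysql/aqsh
    awk -v p="$2/" 'index($0, p) == 1' "$1" | sort -u
}

# Print every listed path that is still missing from the restore directory.
list_missing() {
    # $1 = list file, $2 = restore directory (assumed: ./RESTORED_FILES)
    while IFS= read -r path; do
        [ -e "$2/$path" ] || printf '%s\n' "$path"
    done < "$1"
}
```

For example, filter_db_files names.txt var/lib/mysql/aqsh > mysqltbname.txt builds the input for the loop, and list_missing mysqltbname.txt RESTORED_FILES afterwards shows which table files still need another recovery route.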
Trying extundelete
Another tool, extundelete, was tested with the command:
<code>extundelete /dev/vgdata/LogVol00 --restore-directory var/lib/mysql/aqsh</code>
Unfortunately, the targeted files were already corrupted and could not be recovered.
Idea: Recovering from MySQL Binlog
Remembering that MySQL binary logging was enabled, the team located three binlog files (mysql-bin.000001, mysql-bin.000009 and mysql-bin.000010). Replaying the first file failed, but mysql-bin.000010 (several hundred MB) was applied successfully:
<code>mysqlbinlog /usr/mysql-bin.000010 | mysql -uroot -p</code>
After applying the binlog, the missing attendance and mobile‑report data reappeared in the application.
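When several binlog files survive, they are normally replayed in sequence order through a single mysqlbinlog invocation. The sketch below shows one way to do that. It is an illustration, not what the team ran: binlogs_in_order and replay_binlogs are hypothetical helpers, and the file paths are assumed to contain no spaces.

```shell
# List binlog files in sequence order. The zero-padded numeric suffix means the
# shell's sorted glob expansion is also the correct replay order.
binlogs_in_order() {
    # $1 = directory holding the recovered mysql-bin.NNNNNN files
    for f in "$1"/mysql-bin.[0-9]*; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Replay all binlogs from a directory over one mysql connection.
replay_binlogs() {
    # $1 = directory; remaining arguments go to the mysql client (e.g. -uroot -p)
    dir=$1
    shift
    files=$(binlogs_in_order "$dir")
    [ -n "$files" ] || { echo "no binlog files in $dir" >&2; return 1; }
    # Unquoted on purpose: one argument per file (assumes no spaces in paths).
    mysqlbinlog $files | mysql "$@"
}
```

A call might look like replay_binlogs /usr -uroot -p. Note that a gap in the sequence, such as the missing files between 000001 and 000009 here, means the events in those files cannot be replayed, so the result should be checked against the application rather than trusted blindly.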
Post‑mortem and Lessons Learned
Never assign critical maintenance tasks without clearly communicating the risks and providing proper training.
Automated backups must be verified regularly; a 1 KB backup is useless.
Implement monitoring and alerting so that failures are detected early.
Never perform destructive operations as the root user; use accounts with least privilege.
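The backup lesson above can be turned into an automated check. The following is a rough sketch under stated assumptions: the backup is an uncompressed mysqldump SQL file written with comments enabled (so it ends with a "-- Dump completed" line), verify_dump is a hypothetical helper, and the 1 MiB default floor is an arbitrary placeholder to tune per database.

```shell
# Return 0 only if the dump file looks like a complete mysqldump output.
verify_dump() {
    file=$1
    min_bytes=${2:-1048576}   # assumed minimum size; adjust per database
    [ -f "$file" ] || { echo "missing backup: $file" >&2; return 1; }
    # Arithmetic expansion strips any padding that wc may print.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -ge "$min_bytes" ] || {
        echo "backup too small: $size bytes" >&2; return 1; }
    # A dump with only comments and no table definitions is useless.
    grep -q 'CREATE TABLE' "$file" || {
        echo "no CREATE TABLE statements found" >&2; return 1; }
    # mysqldump writes this trailer last when it finishes normally.
    tail -n 5 "$file" | grep -q -- '-- Dump completed' || {
        echo "dump trailer missing, backup may be truncated" >&2; return 1; }
}
```

Run from the backup job, a non-zero exit status from verify_dump can feed whatever alerting is in place, so an empty or truncated dump is noticed the day it is written rather than the day it is needed. A periodic test restore into a scratch database remains the stronger check.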
The incident prompted a collective reflection, improved backup procedures, and reinforced the importance of clear processes and responsibility assignment.
Tools Mentioned
ext3grep – https://code.google.com/p/ext3grep/
extundelete – http://extundelete.sourceforge.net/
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.