How I Rescued a Production Server After a Fatal rm -rf Mistake
This article recounts a catastrophic data loss on a production server caused by an accidental rm -rf, details the step-by-step recovery using ext3grep, extundelete, and the MySQL binlog, and shares hard-earned lessons for preventing similar incidents.
Incident Background
A junior team member was tasked with reinstalling Oracle on a production server using the root account. Following an online guide, she ran a command to delete Oracle's installation directory:

rm -rf $ORACLE_BASE/*

Because the $ORACLE_BASE variable was unset, the shell expanded it to nothing and the command became:

rm -rf /*

It wiped the entire filesystem, including Tomcat, the MySQL databases, and other critical files.
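Two shell safeguards would have stopped this cold (a sketch, not from the original article): bash's ${VAR:?} expansion aborts when a variable is unset or empty, and set -u makes any unset-variable expansion fatal.

#!/bin/bash
set -u   # any reference to an unset variable aborts the script
# ${ORACLE_BASE:?...} refuses to expand to nothing; it exits with
# an error instead of turning the command into 'rm -rf /*'
rm -rf "${ORACLE_BASE:?ORACLE_BASE is not set}"/*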
After discovering the damage, the team mounted the disk on another server and found that the offline backup file was only 1 KB, containing just a few mysqldump comments, while the most recent reliable backup was half a year old.
Rescue Attempt with ext3grep
Searching for a recovery tool, the team found ext3grep, which can restore files deleted with rm -rf on ext3 filesystems.
First, they unmounted the volume to avoid overwriting the deleted data, then installed ext3grep and listed all deleted file names:

ext3grep /dev/vgdata/LogVol00 --dump-names

The tool printed many paths, raising hope that the data could be recovered without resorting to plan B.
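The article doesn't show the commands for that first write-freeze step; a minimal sketch, assuming the LVM volume that appears in the ext3grep commands:

umount /dev/vgdata/LogVol00
# if the volume is busy and cannot be unmounted cleanly,
# remounting read-only also stops further writes:
mount -o remount,ro /dev/vgdata/LogVol00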
Since ext3grep cannot restore by directory, they attempted a full restore:

ext3grep /dev/vgdata/LogVol00 --restore-all

Disk space was insufficient, so only a few files were restored successfully. They then redirected the full file list to a text file and extracted the MySQL table files:
ext3grep /dev/vgdata/LogVol00 --dump-names >/usr/allnames.txt
# extract MySQL files into mysqltbname.txt (omitted for brevity)
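The filter command itself was omitted; a plausible reconstruction, assuming MySQL's default data directory /var/lib/mysql and the usual table-file extensions (ext3grep prints paths relative to the filesystem root):

grep -E 'mysql/.*\.(frm|MYD|MYI|ibd)$' /usr/allnames.txt > ./mysqltbname.txt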
Next, they wrote a shell script to restore each MySQL file:

while read -r LINE; do
    echo "begin to restore file $LINE"
    # restore a single file by the path ext3grep reported;
    # quoting $LINE protects paths that contain spaces
    if ! ext3grep /dev/vgdata/LogVol00 --restore-file "$LINE"; then
        echo "restore failed, exit"
        exit 1
    fi
done < ./mysqltbname.txt

The script recovered about 40 files in 20 minutes, far fewer than the ~300 files needed to cover all the tables.
Trying extundelete
The team also tried extundelete, which claims directory-level restoration, but the recovered files were corrupted and unusable.
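The exact invocation isn't recorded in the article; a directory-level attempt with extundelete typically looks like the sketch below (the data-directory path is an assumption, given relative to the filesystem root as extundelete expects):

extundelete /dev/vgdata/LogVol00 --restore-directory var/lib/mysql
# recovered files are written under ./RECOVERED_FILES/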
Idea: Recover via MySQL Binlog
Remembering that the MySQL service had binary logging enabled, they located three binlog files:
mysql-binlog0001
mysql-bin.000009
mysql-bin.000010
Restoring from the first binlog failed, but the third, several hundred megabytes in size, replayed successfully:
mysqlbinlog /usr/mysql-bin.000010 | mysql -uroot -p

After entering the password, the replay completed and the application came back online.
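A note on replaying more than one binlog (a general mysqlbinlog sketch, not from the article): passing the files to a single invocation preserves transaction ordering across file boundaries, and a datetime window can bound the replay; the timestamps below are illustrative.

mysqlbinlog /usr/mysql-bin.000009 /usr/mysql-bin.000010 | mysql -uroot -p
# or limit the replay to a known-good window:
mysqlbinlog --start-datetime="2016-01-01 00:00:00" \
            --stop-datetime="2016-06-30 23:59:59" \
            /usr/mysql-bin.000010 | mysql -uroot -p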
Post‑mortem and Lessons Learned
Never assign critical production tasks to inexperienced staff without proper briefing and supervision.
Automated backups must be verified; a 1 KB dump indicates a broken backup pipeline (a verification sketch follows this list).
Implement real‑time monitoring and alerting for service anomalies.
Never perform destructive operations as root; use least‑privilege accounts.
Maintain offline, verified backups and test restoration procedures regularly.
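As promised in the backup lesson above, here is a minimal verification sketch that would have caught the 1 KB dump; the path, size threshold, and alert address are illustrative assumptions, not details from the incident:

#!/bin/bash
# Fail loudly if last night's dump is missing, suspiciously small,
# or lacks mysqldump's completion marker.
BACKUP=/backup/mysql/latest.sql
MIN_BYTES=$((10 * 1024 * 1024))   # expect at least 10 MB

if [ ! -f "$BACKUP" ] || [ "$(stat -c %s "$BACKUP")" -lt "$MIN_BYTES" ]; then
    echo "backup missing or too small: $BACKUP" | mail -s "BACKUP ALERT" ops@example.com
    exit 1
fi
if ! tail -n 5 "$BACKUP" | grep -q "Dump completed"; then
    echo "backup has no 'Dump completed' marker: $BACKUP" | mail -s "BACKUP ALERT" ops@example.com
    exit 1
fi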
Through the collective effort of developers, testers, product managers, and leadership, the team recovered most of the data and restored the service, closing the incident with a thorough retrospective and a commitment to avoid repeat mistakes.
Tools Referenced
ext3grep: https://code.google.com/p/ext3grep
extundelete: http://extundelete.sourceforge.net