Why Do Files Disappear After a Cloud VM Restart? Common Pitfalls and Fixes
This article outlines five typical cloud‑host file‑loss scenarios—including tmpfs data loss after reboot, accidental rm commands, and disk damage from fio or dd—explains their causes, and provides practical prevention and recovery recommendations for operations engineers.
In daily cloud‑host operations, engineers often face user‑reported file‑loss issues that can cause significant inconvenience. The following typical scenarios are summarized to help avoid these risks and prevent repeated mistakes.
Scenario 1: File loss after cloud VM restart
Phenomenon: Users report that data stored on the VM disappears after a reboot.
Analysis: The missing files are located in the /dev directory, which is a tmpfs temporary file system stored in RAM and not persisted to disk, so a reboot clears the data.
Explanation:
What is tmpfs? It is an in-memory temporary file system offering high performance.
Common tmpfs mounts include /dev, /dev/shm, /sys/fs/cgroup, /run/user/0, etc. Unless a size option is set, a tmpfs mount is capped at half of total RAM by default. Use df -T to identify tmpfs volumes (look for tmpfs or devtmpfs in the Type column).
Suggestion:
Use tmpfs only for program or application caches.
Avoid storing persistent data on tmpfs unless data loss is acceptable.
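The check described above can be scripted. The following is a minimal sketch, not a definitive tool: the function names is_volatile_fstype and fstype_of are hypothetical, and it assumes a df that supports -P and -T (GNU coreutils and BusyBox do). It warns about every path passed on the command line whose backing file system lives in RAM.

```shell
#!/bin/sh
# Return 0 if the given file-system type keeps its data in RAM only.
is_volatile_fstype() {
    case "$1" in
        tmpfs|devtmpfs|ramfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the file-system type backing a path (second column of `df -PT`).
# Assumes the device name in the first column contains no whitespace.
fstype_of() {
    df -PT -- "$1" 2>/dev/null | awk 'NR == 2 { print $2 }'
}

# Warn about each path argument that would not survive a reboot.
for path in "$@"; do
    fstype=$(fstype_of "$path")
    if is_volatile_fstype "$fstype"; then
        echo "WARNING: $path is on $fstype; its contents are lost on reboot"
    fi
done
```

Saved as a script, it could be run as sh check_volatile.sh /dev/shm /data before an application is configured to write to those paths.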
Scenario 2: Accidental rm command execution
Phenomenon: After a reboot, the VM does not respond to ping, critical services fail to start, and essential system files are missing.
Analysis:
Console shows the OS stuck at boot with many "command not found" errors.
Inspecting /root/.bash_history reveals destructive commands such as rm -rf /, rm -rf ./*, etc., which delete large portions of the system.
Running rpm -Va | grep miss can verify missing files, though files not installed via RPM cannot be checked.
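To turn the rpm verification output into a plain list of paths, a small filter helps. This is a sketch under two assumptions: list_missing_files is a hypothetical helper name, and missing files appear in rpm -Va output as lines whose first field is the word missing and whose last field is the path.

```shell
#!/bin/sh
# Read `rpm -Va` output on stdin and print the path of every file that
# rpm reports as missing. Lines for files that are merely modified
# (first field is a flag string such as S.5....T.) are ignored.
# Assumes paths contain no whitespace.
list_missing_files() {
    awk '$1 == "missing" { print $NF }'
}
```

On an affected host it could be used as rpm -Va 2>/dev/null | list_missing_files > /tmp/missing.txt, which gives a list to compare against a healthy machine.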
Suggestion:
Use rm cautiously; add the -i flag for interactive confirmation, especially with -r. In GNU rm a later -f overrides -i, so rm -i -rf deletes without prompting.
If accidental deletion occurs, stop any further writes to the disk immediately to preserve recoverable data.
Restore missing system files by copying them from a healthy machine or reinstalling packages with yum reinstall <package> or yum update <package>.
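One way to make the caution above harder to skip is a wrapper that refuses the most destructive targets before rm ever runs. The sketch below is illustrative only: safe_rm and is_protected_path are hypothetical names, the list of protected directories is an assumption to adapt per system, and it relies on GNU coreutils for realpath -m and rm -I.

```shell
#!/bin/sh
# Return 0 if the absolute path is the root or a top-level system directory.
# A trailing slash is stripped first, so "/" and "" both count as protected.
is_protected_path() {
    case "${1%/}" in
        ""|/|/bin|/boot|/dev|/etc|/home|/lib|/lib64|/proc|/root|/sbin|/sys|/usr|/var)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Resolve every argument, refuse protected paths, then delete with a single
# confirmation prompt (-I) instead of forcing the removal with -f.
safe_rm() {
    for target in "$@"; do
        resolved=$(realpath -m -- "$target") || return 1
        if is_protected_path "$resolved"; then
            echo "refusing to delete protected path: $resolved" >&2
            return 1
        fi
    done
    rm -rI -- "$@"
}
```

Because the shell expands globs before the function sees them, a command like safe_rm ./* run from / is caught only through the resolved paths, which is why each argument is canonicalized first.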
Scenario 3: File‑system damage caused by fio
Phenomenon: Users report MySQL database anomalies on the cloud host.
Analysis:
High IOWAIT observed, suggesting I/O throttling issues.
Even after lifting disk limits, MySQL continues crashing; logs show corrupted data files.
Investigation reveals that the fio tool was used directly on the block device /dev/vdb, bypassing the file system and overwriting real data, leading to severe corruption.
The affected /data volume on both primary and standby MySQL nodes became unreliable and required reformatting and data restoration from backups.
Suggestion:
Use fio in production only after thorough testing.
When running fio, avoid specifying a raw block device; instead, create a regular file (e.g., with touch) and target that file with the --filename option.
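The recommendation above can be enforced with a small guard in front of fio. This is a hedged sketch: is_safe_benchmark_target and run_fio_on_file are hypothetical names, and the job parameters (1G size, 4k random read/write, 60 seconds, libaio) are illustrative assumptions rather than values from the incident.

```shell
#!/bin/sh
# Return 0 only if the target is safe to benchmark: a non-empty path that
# either does not exist yet or is a regular file. Block devices such as
# /dev/vdb, character devices and directories are all rejected.
is_safe_benchmark_target() {
    [ -n "$1" ] || return 1
    if [ -e "$1" ] && [ ! -f "$1" ]; then
        return 1
    fi
    return 0
}

# Run a short random read/write job against a regular file.
# Do not point this at a file that holds real data: fio overwrites it.
run_fio_on_file() {
    testfile=$1
    if ! is_safe_benchmark_target "$testfile"; then
        echo "refusing to run fio on '$testfile': not a regular file" >&2
        return 1
    fi
    fio --name=randrw-test --filename="$testfile" --size=1G \
        --rw=randrw --bs=4k --ioengine=libaio --direct=1 \
        --runtime=60 --time_based
}
```

A typical call would be run_fio_on_file /data/fio-test.dat, followed by removing the test file once the run finishes.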
Scenario 4: File‑system damage caused by dd
Phenomenon: Users report abnormal file access on the cloud host.
Analysis:
Files on the data volume become inaccessible, and ls reports "Structure needs cleaning" errors.
System logs (dmesg, /var/log/messages) show numerous XFS errors.
Root's command history reveals the use of dd to write zeros directly to /dev/vdb, damaging the underlying block device.
Suggestion:
Use dd cautiously in production; test beforehand.
Never specify a raw block device for the of= parameter; instead, write to a regular file.
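As a sketch of the safer pattern, the helper below writes zeros to a regular file and refuses any existing target that is not one. write_test_file is a hypothetical name, and the 1 MiB block size is an arbitrary choice for illustration.

```shell
#!/bin/sh
# Write COUNT_MIB mebibytes of zeros to OUT with dd.
# OUT must be a new path or an existing regular file; devices such as
# /dev/vdb and directories are rejected before dd is started.
write_test_file() {
    out=$1
    count_mib=$2
    if [ -z "$out" ] || { [ -e "$out" ] && [ ! -f "$out" ]; }; then
        echo "refusing to write to '$out': not a regular file" >&2
        return 1
    fi
    # bs is given in bytes (1048576 = 1 MiB) to stay portable across dd variants.
    dd if=/dev/zero of="$out" bs=1048576 count="$count_mib" 2>/dev/null
}
```

For example, write_test_file /data/dd-test.img 100 produces a 100 MiB file on the mounted file system instead of touching the block device underneath it.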
Scenario 5: Accidental deletion of a data disk
Phenomenon: Users report that a cloud disk was mistakenly deleted.
Analysis:
Check the deletion timestamp in the cloud portal; resources are often retained for a short period and can be recovered.
Verify the disk state in the virtualization console; if the state is "Ready", it may still be recoverable.
If not yet deleted, re‑attach the disk to the host and synchronize disk information via the portal.
Suggestion:
Handle disk deletion operations with extreme caution; once deleted, data may be unrecoverable.
Before deletion, confirm the disk is not in use (e.g., using lsof -n, df -h, lsblk, etc.) and unmount it, removing any entries from /etc/fstab.
If accidental deletion occurs, contact recovery support immediately to prevent physical cleanup.
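The pre-deletion check can be partly automated. The following is a minimal sketch with hypothetical helper names (references_device, disk_in_use); it looks for a device or one of its partitions in /proc/mounts and /etc/fstab. It matches by device name only, so fstab entries written as UUID= or LABEL= are not detected and still need a manual look.

```shell
#!/bin/sh
# Return 0 if TABLE (a file in /proc/mounts or fstab format) has a
# non-comment line whose first field is DEVICE or one of its partitions,
# e.g. /dev/vdb matches /dev/vdb and /dev/vdb1 but not /dev/vdbb.
references_device() {
    awk -v dev="$1" '
        $1 ~ /^#/ { next }
        $1 == dev { found = 1 }
        index($1, dev) == 1 && substr($1, length(dev) + 1) ~ /^p?[0-9]+$/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$2"
}

# Return 0 if the device is currently mounted or still listed in /etc/fstab.
disk_in_use() {
    references_device "$1" /proc/mounts || references_device "$1" /etc/fstab
}
```

Before detaching or deleting a disk, a call such as disk_in_use /dev/vdb && echo "still referenced, unmount and clean fstab first" gives a quick first signal.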
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.