How Journaling File Systems Prevent Data Corruption After Crashes
This article explains why file writes are non‑atomic, illustrates the risks of partial writes, and details how journaling (write‑ahead logging) and its variants—data and metadata journaling—ensure filesystem consistency and performance on Linux systems.
Non‑atomic nature of file writes
When a file is written, the operation touches both user data and several metadata structures (superblock, inode bitmap, inode, data‑block bitmap). Because the write is performed in multiple steps, a power loss or crash can interrupt the sequence and leave the filesystem inconsistent.
Typical write sequence for a single block
1. Allocate a free block from the data‑block bitmap.
2. Insert a pointer to that block into the file’s inode.
3. Write the user data into the allocated block.
Inconsistencies caused by interruption
If the step 2 update reaches disk but step 3 does not, the inode points to a block that contains stale or garbage data.
If the step 2 update reaches disk but step 1 does not (possible because cached writes can reach the disk out of order), the inode treats the block as owned while the bitmap still marks it as free, which can lead to double allocation and overwritten data.
If the step 1 update reaches disk but step 2 does not, a block is marked allocated but never referenced, so its space is leaked.
If the step 3 write reaches disk but step 2 does not, the data sits in a block that the file does not reference, which is effectively a lost write.
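The four cases above can be enumerated with a small simulation. This is a minimal sketch, not filesystem code: the three flags and the names `apply_updates` and `diagnose` are hypothetical, and each flag simply records whether the corresponding step reached disk before the crash.

```python
def apply_updates(reached_disk):
    """Return the on-disk state when only the named updates reached disk."""
    return {
        "bitmap_allocated": "bitmap" in reached_disk,  # step 1
        "inode_points": "inode" in reached_disk,       # step 2
        "data_written": "data" in reached_disk,        # step 3
    }


def diagnose(disk):
    """List the inconsistencies visible in a simulated on-disk state."""
    problems = []
    if disk["inode_points"] and not disk["data_written"]:
        problems.append("inode points to stale data")
    if disk["inode_points"] and not disk["bitmap_allocated"]:
        problems.append("block in use but marked free")
    if disk["bitmap_allocated"] and not disk["inode_points"]:
        problems.append("allocated block is never referenced")
    if disk["data_written"] and not disk["inode_points"]:
        problems.append("lost write")
    return problems


# Only the bitmap update survived the crash: the block is leaked.
print(diagnose(apply_updates({"bitmap"})))
```

Only the two extremes, where all three updates or none of them reach disk, come back with an empty list, which is exactly the all-or-nothing behavior a journal is meant to provide.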
Journaling as a solution
Journaling file systems introduce a write‑ahead log (the journal) that records each transaction before the actual modifications are applied. The process is:
1. Journal write: serialize the transaction (metadata updates and optionally user data) into the journal.
2. Journal commit: append a special end‑marker after the whole entry has been flushed to stable storage.
3. Checkpoint: once the commit marker is present, perform the real writes to the filesystem (metadata and, in data‑journaling mode, user data).
4. Free: after the checkpoint completes, the journal space occupied by that entry can be reclaimed; the journal is typically a circular buffer.
After a crash, the mount routine scans the journal. Entries that lack a valid end‑marker are discarded; entries with a marker are replayed, restoring the filesystem to a consistent state.
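The write, commit, and replay cycle can be illustrated with a toy in-memory model. It is only a sketch under simplifying assumptions: `ToyJournalFS`, its record tuples, and the key names are hypothetical, the "disk" is a dictionary, and flushing to stable storage is not modeled.

```python
class ToyJournalFS:
    """In-memory model of a write-ahead journal with commit markers."""

    def __init__(self):
        self.disk = {}     # final on-disk locations: name -> value
        self.journal = []  # append-only list of (kind, txid, payload)

    def log_transaction(self, txid, updates, commit=True):
        """Journal the updates; commit=False simulates a crash before commit."""
        self.journal.append(("begin", txid, None))
        for key, value in updates.items():
            self.journal.append(("update", txid, (key, value)))
        if commit:
            self.journal.append(("commit", txid, None))

    def recover(self):
        """Replay committed transactions in log order and drop the rest."""
        committed = {txid for kind, txid, _ in self.journal if kind == "commit"}
        for kind, txid, payload in self.journal:
            if kind == "update" and txid in committed:
                key, value = payload
                self.disk[key] = value
        self.journal.clear()  # the journal space can now be reused


fs = ToyJournalFS()
fs.log_transaction(1, {"bitmap[7]": 1, "inode[3].ptr": 7})
fs.log_transaction(2, {"bitmap[8]": 1}, commit=False)  # crash before commit
fs.recover()
print(fs.disk)  # only transaction 1 is applied
```

Replay is idempotent in this model: running `recover` again after a second crash during recovery would write the same values to the same locations.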
Data journaling vs. metadata (ordered) journaling
Data journaling records both metadata and user data in the journal. Linux’s EXT3 supports this mode. Because each write is performed twice (once to the journal, once to the final location), the I/O cost roughly doubles, which can be a severe performance penalty for large files.
Every write involving user data incurs two disk writes of the full payload: one to the journal and one to the file's final location.
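A back-of-the-envelope estimate makes the doubling concrete. The function below is a rough sketch: `bytes_written` is a hypothetical helper, the 1 MiB metadata figure is an assumed value, and journal overhead such as descriptor and commit blocks is ignored.

```python
def bytes_written(data_bytes, metadata_bytes, mode):
    """Estimate total bytes sent to disk for one transaction."""
    if mode == "journal":
        # Data and metadata each go to the journal, then to their final place.
        return 2 * (data_bytes + metadata_bytes)
    if mode == "ordered":
        # Data is written once; only metadata is written twice.
        return data_bytes + 2 * metadata_bytes
    raise ValueError(f"unknown mode: {mode}")


data = 1024 ** 3  # a 1 GiB file
meta = 1024 ** 2  # assumed 1 MiB of metadata updates
ratio = bytes_written(data, meta, "journal") / bytes_written(data, meta, "ordered")
print(f"data journaling writes about {ratio:.2f}x as many bytes")
```

When the payload dwarfs the metadata, the ratio approaches 2, which matches the "roughly doubles" estimate above.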
Metadata (ordered) journaling records only metadata. The user data is written directly to its final location first; only after the data is on disk is the metadata journaled. If a crash occurs, at worst the last incomplete journal entry is lost and the corresponding user data may be discarded, but the filesystem’s structural consistency is preserved.
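The ordering rule can be checked with a small crash simulation. This is a simplified sketch with hypothetical names (`ordered_append`, `data_block`, `inode_ptr`); it models one appended block and assumes recovery replays the journal only when a commit record is present.

```python
def ordered_append(crash_before=None):
    """Append one block in ordered mode, crashing before the given step (1-3)."""
    disk = {"data_block": None, "inode_ptr": None}
    journal = []

    def write_data():        # step 1: user data goes to its final location
        disk["data_block"] = "new data"

    def journal_metadata():  # step 2: the inode update is journaled
        journal.append(("inode_ptr", 7))

    def commit():            # step 3: the commit marker is written
        journal.append(("commit", None))

    for number, step in enumerate((write_data, journal_metadata, commit), start=1):
        if number == crash_before:
            break
        step()

    # Recovery: replay journaled metadata only if the transaction committed.
    if ("commit", None) in journal:
        for key, value in journal:
            if key != "commit":
                disk[key] = value
    return disk


for point in (1, 2, 3, None):
    print(point, ordered_append(point))
```

In every crash scenario the inode pointer is set only when the data block has already been written, so the file never references garbage; the cost is that a crash before commit leaves the newly written data unreferenced.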
EXT3 and EXT4 let the administrator select the mode with the mount option data=journal (data journaling) or data=ordered (metadata journaling with ordered data writes, the default). A third mode, data=writeback, journals metadata without ordering the data writes.
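To see which mode a mount was given, one can inspect its comma-separated options string, such as the options field of an /etc/fstab or /proc/mounts line. The helper below is a hypothetical sketch named `data_mode`; it only parses a string, and when no data= option is present it returns None and leaves the caller to assume the filesystem default.

```python
def data_mode(mount_options):
    """Return the value of the data= option, or None if it is absent."""
    for option in mount_options.split(","):
        name, _, value = option.strip().partition("=")
        if name == "data" and value:
            return value
    return None


print(data_mode("rw,relatime,data=journal"))  # journal
print(data_mode("rw,noatime"))                # None
```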
Key implementation details
Journal entries are written in fixed‑size blocks (typically 4 KB). Because a single entry may span multiple blocks, an explicit end‑marker (often a checksum or magic number) is required to detect incomplete writes.
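The role of the end‑marker can be sketched with a checksum over a multi-block entry. The layout here is an assumption made for illustration, not the on-disk format of any real filesystem: `COMMIT_MAGIC` is an arbitrary constant, and the commit block simply stores that magic number followed by a CRC32 of the preceding blocks.

```python
import struct
import zlib

BLOCK_SIZE = 4096
COMMIT_MAGIC = 0x4A524E4C  # arbitrary value chosen for this sketch


def build_entry(payload_blocks):
    """Concatenate payload blocks and append a commit block with a checksum."""
    body = b"".join(payload_blocks)
    header = struct.pack("<II", COMMIT_MAGIC, zlib.crc32(body))
    return body + header.ljust(BLOCK_SIZE, b"\0")


def is_committed(entry):
    """Return True only if the last block is a valid commit for the body."""
    if len(entry) < BLOCK_SIZE or len(entry) % BLOCK_SIZE:
        return False
    body, commit = entry[:-BLOCK_SIZE], entry[-BLOCK_SIZE:]
    magic, checksum = struct.unpack_from("<II", commit)
    return magic == COMMIT_MAGIC and checksum == zlib.crc32(body)


entry = build_entry([b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE])
torn = entry[:BLOCK_SIZE] + b"\0" * BLOCK_SIZE + entry[2 * BLOCK_SIZE:]
print(is_committed(entry), is_committed(torn))  # True False
```

Because the checksum covers every payload block, a commit block that reaches disk ahead of a missing payload block is still rejected during recovery.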
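The journal is allocated as a circular log. When it fills, meaning new entries have wrapped around and reached the oldest entry that has not yet been checkpointed, further transactions must wait until the associated checkpoint finishes and that space is reclaimed.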
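The space accounting of a circular log can be sketched with two counters. In this hypothetical `CircularJournal` class, `head` counts blocks ever appended and `tail` counts blocks ever reclaimed by checkpoints; both names and the block-granular interface are assumptions of the sketch.

```python
class CircularJournal:
    """Track free space in a fixed-size circular log, in blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0  # total blocks ever appended
        self.tail = 0  # total blocks ever reclaimed after checkpointing

    def used(self):
        return self.head - self.tail

    def append(self, blocks):
        """Reserve space for a new entry; refuse if the log would overflow."""
        if self.used() + blocks > self.capacity:
            return False  # caller must wait for a checkpoint
        self.head += blocks
        return True

    def checkpoint(self, blocks):
        """Reclaim the oldest blocks once their changes are applied on disk."""
        self.tail += min(blocks, self.used())

    def head_slot(self):
        """Physical slot where the next block would be written."""
        return self.head % self.capacity


log = CircularJournal(capacity=8)
print(log.append(5))   # True
print(log.append(4))   # False: only 3 blocks are free
log.checkpoint(5)
print(log.append(4))   # True: space was reclaimed
print(log.head_slot()) # 1: the write position has wrapped around
```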
Recovery at mount time replays only committed entries, guaranteeing that the on‑disk metadata reflects a state that could have been produced by a crash‑free execution.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)