How Journal File Systems Prevent Data Corruption After Power Failures
This article explains why file systems need journaling to avoid data loss during power outages, describes the non‑atomic nature of write operations, and compares data journaling with metadata (ordered) journaling, using Linux EXT3 as an example.
File systems must prevent data corruption caused by power loss or crashes. The root issue is that a file write is not atomic: it touches both user data and metadata such as the superblock, inode bitmap, inode, and data block bitmap.
A simplified write operation includes three steps: (1) allocate a data block from the data block bitmap, (2) add a pointer to that block in the inode, and (3) write the user data into the block.
If step 2 completes but step 3 does not, the file thinks it owns the block while the block contains garbage, causing data corruption.
If step 2 completes but step 1 does not, the metadata becomes inconsistent because the file believes the block is allocated while the file system still marks it free, potentially leading to overwrites.
If step 1 completes but step 2 does not, a data block is allocated but unused, wasting space.
If step 3 completes but step 2 does not, user data is written to a block the file does not recognize, resulting in a lost write.
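The four failure cases above can be enumerated mechanically. The following is a minimal sketch, assuming the three steps are independent and may reach disk in any combination; the step names and the `classify` helper are hypothetical and only model the reasoning in the text.

```python
from itertools import combinations

# Hypothetical names for steps 1, 2 and 3 of the simplified write.
STEPS = ("bitmap", "inode", "data")


def classify(done):
    """Describe the on-disk state when only the steps in `done` reached disk."""
    done = set(done)
    if done == set(STEPS):
        return "consistent: write complete"
    if not done:
        return "consistent: nothing written"
    problems = []
    if "inode" in done and "data" not in done:
        problems.append("file points at garbage")
    if "inode" in done and "bitmap" not in done:
        problems.append("block in use but marked free")
    if "bitmap" in done and "inode" not in done:
        problems.append("block allocated but unused (leak)")
    if "data" in done and "inode" not in done:
        problems.append("data written but unreachable (lost write)")
    return "; ".join(problems)


def all_outcomes():
    """Yield every subset of steps together with its classification."""
    for n in range(len(STEPS) + 1):
        for subset in combinations(STEPS, n):
            yield subset, classify(subset)


if __name__ == "__main__":
    for subset, outcome in all_outcomes():
        print(subset or ("nothing",), "->", outcome)
```

Only two of the eight combinations are consistent, which is why the three steps need to be made to look atomic.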
Journal file systems were created to solve these problems.
Their principle is to record all upcoming steps of a write operation as a transaction in a dedicated on‑disk area called a journal (write‑ahead logging). Only after the journal entry is safely stored does the system perform the actual write (checkpoint), ensuring that after a power loss the journal can be replayed to restore consistency.
To handle power loss during journal writes, each journal entry is written with an end‑marker; only when the end‑marker is successfully written is the entry considered valid. Incomplete entries lacking the marker are discarded, guaranteeing that only complete logs are replayed.
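The end-marker rule can be sketched as a small encoder and scanner. This is an illustrative format, not EXT3's on-disk layout: the `BEGIN` and `END` byte strings, the length field, and the CRC32 stored beside the end-marker are all assumptions made for the example.

```python
import struct
import zlib

BEGIN = b"TXB1"  # hypothetical begin marker
END = b"TXE1"    # hypothetical end marker


def encode_entry(payload):
    """Lay out one entry: begin marker, length, payload, end marker, CRC32."""
    header = BEGIN + struct.pack("<I", len(payload))
    trailer = END + struct.pack("<I", zlib.crc32(payload))
    return header + payload + trailer


def scan_journal(raw):
    """Return the payloads of complete entries, stopping at the first torn one."""
    entries = []
    pos = 0
    while pos + 8 <= len(raw):
        if raw[pos:pos + 4] != BEGIN:
            break
        (length,) = struct.unpack("<I", raw[pos + 4:pos + 8])
        body_end = pos + 8 + length
        trailer = raw[body_end:body_end + 8]
        if len(trailer) < 8 or trailer[:4] != END:
            break  # end marker never reached disk: discard this entry
        payload = raw[pos + 8:body_end]
        (crc,) = struct.unpack("<I", trailer[4:])
        if crc != zlib.crc32(payload):
            break  # marker present but payload damaged
        entries.append(payload)
        pos = body_end + 8
    return entries
```

An entry whose trailer was cut off by a power loss is simply not returned, so recovery only ever replays complete entries.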
Journal space is limited and reused cyclically, so it is often called a circular log. Once the checkpoint for a transaction completes, its journal entry can be freed and the space reused.
The journal workflow consists of:
Journal write: record the transaction in the journal.
Journal commit: write the end‑marker to finalize the entry.
Checkpoint: perform the real write of metadata and user data to the file system.
Free: reclaim the journal space.
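The four phases above can be walked through with a toy in-memory model. This is a sketch under stated assumptions: `ToyFS`, its dictionaries, and the `crash_after` parameter are hypothetical stand-ins for a disk and a power failure, not a real file system API.

```python
class ToyFS:
    """In-memory stand-in for a disk with a journal area (hypothetical model)."""

    def __init__(self):
        self.blocks = {}    # final on-disk locations: name -> value
        self.journal = []   # pending transactions

    def journal_write(self, updates):
        txn = {"updates": dict(updates), "committed": False}
        self.journal.append(txn)
        return txn

    def journal_commit(self, txn):
        txn["committed"] = True  # plays the role of the end-marker

    def checkpoint(self, txn):
        self.blocks.update(txn["updates"])  # the real write

    def free(self, txn):
        self.journal = [t for t in self.journal if t is not txn]

    def write(self, updates, crash_after=None):
        """Run the four phases; `crash_after` names the last phase that completes."""
        txn = self.journal_write(updates)
        if crash_after == "journal_write":
            return
        self.journal_commit(txn)
        if crash_after == "journal_commit":
            return
        self.checkpoint(txn)
        if crash_after == "checkpoint":
            return
        self.free(txn)

    def recover(self):
        """After a crash: replay committed entries, discard the rest."""
        for txn in list(self.journal):
            if txn["committed"]:
                self.checkpoint(txn)  # replay is idempotent in this model
            self.free(txn)
```

For example, a crash after the commit phase leaves `blocks` empty, and calling `recover()` then applies the whole transaction; a crash before the commit leaves nothing to replay.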
When both metadata and user data are recorded in the journal (Data Journaling), as supported by Linux EXT3, every write goes to disk twice: once to the journal and once to its final location in the file system. This doubles the I/O cost and can significantly reduce performance, especially for large files.
A more efficient approach is Metadata Journaling (also called Ordered Journaling), which writes user data to its final location first and then logs only the metadata. Because the metadata is journaled only after the data it describes has reached disk, a valid journal entry implies valid user data. After a crash, the worst case is that the last uncommitted write is lost; the file system itself remains consistent.
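The ordering rule can be checked with a small simulation. This is a minimal sketch, assuming a single-block write and an in-memory "disk"; `new_disk`, `ordered_write`, `recover`, and `pointers_are_valid` are hypothetical helpers written for this example.

```python
def new_disk():
    """Empty toy disk: data blocks, allocation bitmap, and file -> block map."""
    return {"data": {}, "bitmap": set(), "inode": {}}


def apply_metadata(disk, txn):
    disk["bitmap"].add(txn["bitmap"])
    disk["inode"].update(txn["inode"])


def ordered_write(disk, journal, block_no, data, inode_update, crash_after=None):
    # 1. User data goes straight to its final location, never to the journal.
    disk["data"][block_no] = data
    if crash_after == "data":
        return
    # 2. Only the metadata changes are logged.
    journal.append({"inode": dict(inode_update), "bitmap": block_no,
                    "committed": False})
    if crash_after == "journal_write":
        return
    # 3. Commit, i.e. write the end-marker.
    journal[-1]["committed"] = True
    if crash_after == "journal_commit":
        return
    # 4. Checkpoint the metadata and free the entry.
    apply_metadata(disk, journal[-1])
    journal.pop()


def recover(disk, journal):
    """Replay committed metadata entries and drop incomplete ones."""
    for txn in journal:
        if txn["committed"]:
            apply_metadata(disk, txn)
    journal.clear()


def pointers_are_valid(disk):
    """Every block referenced by a file is allocated and holds written data."""
    return all(b in disk["bitmap"] and b in disk["data"]
               for b in disk["inode"].values())
```

Whichever point the crash hits, recovery leaves either no pointer at all (the write is lost) or a pointer to a block that already holds the data, which is the guarantee described above.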
Some journaling file systems, including Linux EXT3, can be configured to use either Data Journaling or Ordered (Metadata) Journaling. On EXT3 the mode is chosen with the data= mount option (data=journal or data=ordered).
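To put rough numbers on the trade-off between the two modes, the sketch below counts bytes written per operation. It is a back-of-the-envelope model only: it ignores journal headers and commit blocks, and the 1 MiB and 12 KiB sizes are hypothetical values chosen for illustration.

```python
def bytes_written(data_bytes, metadata_bytes, mode):
    """Total bytes sent to disk for one write under the given journaling mode."""
    if mode == "data":
        journal = data_bytes + metadata_bytes  # everything is logged first
    elif mode == "metadata":
        journal = metadata_bytes               # only metadata is logged
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    final = data_bytes + metadata_bytes        # the real write to the file system
    return journal + final


if __name__ == "__main__":
    data = 1024 * 1024      # hypothetical 1 MiB of user data
    metadata = 12 * 1024    # hypothetical 12 KiB of metadata
    for mode in ("data", "metadata"):
        total = bytes_written(data, metadata, mode)
        print(f"{mode:>8} journaling: {total} bytes, "
              f"{total / (data + metadata):.2f}x the unjournaled write")
```

With these assumed sizes, data journaling writes exactly twice the payload, while metadata journaling adds only the small metadata copy, so its overhead shrinks as files grow.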
Reference: Crash Consistency: FSCK and Journaling
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.
