
How Linux ext4 Manages Files, Inodes, and Caching Internally

This article explains the design of Linux file systems, focusing on ext4's inode layout, block allocation, extents, directory storage, journaling modes, and the kernel's cached and direct I/O paths, complete with code snippets and structural diagrams.

Liangxu Linux

Linux File System Overview

A Linux file system needs a strict on-disk organization and several supporting mechanisms:

Files are stored as blocks.

An index region speeds up block location.

Hot files benefit from a caching layer.

Directories arrange files in a hierarchy for easy management.

The kernel maintains in‑memory structures linking files to processes.

[Figure: File system function diagram]

ext Series Format – Inode and Block Storage

The file system divides the disk into equal-sized blocks (4 KB by default on most systems). Each file has an inode that stores its metadata, including an array, i_block, that locates the file's data blocks.

struct ext4_inode {
  __le16  i_mode;          /* File mode */
  __le16  i_uid;           /* Low 16 bits of Owner UID */
  __le32  i_size_lo;       /* Size in bytes */
  __le32  i_atime;         /* Access time */
  __le32  i_ctime;         /* Inode change time */
  __le32  i_mtime;         /* Modification time */
  __le32  i_dtime;         /* Deletion time */
  __le16  i_gid;           /* Low 16 bits of Group ID */
  __le16  i_links_count;   /* Links count */
  __le32  i_blocks_lo;     /* Blocks count */
  __le32  i_flags;         /* File flags */
  /* osd1: 4-byte OS-dependent union, omitted here */
  __le32  i_block[EXT4_N_BLOCKS]; /* Pointers to blocks (EXT4_N_BLOCKS = 15) */
  /* ... other fields ... */
};

The inode records the file type and permissions (i_mode), the owner UID/GID, the size, the timestamps, and the i_block array that locates the file's data.

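In the classic ext2/ext3 scheme (which ext4 still supports for files without extents), the direct pointers i_block[0-11] hold up to 12 block addresses. Larger files use i_block[12] for a single-indirect block, i_block[13] for a double-indirect block, and i_block[14] for a triple-indirect block. Each indirect block is itself an array of block numbers (1,024 of them with 4 KB blocks and 32-bit block numbers), so together they form a multi-level lookup tree.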
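To make the lookup tree concrete, the sketch below maps a logical block number to the i_block slot and the per-level indices needed to reach it. It assumes 32-bit block numbers (so block_size / 4 pointers per indirect block) and follows the same idea as the kernel's ext4_block_to_path() without being copied from it; block_to_path is a hypothetical name.

```c
#include <stdint.h>

#define N_DIRECT 12  /* i_block[0..11] are direct pointers */

/* Hypothetical helper: map logical block `lblk` to its position in the
   classic ext2/ext3-style tree. Fills path[] with the index used at each
   level and returns the depth (0 = direct, 1..3 = single/double/triple
   indirect), or -1 if the block is beyond what the tree can address. */
static int block_to_path(uint64_t lblk, uint32_t block_size, uint32_t path[4])
{
    const uint64_t ptrs = block_size / 4;  /* 32-bit block numbers per indirect block */

    if (lblk < N_DIRECT) {
        path[0] = (uint32_t)lblk;
        return 0;
    }
    lblk -= N_DIRECT;
    if (lblk < ptrs) {
        path[0] = 12;                       /* single-indirect slot */
        path[1] = (uint32_t)lblk;
        return 1;
    }
    lblk -= ptrs;
    if (lblk < ptrs * ptrs) {
        path[0] = 13;                       /* double-indirect slot */
        path[1] = (uint32_t)(lblk / ptrs);
        path[2] = (uint32_t)(lblk % ptrs);
        return 2;
    }
    lblk -= ptrs * ptrs;
    if (lblk < ptrs * ptrs * ptrs) {
        path[0] = 14;                       /* triple-indirect slot */
        path[1] = (uint32_t)(lblk / (ptrs * ptrs));
        path[2] = (uint32_t)((lblk / ptrs) % ptrs);
        path[3] = (uint32_t)(lblk % ptrs);
        return 3;
    }
    return -1;
}
```

With 4 KB blocks this tree addresses at most 12 + 1,024 + 1,024² + 1,024³ blocks, and the returned depth equals the number of extra indirect-block reads needed before the data block itself can be read.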

Extents – Reducing Fragmentation

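Mapping a large file block by block needs a lot of metadata and many indirect-block reads. ext4 instead uses extents, each of which maps a contiguous range of logical blocks to a contiguous range of physical blocks, so a large, unfragmented file needs only a few entries.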

[Figure: Extent tree diagram]
struct ext4_extent_header {
  __le16  eh_magic;   /* Magic number */
  __le16  eh_entries; /* Number of valid entries */
  __le16  eh_max;     /* Capacity of entries */
  __le16  eh_depth;   /* Tree depth */
  __le32  eh_generation;
};

struct ext4_extent {
  __le32  ee_block;   /* First logical block covered */
  __le16  ee_len;     /* Number of blocks covered */
  __le16  ee_start_hi;/* High 16 bits of physical block */
  __le32  ee_start_lo;/* Low 32 bits of physical block */
};

struct ext4_extent_idx {
  __le32  ei_block;   /* Index covers logical blocks from this */
  __le32  ei_leaf_lo; /* Physical block of next level */
  __le16  ei_leaf_hi; /* High 16 bits of physical block */
  __u16   ei_unused;
};

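The 60-byte i_block array holds a 12-byte ext4_extent_header followed by up to four 12-byte extents. While a file needs no more than four extents, the tree depth (eh_depth) is zero and the inode itself is the leaf node. When more extents are needed, the tree grows: the entries move to a separate block, the inode's slots become ext4_extent_idx records pointing to the lower level, and eh_depth increases.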
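The following sketch shows how a leaf of the extent tree might be searched for the extent covering a logical block. It uses a simplified host-endian copy of struct ext4_extent (the on-disk fields are little-endian) and hypothetical helper names; the rule that an ee_len above 32,768 marks an unwritten extent follows the kernel's EXT_INIT_MAX_LEN convention.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified, host-endian copy of struct ext4_extent (12 bytes). */
struct extent {
    uint32_t ee_block;     /* first logical block covered */
    uint16_t ee_len;       /* length; values above 32768 mark an unwritten extent */
    uint16_t ee_start_hi;  /* high 16 bits of the physical block */
    uint32_t ee_start_lo;  /* low 32 bits of the physical block */
};

#define EXT_INIT_MAX_LEN 32768u

/* Combine the split 48-bit physical start block. */
static uint64_t ext_pblock(const struct extent *ex)
{
    return ((uint64_t)ex->ee_start_hi << 32) | ex->ee_start_lo;
}

/* Actual number of blocks covered, stripping the "unwritten" encoding. */
static uint32_t ext_len(const struct extent *ex)
{
    return ex->ee_len <= EXT_INIT_MAX_LEN ? ex->ee_len
                                          : ex->ee_len - EXT_INIT_MAX_LEN;
}

/* Binary-search a leaf (sorted by ee_block) for the extent containing lblk.
   Returns 0 and sets *pblk on success, or -1 if lblk falls in a hole. */
static int leaf_lookup(const struct extent *ex, size_t n, uint32_t lblk,
                       uint64_t *pblk)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {                /* find the last extent with ee_block <= lblk */
        size_t mid = lo + (hi - lo) / 2;
        if (ex[mid].ee_block <= lblk)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;                   /* lblk precedes the first extent */
    const struct extent *e = &ex[lo - 1];
    if (lblk - e->ee_block >= ext_len(e))
        return -1;                   /* lblk lies past the end of that extent */
    *pblk = ext_pblock(e) + (lblk - e->ee_block);
    return 0;
}
```

The kernel performs a similar binary search at each index level before reaching the leaf, so every extra level of depth costs one more block read.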

Inode and Block Bitmaps

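Each block group has one inode bitmap and one block bitmap, each occupying a single block (4 KB with the default block size). Every bit records whether the corresponding inode or block is in use (1) or free (0); with 4 KB blocks one bitmap block tracks 32,768 entries, which is why a block group holds 32,768 blocks by default. When a file is created (for example via open(..., O_CREAT)), the kernel searches the inode bitmap for a free inode. Data blocks are allocated from the block bitmap in the same way when the file's data is allocated (with ext4's delayed allocation, usually at write-back time).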
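A minimal sketch of first-fit allocation from such a bitmap follows. It assumes ext4's little-endian bit order (bit i lives in byte i / 8 at position i % 8); bitmap_alloc is a hypothetical helper, and the real allocator adds locality goals, per-group locking, and much faster word-at-a-time searches.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: find the first clear bit among the first nbits bits,
   mark it as used, and return its index; return -1 if all bits are set. */
static long bitmap_alloc(uint8_t *map, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        uint8_t mask = (uint8_t)(1u << (i % 8));
        if (!(map[i / 8] & mask)) {
            map[i / 8] |= mask;      /* 1 = in use */
            return (long)i;
        }
    }
    return -1;                       /* no free inode or block in this group */
}
```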

File System Layout

The superblock (ext4_super_block) stores global counts such as the total number of inodes and blocks, the inodes per group, and the blocks per group. Each block group has a descriptor (ext4_group_desc) that records the block numbers of the group's inode bitmap, block bitmap, and inode table.
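Because every group holds the same number of inodes, an inode's on-disk location can be computed from its number. The sketch assumes values read from the superblock (s_inodes_per_group, s_inode_size) plus, for each group, the inode table start block from its descriptor (bg_inode_table_lo/_hi); the inode_tables array, struct inode_loc, and locate_inode are hypothetical names.

```c
#include <stdint.h>

struct inode_loc {
    uint32_t group;   /* block group that owns the inode */
    uint32_t index;   /* slot in that group's inode table */
    uint64_t block;   /* file system block containing the inode */
    uint32_t offset;  /* byte offset of the inode inside that block */
};

/* inode_tables[g] holds the first block of group g's inode table,
   as it would be read from each group descriptor (hypothetical input). */
static struct inode_loc locate_inode(uint32_t ino, uint32_t inodes_per_group,
                                     uint32_t inode_size, uint32_t block_size,
                                     const uint64_t *inode_tables)
{
    struct inode_loc loc;
    loc.group = (ino - 1) / inodes_per_group;   /* inode numbers start at 1 */
    loc.index = (ino - 1) % inodes_per_group;
    uint64_t byte = (uint64_t)loc.index * inode_size;
    loc.block = inode_tables[loc.group] + byte / block_size;
    loc.offset = (uint32_t)(byte % block_size);
    return loc;
}
```

For example, the root directory is inode 2, so it lives in group 0 at slot 1 of that group's inode table.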

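To avoid a single point of failure, backup copies of the superblock and the group descriptor table are stored in other block groups (with the default sparse_super feature, only in some of them rather than in every group). Ext4 can further limit descriptor overhead with the meta_bg feature: block groups are clustered into meta block groups, each containing as many groups as one block of descriptors can describe (64 groups with 4 KB blocks and 64-byte descriptors), and each meta block group stores only its own descriptor block.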
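On file systems created with the sparse_super feature (the mke2fs default), copies of the superblock and group descriptors are kept only in group 0 (the primary), group 1, and groups whose number is a power of 3, 5, or 7. The hypothetical helper below applies that rule; it ignores the placements used by sparse_super2 and meta_bg.

```c
#include <stdint.h>

/* True if n is base^k for some k >= 0 (n >= 1). */
static int is_power_of(uint32_t n, uint32_t base)
{
    while (n > 1 && n % base == 0)
        n /= base;
    return n == 1;
}

/* Hypothetical helper: does this block group carry a superblock copy
   under the sparse_super rule? */
static int group_has_super_copy(uint32_t group)
{
    if (group <= 1)
        return 1;                    /* group 0 (primary) and group 1 */
    return is_power_of(group, 3) || is_power_of(group, 5) ||
           is_power_of(group, 7);
}
```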

[Figure: Meta block group layout]

Directory Storage Format

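Directories are stored like ordinary files: their data blocks contain directory entries (ext4_dir_entry, or in practice ext4_dir_entry_2, which adds a file-type byte), each holding an inode number, a record length, a name length, and the name. The first two entries are “.” (the directory itself) and “..” (its parent). Without an index, a lookup scans these entries linearly; when the EXT4_INDEX_FL flag is set, the directory uses a hash-based index tree (htree) to speed up lookups.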
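The linear (unindexed) format can be walked by following each entry's record length. The sketch below searches one directory block for a name; it assumes a little-endian host so that the fixed 8-byte header of ext4_dir_entry_2 can be copied straight into a struct, and dirent2_hdr and dir_block_lookup are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Fixed header of ext4_dir_entry_2; the name bytes follow it directly. */
struct dirent2_hdr {
    uint32_t inode;      /* 0 means the slot is unused */
    uint16_t rec_len;    /* distance in bytes to the next entry */
    uint8_t  name_len;
    uint8_t  file_type;
};

/* Linear scan of one directory block; returns the inode number for `name`,
   or 0 if it is not present (or the block looks corrupt). */
static uint32_t dir_block_lookup(const uint8_t *block, size_t block_size,
                                 const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;
    while (off + sizeof(struct dirent2_hdr) <= block_size) {
        struct dirent2_hdr h;
        memcpy(&h, block + off, sizeof h);      /* avoids unaligned access */
        if (h.rec_len < sizeof h || off + h.rec_len > block_size)
            break;                              /* corrupt entry: stop */
        if (h.inode != 0 && h.name_len == want &&
            memcmp(block + off + sizeof h, name, want) == 0)
            return h.inode;
        off += h.rec_len;                       /* jump to the next entry */
    }
    return 0;
}
```

Deleting an entry typically folds its space into the previous entry's rec_len, or zeroes its inode field when it is the first entry in the block, which is why the scan skips entries whose inode is 0.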

[Figure: Directory index tree]

Linux File Caching Layer (ext4)

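ext4 defines ext4_file_operations, whose read_iter and write_iter hooks point to ext4_file_read_iter and ext4_file_write_iter. For buffered I/O these wrappers perform ext4-specific checks and then rely on generic kernel helpers: generic_file_read_iter for reads and, in older kernels, __generic_file_write_iter (which calls generic_perform_write) for writes.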

const struct file_operations ext4_file_operations = {
  .read_iter  = ext4_file_read_iter,
  .write_iter = ext4_file_write_iter,
  /* ... */
};

Two I/O paths exist:

Cached I/O: reads are served from the page cache, and writes copy data into it. A write is considered complete once the data reaches the cache; the kernel later flushes the dirty pages to disk.

Direct I/O: opening a file with O_DIRECT makes reads and writes bypass the page cache and go straight to the underlying storage, avoiding the extra copy. Buffers, file offsets, and lengths usually have to be aligned to the device's logical block size.
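From user space, direct I/O is requested with O_DIRECT. The sketch below reads part of a file while bypassing the page cache; it assumes 4,096-byte alignment is sufficient (the real requirement depends on the device and file system), and read_direct is a hypothetical helper. File systems that do not support O_DIRECT reject the open with EINVAL.

```c
#define _GNU_SOURCE          /* O_DIRECT is a Linux extension */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Read `len` bytes at `offset` from `path` without using the page cache.
   offset and len should be multiples of the assumed 4096-byte alignment.
   On success returns bytes read and stores a malloc'd buffer in *out
   (caller frees it); on failure returns -1 with errno set. */
static ssize_t read_direct(const char *path, off_t offset, size_t len, void **out)
{
    const size_t align = 4096;   /* assumption: matches the device's block size */
    void *buf = NULL;
    *out = NULL;
    if (posix_memalign(&buf, align, len) != 0) {   /* O_DIRECT needs aligned memory */
        errno = ENOMEM;
        return -1;
    }
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) {
        int e = errno;
        free(buf);
        errno = e;
        return -1;
    }
    ssize_t n = pread(fd, buf, len, offset);       /* goes to the device, not the cache */
    int e = errno;
    close(fd);
    if (n < 0) {
        free(buf);
        errno = e;
        return -1;
    }
    *out = buf;
    return n;
}
```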

Cached Write Path

The function generic_perform_write performs the following steps for each page:

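Call a_ops->write_begin (the file system's address_space_operations hook) to find or allocate the page and prepare it for writing.

Copy data from user space into the page with iov_iter_copy_from_user_atomic (renamed copy_page_from_iter_atomic in newer kernels).

Call a_ops->write_end to finish the write, which marks the page dirty.

Call balance_dirty_pages_ratelimited, which may start write-back and throttle the writer if too many pages are dirty.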

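/* Simplified from mm/filemap.c (Linux 5.x); fault and retry handling omitted */
ssize_t generic_perform_write(struct file *file,
                               struct iov_iter *i, loff_t pos)
{
  struct address_space *mapping = file->f_mapping;
  const struct address_space_operations *a_ops = mapping->a_ops;
  long status = 0;
  ssize_t written = 0;
  unsigned int flags = 0;
  do {
    struct page *page;
    void *fsdata;
    unsigned long offset = pos & (PAGE_SIZE - 1);   /* offset within the page */
    unsigned long bytes = min_t(unsigned long, PAGE_SIZE - offset,
                                iov_iter_count(i)); /* bytes for this page */
    size_t copied;
    status = a_ops->write_begin(file, mapping, pos, bytes, flags,
                               &page, &fsdata);     /* lock and prepare the page */
    if (unlikely(status < 0))
      break;
    copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
    flush_dcache_page(page);
    status = a_ops->write_end(file, mapping, pos, bytes, copied,
                               page, fsdata);       /* mark dirty and unlock */
    if (unlikely(status < 0))
      break;
    copied = status;                 /* write_end returns the bytes accepted */
    iov_iter_advance(i, copied);     /* consume the user buffer */
    pos += copied;
    written += copied;
    balance_dirty_pages_ratelimited(mapping);  /* throttle if too many dirty pages */
  } while (iov_iter_count(i));
  return written ? written : status;
}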

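ext4 supports three journaling modes, selected with the data= mount option. In data=journal mode, both file data and metadata are written to the journal before reaching their final location. In the default data=ordered mode, only metadata is journaled, but ext4 writes the affected data blocks to disk before the transaction that references them commits. data=writeback journals metadata without this ordering guarantee. ext4's write_begin and write_end implementations can open a journal handle so that metadata changes made during the write (and, in data=journal mode, the data itself) are recorded in the running transaction.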
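Because a buffered write returns as soon as the data is in the page cache, an application that needs durability must ask for it explicitly. The sketch below (write_durably is a hypothetical helper) writes a buffer and then calls fsync, which flushes the file's dirty pages and, on ext4, waits for the journal commit covering the file's metadata.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write the whole buffer to `path`, then fsync so the data is on stable
   storage before returning. Returns 0 on success, -1 with errno set. */
static int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* returns once data is in the page cache */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted: retry */
            int e = errno;
            close(fd);
            errno = e;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) != 0) {                /* force dirty pages (and metadata) to disk */
        int e = errno;
        close(fd);
        errno = e;
        return -1;
    }
    return close(fd);
}
```

For a newly created file, the directory entry is separate metadata, so fully durable creation also calls fsync on the parent directory.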

Cached Read Path

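Buffered reads go through generic_file_buffered_read (filemap_read in newer kernels), which first looks up the page in the page cache. On a miss it starts synchronous readahead and looks again; if the page it finds is marked for readahead (PageReadahead), it starts asynchronous readahead to prefetch the next window. It then copies the data to user space with copy_page_to_iter.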

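/* Simplified from mm/filemap.c (Linux 5.x); EOF trimming, waiting for
   !PageUptodate pages, and the error labels are omitted */
static ssize_t generic_file_buffered_read(struct kiocb *iocb,
                                          struct iov_iter *iter,
                                          ssize_t written)
{
  struct file *filp = iocb->ki_filp;
  struct address_space *mapping = filp->f_mapping;
  struct file_ra_state *ra = &filp->f_ra;           /* per-file readahead state */
  pgoff_t index = iocb->ki_pos >> PAGE_SHIFT;       /* first page to read */
  pgoff_t last_index = (iocb->ki_pos + iter->count + PAGE_SIZE - 1) >> PAGE_SHIFT;
  unsigned long offset = iocb->ki_pos & ~PAGE_MASK; /* offset within that page */
  for (;;) {
    unsigned long nr = PAGE_SIZE - offset, ret;
    struct page *page = find_get_page(mapping, index);
    if (!page) {                                    /* cache miss */
      if (iocb->ki_flags & IOCB_NOWAIT) goto would_block;
      page_cache_sync_readahead(mapping, ra, filp, index, last_index - index);
      page = find_get_page(mapping, index);
      if (!page) goto no_cached_page;               /* allocate and read one page */
    }
    if (PageReadahead(page))                        /* reached the readahead marker */
      page_cache_async_readahead(mapping, ra, filp, page, index, last_index - index);
    ret = copy_page_to_iter(page, offset, nr, iter);
    offset += ret;
    index += offset >> PAGE_SHIFT;
    offset &= ~PAGE_MASK;
    put_page(page);                                 /* drop the lookup reference */
    written += ret;
    if (!iov_iter_count(iter) || ret < nr)
      break;                                        /* done, or short copy */
  }
  iocb->ki_pos = ((loff_t)index << PAGE_SHIFT) + offset;
  return written;
}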
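The page cache is observable from user space. The sketch below maps a file and asks mincore which of its pages are resident; cached_pages is a hypothetical helper. Recent kernels only report real page-cache state for files the caller owns or may write, so the result is not meaningful for other files.

```c
#define _DEFAULT_SOURCE      /* exposes mincore() in glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how many pages of `path` are resident in the page cache.
   Stores the file's page count in *total; returns -1 on error. */
static long cached_pages(const char *path, long *total)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    long psz = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + (size_t)psz - 1) / (size_t)psz;
    void *addr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                            /* the mapping stays valid after close */
    if (addr == MAP_FAILED)
        return -1;
    unsigned char *vec = malloc(npages);  /* one status byte per page */
    if (!vec) {
        munmap(addr, len);
        return -1;
    }
    long resident = -1;
    if (mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t k = 0; k < npages; k++)
            resident += vec[k] & 1;       /* low bit set: page is in the cache */
    }
    free(vec);
    munmap(addr, len);
    *total = (long)npages;
    return resident;
}
```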

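balance_dirty_pages_ratelimited is called each time a task dirties pages. It counts the pages the current task has dirtied and, once that count passes the task's rate limit, calls balance_dirty_pages, which starts background write-back when dirty memory exceeds the background threshold (vm.dirty_background_ratio) and throttles the writing task as dirty memory approaches the hard limit (vm.dirty_ratio). Independently, flusher threads write back pages that have stayed dirty longer than vm.dirty_expire_centisecs, and explicit sync/fsync calls and memory reclaim also trigger write-back.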

void balance_dirty_pages_ratelimited(struct address_space *mapping)
{
  struct inode *inode = mapping->host;
  struct backing_dev_info *bdi = inode_to_bdi(inode);
  struct bdi_writeback *wb = NULL;
  int ratelimit;
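  /* ... cgroup writeback (wb) lookup omitted ... */
  ratelimit = current->nr_dirtied_pause;  /* pages this task may dirty between checks */
  /* ... tightened when wb->dirty_exceeded; per-CPU accounting omitted ... */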
  if (unlikely(current->nr_dirtied >= ratelimit))
    balance_dirty_pages(mapping, wb, current->nr_dirtied);
}
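The counters and thresholds involved in write-back can be inspected from user space. The sketch reads the Dirty and Writeback counters from /proc/meminfo and an integer sysctl such as /proc/sys/vm/dirty_ratio; meminfo_kb and read_sysctl are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Value in kB of a /proc/meminfo field such as "Dirty" or "Writeback";
   -1 if the file or field is missing. */
static long meminfo_kb(const char *field)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    char line[256];
    long val = -1;
    size_t flen = strlen(field);
    while (fgets(line, sizeof line, f)) {
        /* lines look like "Dirty:              1234 kB" */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':') {
            if (sscanf(line + flen + 1, "%ld", &val) != 1)
                val = -1;
            break;
        }
    }
    fclose(f);
    return val;
}

/* Read an integer sysctl such as /proc/sys/vm/dirty_ratio; -1 on failure. */
static long read_sysctl(const char *path)
{
    FILE *f = fopen(path, "r");
    long val = -1;
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}
```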

In summary, ext4 combines a rich on-disk layout (inodes, extents, meta block groups) with kernel mechanisms for cached and direct I/O, journaling, and throttled background write-back to provide reliable and performant file storage on Linux.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Liangxu Linux

Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge (fundamentals, applications, and tools) plus Git, databases, Raspberry Pi, and more.
