How Does Linux ext4 Manage Files? Inodes, Blocks, and Caching Explained
This article explains the core principles of Linux file systems, detailing how ext4 organizes data with blocks and inodes, uses indexing, extents, directory storage, bitmap management, meta block groups, and caching mechanisms, including buffered and direct I/O paths.
File systems must have a strict organization so that files can be stored in block units.
They need an index area to locate the blocks belonging to a file.
Hot files that are frequently read or written should be cached.
Files are organized in directories for easy management.
The Linux kernel maintains data structures in memory to track which files are opened by which processes.
Inode and Block Storage
The file system divides the disk into equal-sized units called blocks (4 KB by default). A file's data is stored in these blocks, which do not have to be contiguous.
Each file and directory has an inode; a directory is also a file with its own inode.
struct ext4_inode {
    __le16 i_mode;
    __le16 i_uid;
    __le32 i_size_lo;
    __le32 i_atime;
    __le32 i_ctime;
    __le32 i_mtime;
    __le32 i_dtime;
    __le16 i_gid;
    __le16 i_links_count;
    __le32 i_blocks_lo;
    __le32 i_flags;
    ...
    __le32 i_block[EXT4_N_BLOCKS];
    __le32 i_generation;
    __le32 i_file_acl_lo;
    __le32 i_size_high;
    ...
};
The inode stores permissions (i_mode), owner UID (i_uid), group GID (i_gid), size (i_size_lo/high), block count (i_blocks_lo), timestamps (i_atime, i_ctime, i_mtime), and an array i_block that points to the file’s data blocks.
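As a small illustration of how these fields are read, the sketch below combines the two size halves and splits i_mode into file type and permission bits. It is user-space C, not kernel code. The helper names inode_size, mode_is_dir, and mode_permissions are hypothetical, and the values are assumed to be already converted from little-endian to host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Combine i_size_lo and i_size_high into one 64-bit byte count. */
uint64_t inode_size(uint32_t size_lo, uint32_t size_high)
{
    return ((uint64_t)size_high << 32) | size_lo;
}

/* The top four bits of i_mode encode the file type
 * (0x4000 = directory, 0x8000 = regular file). */
int mode_is_dir(uint16_t i_mode)
{
    return (i_mode & 0xF000) == 0x4000;
}

/* The low 12 bits hold the permission bits (rwx plus setuid/setgid/sticky). */
unsigned mode_permissions(uint16_t i_mode)
{
    return i_mode & 0x0FFF;
}
```

For example, an i_mode of 0x41ED describes a directory with permissions 0755.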
#define EXT4_NDIR_BLOCKS 12
#define EXT4_IND_BLOCK EXT4_NDIR_BLOCKS
#define EXT4_DIND_BLOCK (EXT4_IND_BLOCK + 1)
#define EXT4_TIND_BLOCK (EXT4_DIND_BLOCK + 1)
#define EXT4_N_BLOCKS (EXT4_TIND_BLOCK + 1)
In ext2/ext3 the first 12 entries of i_block store direct block addresses. When a file exceeds 12 blocks, i_block[12] points to an indirect block, i_block[13] to a double‑indirect block, and i_block[14] to a triple‑indirect block, forming a tree of pointers.
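The pointer tree can be made concrete with a little arithmetic. The following is a user-space sketch, not kernel code. The helper names indirection_level and max_mappable_blocks are hypothetical, and it assumes 4-byte block addresses, so a block of block_size bytes holds block_size / 4 pointers.

```c
#include <assert.h>
#include <stdint.h>

#define EXT4_NDIR_BLOCKS 12

/* How many levels of indirection are needed to reach logical block lblk?
 * Returns 0 (direct), 1, 2, 3, or -1 if the block cannot be mapped. */
int indirection_level(uint64_t lblk, uint32_t block_size)
{
    uint64_t ptrs = block_size / 4;     /* pointers per indirect block */

    assert(block_size >= 1024);

    if (lblk < EXT4_NDIR_BLOCKS)
        return 0;
    lblk -= EXT4_NDIR_BLOCKS;
    if (lblk < ptrs)
        return 1;
    lblk -= ptrs;
    if (lblk < ptrs * ptrs)
        return 2;
    lblk -= ptrs * ptrs;
    if (lblk < ptrs * ptrs * ptrs)
        return 3;
    return -1;
}

/* Total number of logical blocks the 15-entry scheme can address. */
uint64_t max_mappable_blocks(uint32_t block_size)
{
    uint64_t ptrs = block_size / 4;

    return EXT4_NDIR_BLOCKS + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;
}
```

With 4 KB blocks each indirect block holds 1024 pointers, so logical block 12 is the first that needs an indirect lookup and logical block 1036 the first that needs a double-indirect one. Every extra level costs one more block read before the data can be reached, which is the overhead extents are meant to avoid.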
To improve large‑file performance, ext4 introduces extents, which map a range of logical blocks to a contiguous range of physical blocks, reducing fragmentation and I/O overhead.
Extent Structure
struct ext4_extent_header {
    __le16 eh_magic;
    __le16 eh_entries;
    __le16 eh_max;
    __le16 eh_depth;
    __le32 eh_generation;
};
Leaf nodes contain ext4_extent entries that point directly to physical blocks; internal nodes contain ext4_extent_idx entries that point to lower‑level nodes. Each entry occupies 12 bytes.
struct ext4_extent {
    __le32 ee_block; /* first logical block */
    __le16 ee_len; /* number of blocks */
    __le16 ee_start_hi; /* high 16 bits of physical block */
    __le32 ee_start_lo; /* low 32 bits of physical block */
};
struct ext4_extent_idx {
    __le32 ei_block; /* logical block covered */
    __le32 ei_leaf_lo; /* low 32 bits of leaf block */
    __le16 ei_leaf_hi; /* high 16 bits of leaf block */
    __u16 ei_unused;
};
The 60-byte i_block area has room for the 12-byte extent header plus four 12-byte entries. When a file needs no more than four extents, they are stored directly in the inode (depth 0). Larger files require a tree with depth > 0, where the root still resides in the inode and its entries point to blocks holding further nodes.
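Mapping a logical block through a leaf node amounts to a search over sorted extents. The sketch below is a user-space illustration under stated assumptions: struct extent is a host-endian stand-in for ext4_extent, the helper names extent_start and extent_lookup are hypothetical, and the handling of unwritten extents is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian stand-in for struct ext4_extent; the on-disk fields are
 * little-endian and would need conversion first. */
struct extent {
    uint32_t ee_block;      /* first logical block */
    uint16_t ee_len;        /* number of blocks */
    uint16_t ee_start_hi;   /* high 16 bits of physical block */
    uint32_t ee_start_lo;   /* low 32 bits of physical block */
};

/* Join the 16 high bits and 32 low bits into a 48-bit physical block. */
uint64_t extent_start(const struct extent *ex)
{
    return ((uint64_t)ex->ee_start_hi << 32) | ex->ee_start_lo;
}

/* Look up lblk in n extents sorted by ee_block.
 * Returns the physical block, or 0 if lblk falls into a hole. */
uint64_t extent_lookup(const struct extent *ext, size_t n, uint32_t lblk)
{
    size_t lo = 0, hi = n;

    /* Binary search for the last extent whose ee_block <= lblk. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (ext[mid].ee_block <= lblk)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return 0;                       /* lblk precedes every extent */

    const struct extent *ex = &ext[lo - 1];

    if (lblk - ex->ee_block >= ex->ee_len)
        return 0;                       /* lblk lies in the gap after ex */
    return extent_start(ex) + (lblk - ex->ee_block);
}
```

A single 12-byte extent can describe thousands of contiguous blocks, where the indirect scheme would need one 4-byte pointer per block.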
Inode and Block Bitmaps
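The inode bitmap and the block bitmap of a block group each occupy one block (4 KB by default); each bit indicates whether the corresponding inode or block is in use. A 4 KB block bitmap has 32,768 bits, so one block group covers at most 32,768 blocks, or 128 MB.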
Creating a new file (open with O_CREAT) involves scanning the inode bitmap for a free inode and the block bitmap for free blocks.
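The bitmap scan can be sketched in a few lines. This is a simplified user-space model with hypothetical helper names (bitmap_alloc, bitmap_free). The real allocator also weighs locality and preallocation, which this sketch ignores. Bits are numbered least significant bit first within each byte, which is the little-endian bit order ext4 uses for its bitmaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the first clear bit among nbits bits, set it, and return its index.
 * Returns -1 when every bit is already set (nothing free). */
long bitmap_alloc(uint8_t *bitmap, size_t nbits)
{
    assert(bitmap != NULL);

    for (size_t i = 0; i < nbits; i++) {
        uint8_t mask = (uint8_t)(1u << (i % 8));

        if (!(bitmap[i / 8] & mask)) {
            bitmap[i / 8] |= mask;      /* mark the inode or block as used */
            return (long)i;
        }
    }
    return -1;
}

/* Clear a bit again, e.g. when a file is deleted. */
void bitmap_free(uint8_t *bitmap, size_t bit)
{
    bitmap[bit / 8] &= (uint8_t)~(1u << (bit % 8));
}
```

A linear scan like this is the simplest correct approach; production code typically tests a whole machine word at a time to skip fully used regions quickly.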
Superblock and Group Descriptors
The superblock (ext4_super_block) records global filesystem metadata such as total inode count, total block count, inodes per group, and blocks per group.
Block groups are described by ext4_group_desc structures, which contain pointers to the inode bitmap, block bitmap, and inode table for that group.
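With the meta block group feature, block groups are clustered into meta block groups whose descriptors fit into a single block (64 groups when the block is 4 KB and each descriptor is 64 bytes). Each meta block group stores its own descriptors, so the descriptor table no longer has to sit in one contiguous region after the superblock, which removes a limit on file system size and improves resilience.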
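The relationships between block size, groups, and inode numbers come down to simple integer arithmetic. The sketch below uses hypothetical helper names. The inode-to-group formula relies on inode numbers starting at 1, and the descriptor size is passed in as a parameter rather than assumed.

```c
#include <assert.h>
#include <stdint.h>

/* One bitmap block has block_size * 8 bits, so a group spans at most
 * that many blocks (32768 blocks with 4 KB blocks). */
uint32_t max_blocks_per_group(uint32_t block_size)
{
    return block_size * 8;
}

/* Which block group holds inode number ino? Inode numbers start at 1. */
uint32_t inode_to_group(uint32_t ino, uint32_t inodes_per_group)
{
    assert(ino >= 1 && inodes_per_group > 0);
    return (ino - 1) / inodes_per_group;
}

/* Position of the inode inside that group's inode table. */
uint32_t inode_index_in_group(uint32_t ino, uint32_t inodes_per_group)
{
    assert(ino >= 1 && inodes_per_group > 0);
    return (ino - 1) % inodes_per_group;
}

/* Number of block groups whose descriptors fit into one block. */
uint32_t groups_per_meta_group(uint32_t block_size, uint32_t desc_size)
{
    assert(desc_size > 0);
    return block_size / desc_size;
}
```

With 8192 inodes per group, for instance, inode 8193 is the first entry of group 1's inode table.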
Directory Storage
Directories are files too, each with its own inode. Their data blocks hold ext4_dir_entry records that map file names to inode numbers. When the EXT4_INDEX_FL flag is set on the directory's inode, a hashed B‑tree index speeds up lookups; otherwise a lookup scans the records one by one.
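A linear lookup in a single directory block might look like the sketch below. It assumes the record layout of ext4_dir_entry_2 (4-byte inode number, 2-byte record length, 1-byte name length, 1-byte file type, then the name) and reads the little-endian fields byte by byte. The function name dir_block_lookup is hypothetical, and the hashed index is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Scan one directory block for name.
 * Returns the matching inode number, or 0 if the name is not present. */
uint32_t dir_block_lookup(const uint8_t *blk, size_t blk_size, const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;

    while (off + 8 <= blk_size) {
        uint32_t ino = le32(blk + off);
        uint16_t rec_len = le16(blk + off + 4);
        uint8_t name_len = blk[off + 6];

        if (rec_len < 8 || off + rec_len > blk_size)
            break;                      /* corrupt record: stop scanning */
        /* An inode number of 0 marks an unused record. Names are not
         * NUL-terminated, so compare exactly name_len bytes. */
        if (ino != 0 && name_len == want &&
            8 + (size_t)name_len <= rec_len &&
            memcmp(blk + off + 8, name, want) == 0)
            return ino;
        off += rec_len;                 /* rec_len leads to the next record */
    }
    return 0;
}
```

Because each record carries its own length, deleting an entry can be done by enlarging the previous record's length to swallow it, without moving any other records.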
Linux File Caching
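Ext4 defines its file operations in ext4_file_operations. In the kernel version described here, its read_iter and write_iter handlers, ext4_file_read_iter and ext4_file_write_iter, in turn call the generic helpers generic_file_read_iter and __generic_file_write_iter.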
const struct file_operations ext4_file_operations = {
    .read_iter = ext4_file_read_iter,
    .write_iter = ext4_file_write_iter,
};
Read/write paths distinguish between cached I/O and direct I/O. Cached I/O first checks the page cache; if a page is missing, it is read from disk and cached. Direct I/O bypasses the page cache, writing straight to the device.
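The difference between the two paths can be modeled with a toy cache in user space. Everything in this sketch is hypothetical (toy_cache, toy_disk_read, toy_read_page), and it uses a small direct-mapped array purely for brevity; the kernel's page cache is a per-file tree-like index, not a fixed array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ  4096
#define NR_SLOTS 4                      /* toy cache: 4 direct-mapped slots */

struct toy_cache {
    bool     valid[NR_SLOTS];
    uint64_t index[NR_SLOTS];           /* which file page each slot holds */
    uint8_t  data[NR_SLOTS][PAGE_SZ];
    unsigned hits, misses, disk_reads;
};

/* Stand-in for a device read: fills the page with a byte derived from
 * its index so that callers can check what they received. */
static void toy_disk_read(struct toy_cache *c, uint64_t pgidx, uint8_t *dst)
{
    c->disk_reads++;
    memset(dst, (int)(pgidx & 0xFF), PAGE_SZ);
}

/* Read one page. direct == true mimics direct I/O: skip the cache. */
void toy_read_page(struct toy_cache *c, uint64_t pgidx, bool direct,
                   uint8_t *dst)
{
    unsigned slot = (unsigned)(pgidx % NR_SLOTS);

    if (direct) {
        toy_disk_read(c, pgidx, dst);   /* straight to the device */
        return;
    }
    if (c->valid[slot] && c->index[slot] == pgidx) {
        c->hits++;                      /* cache hit: no device access */
    } else {
        c->misses++;                    /* miss: read from disk, then cache */
        toy_disk_read(c, pgidx, c->data[slot]);
        c->valid[slot] = true;
        c->index[slot] = pgidx;
    }
    memcpy(dst, c->data[slot], PAGE_SZ);
}
```

Reading the same page twice through the cached path costs one device read; the direct path pays for a device read on every call, which is why it mostly suits applications that manage their own caching.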
ssize_t generic_file_read_iter(struct kiocb *iocb, struct iov_iter *iter) {
    if (iocb->ki_flags & IOCB_DIRECT) {
        // direct I/O path
    }
    // buffered read path
}
ssize_t __generic_file_write_iter(struct kiocb *iocb, struct iov_iter *from) {
    if (iocb->ki_flags & IOCB_DIRECT) {
        // direct write path
    } else {
        // buffered write path
    }
}
Buffered writes use generic_perform_write, which calls the address‑space operations write_begin, copies data from user space, calls write_end, and finally invokes balance_dirty_pages_ratelimited to decide when dirty pages should be flushed.
ssize_t generic_perform_write(struct file *file, struct iov_iter *i, loff_t pos) {
    struct address_space *mapping = file->f_mapping;
    const struct address_space_operations *a_ops = mapping->a_ops;
    do {
        // write_begin, copy from user, write_end, update pos
        balance_dirty_pages_ratelimited(mapping);
    } while (iov_iter_count(i));
}
The function balance_dirty_pages_ratelimited checks how many pages the current task has dirtied. Once that count passes a rate limit, it calls balance_dirty_pages, which throttles the writer and starts background write‑back when the dirty thresholds are exceeded. Write‑back is also triggered when the user calls sync, when memory runs short, or when dirty pages have aged beyond a configured limit.
void balance_dirty_pages_ratelimited(struct address_space *mapping) {
    // simplified: the kernel derives ratelimit and wb from per-task and per-device state
    if (current->nr_dirtied >= ratelimit)
        balance_dirty_pages(mapping, wb, current->nr_dirtied);
}
Thus, ext4 combines journaling modes (journal, ordered, writeback) with sophisticated caching and write‑back mechanisms to balance performance and data integrity.
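As a rough illustration of the rate-limited check in balance_dirty_pages_ratelimited, the toy model below counts dirtied pages per task and runs the expensive check only once per ratelimit pages. All names are hypothetical. Unlike the kernel, it increments the counter inside the rate-limited function itself and does no real throttling.

```c
#include <assert.h>

/* Toy per-task dirty accounting. */
struct toy_task {
    int nr_dirtied;     /* pages dirtied since the last full check */
    int ratelimit;      /* pages that may be dirtied between checks */
    int full_checks;    /* how often the expensive check actually ran */
};

/* Stand-in for balance_dirty_pages(): a real kernel would compare global
 * dirty counts with thresholds here and possibly put the writer to sleep. */
static void toy_balance(struct toy_task *t)
{
    t->full_checks++;
    t->nr_dirtied = 0;
}

/* Called once per dirtied page; only every ratelimit-th call pays
 * for the full check. */
void toy_balance_ratelimited(struct toy_task *t)
{
    assert(t->ratelimit > 0);

    t->nr_dirtied++;
    if (t->nr_dirtied >= t->ratelimit)
        toy_balance(t);
}
```

Dirtying 100 pages with a ratelimit of 32 runs the full check three times, which keeps the per-page cost of a buffered write low while still bounding how far a writer can get ahead of write-back.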
MaGe Linux Operations