Understanding Linux File Systems: Inodes, VFS, and Storage Strategies
This article explains the core components of Linux file systems—including inodes, directory entries, virtual file system layers, storage allocation methods, free‑space management, block‑group layout, directory handling, hard and symbolic links, and various I/O models—providing a comprehensive guide for developers and system engineers.
Overview
The Linux file system is the OS subsystem that manages persistent data on disk. It provides a uniform interface for all file-related objects (regular files, directories, block devices, pipes, sockets) and stores data persistently, so that data survives power loss once it has actually been written to disk.
Core Data Structures
Inode (index node) : stores a file's metadata, such as its inode number, size, permissions, timestamps and the locations of its data blocks, but not its name. Each inode uniquely identifies a file within one file system and occupies space on disk.
Directory entry (dentry) : maps a file name to an inode number. Dentries are cached in kernel memory; the on‑disk directory file contains a list of these entries.
Are directory entries and directories the same?
No. A directory is a special file stored on disk, whereas a dentry is a kernel in-memory structure that caches the name-to-inode mapping read from that file.
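As a rough illustration of the inode metadata described above, the following C sketch asks the kernel for a file's inode information through stat(). The helper name print_inode_info is made up for this example; the struct stat fields are the standard POSIX ones.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Print part of the inode metadata the kernel reports for `path`.
 * Returns 0 on success, -1 if stat() fails.
 * (print_inode_info is a hypothetical helper name.) */
int print_inode_info(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }

    printf("inode:  %llu\n", (unsigned long long)st.st_ino);
    printf("size:   %lld bytes\n", (long long)st.st_size);
    printf("mode:   %o\n", (unsigned)st.st_mode & 07777);   /* permission bits */
    printf("links:  %lu\n", (unsigned long)st.st_nlink);    /* hard-link count */
    printf("blocks: %lld (512-byte units on Linux)\n", (long long)st.st_blocks);
    return 0;
}
```

Note that the file name never appears in struct stat: the name lives in the directory, and the inode only carries the metadata.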
Virtual File System (VFS)
VFS sits between user space and concrete file‑system implementations. It defines common data structures and operations so that applications can use a single API regardless of the underlying file system (ext2/3/4, XFS, NFS, etc.).
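The idea of a common set of operations can be sketched with a table of function pointers. This is a heavily simplified toy, not kernel code: demo_file_ops, demo_file, constfs_read and demo_vfs_read are hypothetical names invented here, and the real kernel structures (for example struct file_operations) have many more members.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified operations table. Each concrete
 * file system would supply its own implementation of read. */
struct demo_file_ops {
    long (*read)(void *priv, char *buf, size_t len);
};

struct demo_file {
    const struct demo_file_ops *ops;  /* supplied by the concrete file system */
    void *priv;                       /* file-system-specific state */
};

/* A toy "file system" whose files always contain a fixed string. */
static long constfs_read(void *priv, char *buf, size_t len)
{
    const char *text = priv;
    size_t n = strlen(text);

    if (n > len)
        n = len;
    memcpy(buf, text, n);
    return (long)n;
}

const struct demo_file_ops constfs_ops = { .read = constfs_read };

/* Generic entry point: the caller does not need to know which file
 * system backs the file; the call is dispatched through the ops table. */
long demo_vfs_read(struct demo_file *f, char *buf, size_t len)
{
    if (f == NULL || f->ops == NULL || f->ops->read == NULL)
        return -1;
    return f->ops->read(f->priv, buf, len);
}
```

Adding another file system in this sketch would mean writing one more ops table; demo_vfs_read and its callers stay unchanged.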
Allocation Strategies
Contiguous Allocation
All data blocks of a file occupy one contiguous region of the disk. This yields high sequential throughput because the whole file can be read after a single seek. The file header only needs to store the start block and the length. The drawbacks are external fragmentation and the difficulty of extending a file once the neighboring blocks are taken.
Linked Allocation
The blocks of a file are chained into a list, so they can be scattered anywhere on the disk. Two variants exist:
Implicit linked list : the file header stores the first and last block numbers; each block points to its successor.
Explicit linked list (FAT) : a global File Allocation Table, kept in memory, maps every block to its next block. This enables fast look-ups at the cost of large memory usage on big disks.
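A minimal sketch of the explicit (FAT-style) variant, assuming the table is an in-memory array in which fat[i] holds the number of the block that follows block i. The FAT_EOF marker and the function name fat_nth_block are hypothetical and do not reflect the on-disk format of any real FAT file system.

```c
#include <stddef.h>

#define FAT_EOF (-1)   /* hypothetical end-of-chain marker */

/* Walk the chain that starts at block `first` and return the physical
 * block that holds logical block `n` of the file. Returns FAT_EOF if the
 * file is shorter than n + 1 blocks or an entry points outside the table. */
int fat_nth_block(const int *fat, size_t fat_len, int first, size_t n)
{
    int block = first;

    for (;;) {
        if (block < 0 || (size_t)block >= fat_len)
            return FAT_EOF;      /* end of chain or corrupt entry */
        if (n == 0)
            return block;
        block = fat[block];      /* follow the link held in the table */
        n--;
    }
}
```

Reaching logical block n still takes n steps, but every step is an array lookup in memory rather than a disk read, which is the advantage over the implicit variant.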
Indexed Allocation
An index block holds a list of pointers to the file’s data blocks, similar to a book’s table of contents. The file header points to the index block, which is updated as blocks are allocated. Indexed allocation simplifies random access and supports dynamic growth, though it adds overhead for small files.
Combined Schemes
Large files may use multi-level indexing (e.g., the Ext2 inode holds 15 block pointers: 12 direct, plus one single-, one double- and one triple-indirect pointer) or chain index blocks together to overcome the limit of a single index block.
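To make multi-level indexing concrete, the sketch below computes how many levels of indirection are needed to reach a given logical block. It assumes the Ext2 layout of 12 direct pointers followed by one single-, one double- and one triple-indirect pointer, with 4-byte block pointers; the function name ext2_indirection_level is invented for this example.

```c
#include <stdint.h>

enum { EXT2_DIRECT_BLOCKS = 12 };  /* the first 12 pointers reference data directly */

/* Return the number of indirection levels needed to reach logical block
 * `lblk`: 0 = direct, 1 = single, 2 = double, 3 = triple indirect,
 * -1 = beyond what the scheme can address.
 * Assumes 4-byte block pointers. */
int ext2_indirection_level(uint64_t lblk, uint32_t block_size)
{
    uint64_t per_block = block_size / 4;   /* pointers that fit in one index block */
    uint64_t span = per_block;             /* data blocks covered by the current level */
    int level;

    if (lblk < EXT2_DIRECT_BLOCKS)
        return 0;
    lblk -= EXT2_DIRECT_BLOCKS;

    for (level = 1; level <= 3; level++) {
        if (lblk < span)
            return level;
        lblk -= span;
        span *= per_block;                 /* each level multiplies the reach */
    }
    return -1;
}
```

With 1 KiB blocks an index block holds 256 pointers, so logical blocks 0 to 11 are direct, 12 to 267 need one level, and block 268 is the first to need the double-indirect pointer.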
Free Space Management
File systems track unused blocks using three common techniques:
Free‑table method : a table of free region start block and length (efficient when free areas are few).
Free‑list method : each free block stores a pointer to the next free block, forming a linked list.
Bitmap method : one bit per block (0 = free, 1 = allocated). Linux uses bitmaps for both data blocks and inodes.
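The bitmap method can be sketched in a few lines of C, using the same convention as above (0 = free, 1 = allocated). The function names bitmap_alloc and bitmap_free are hypothetical, and real file systems use faster search strategies than this linear scan.

```c
#include <stddef.h>
#include <stdint.h>

/* Find the first clear bit (free block) in `bitmap`, mark it allocated,
 * and return its block number; return -1 if every block is in use.
 * Bit i of the map lives in byte i / 8, at bit position i % 8. */
long bitmap_alloc(uint8_t *bitmap, size_t nblocks)
{
    for (size_t i = 0; i < nblocks; i++) {
        uint8_t mask = (uint8_t)(1u << (i % 8));

        if ((bitmap[i / 8] & mask) == 0) {
            bitmap[i / 8] |= mask;   /* 0 -> 1: block is now allocated */
            return (long)i;
        }
    }
    return -1;
}

/* Mark block `i` free again (1 -> 0). */
void bitmap_free(uint8_t *bitmap, size_t i)
{
    bitmap[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

One bit per block keeps the map compact: a 4 KiB bitmap block describes 32,768 blocks.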
1111110011111110001110110111111100111 ...
Ext2 File System Layout
Ext2 groups blocks into block groups. Each group contains:
Superblock – global file‑system metadata (block count, block size, etc.).
Group descriptor – per‑group status (free blocks, free inodes).
Data‑block bitmap.
Inode bitmap.
Inode table.
Data blocks.
Redundant copies of the superblock and group descriptors are stored in multiple groups to improve reliability. Keeping a file's inode and its data blocks in the same group improves locality.
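Because every group has its own inode table, an inode number can be translated into a block group and a slot within that group's table. The sketch below shows the arithmetic; the struct and function names are invented for this example, and inodes_per_group would be read from the superblock on a real file system.

```c
#include <stdint.h>

struct inode_location {
    uint32_t group;   /* block group that holds the inode */
    uint32_t index;   /* slot within that group's inode table */
};

/* Inode numbers start at 1, so subtract 1 before dividing.
 * The caller must pass ino >= 1 and inodes_per_group > 0. */
struct inode_location locate_inode(uint32_t ino, uint32_t inodes_per_group)
{
    struct inode_location loc;

    loc.group = (ino - 1) / inodes_per_group;
    loc.index = (ino - 1) % inodes_per_group;
    return loc;
}
```

For example, with 8192 inodes per group, inode 8193 is the first slot of group 1.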
Directory Storage
A directory is a special file whose blocks store a list of entries (name, inode, type). Searching a plain list requires a linear scan, so ext3 and ext4 can additionally index the entries by a hash of the file name to accelerate look-ups. The entries “.” and “..” represent the current and parent directories.
Hard Links and Symbolic Links
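Hard links create additional directory entries that point to the same inode. They cannot cross file-system boundaries, because inode numbers are only unique within one file system. The file's data is freed only when the last hard link is deleted and no process still has the file open.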
Symbolic (soft) links are separate files with their own inode that store the target path. They can span file‑system boundaries and remain after the target is removed (becoming dangling).
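The difference between the two link types can be observed from user space. The sketch below defines two small helpers (link_count and same_file, both hypothetical names) built on the standard lstat() and stat() calls.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Return the hard-link count of `path`, or -1 on error.
 * lstat() is used so that a symbolic link is examined itself
 * rather than followed to its target. */
long link_count(const char *path)
{
    struct stat st;

    if (lstat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* Return 1 if `a` and `b` resolve to the same inode on the same device,
 * 0 if they do not, -1 on error. stat() follows symbolic links, so a
 * dangling symbolic link yields -1. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

After link("orig", "hard") the link count of orig rises from 1 to 2 and same_file("orig", "hard") returns 1. After symlink("orig", "soft") the count is unchanged, because the symbolic link has its own inode. Once orig is unlinked, hard still reaches the data, while same_file("hard", "soft") returns -1 because soft is now dangling.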
File I/O Models
Linux I/O can be classified along four axes:
Buffered vs. unbuffered I/O : standard library buffering vs. raw system calls.
Direct (O_DIRECT) vs. indirect I/O : direct I/O bypasses the kernel page cache; indirect I/O goes through it.
Blocking vs. non‑blocking I/O : read/write may block until data is ready or return immediately.
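Synchronous vs. asynchronous I/O : a synchronous call returns only once the operation has completed; asynchronous calls such as aio_read / aio_write return immediately, and the caller is notified when the transfer has finished.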
Typical file‑operation sequence:
fd = open(name, flags); // open file
...
write(fd, ...); // write data
...
close(fd); // close file
When does the kernel flush buffered writes to disk?
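The kernel writes dirty pages back to disk when the amount of dirty data in the page cache exceeds a threshold, when a program explicitly calls sync or fsync, when memory pressure forces pages to be reclaimed, or when dirty data has been cached for longer than a timeout.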
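When a program cannot wait for the kernel to flush on its own schedule, it can request the flush itself. The following sketch expands the open / write / close sequence with an fsync() call; write_durably is a hypothetical helper name, and the sketch does not retry on EINTR or sync the containing directory.

```c
#include <fcntl.h>
#include <unistd.h>

/* Write `len` bytes to `path` and return only after the kernel reports
 * that the file's data has been flushed to the storage device.
 * Returns 0 on success, -1 on any failure. */
int write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    while (len > 0) {                 /* write() may accept fewer bytes than asked */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) != 0) {             /* force this file's dirty pages to disk */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync() call, write() returning successfully only means the data reached the page cache, not the disk.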
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
