Unlocking Linux: Inside the Kernel, VFS, and File System Mechanics
This article provides a comprehensive overview of Linux internals, covering the kernel’s core components, memory and process management, the virtual file system layer, ext4 inode structures, caching strategies, direct I/O, and kernel parameter tuning for performance optimization.
1. Linux Kernel
The kernel is the core of the operating system, responsible for managing processes, memory, device drivers, the file system, and networking, which together form the basic OS structure.
1.1 Memory Management
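Linux uses virtual memory and manages physical memory in fixed-size pages (4 KB on most architectures, including x86). The buddy allocator hands out page frames, while the slab allocator carves pages into caches of small, frequently used kernel objects. Under memory pressure the kernel can swap pages out to disk and reclaim the frames.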
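To make the page granularity concrete, here is a minimal user-space sketch. It asks the running system for its page size and rounds a byte count up to whole pages. The helper names system_page_size and pages_needed are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Page size reported by the running system (commonly 4096 on x86-64). */
long system_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Number of whole pages needed to hold `bytes`, rounding up. */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}
```

With 4096-byte pages, a 4097-byte buffer occupies two pages: memory is always handed out in whole pages, so the second page is almost entirely unused.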
1.2 Process Management
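A process is a running instance of a program. Linux time-shares the CPU among runnable processes: the default scheduler for ordinary tasks aims for fairness weighted by nice value, while the real-time policies (SCHED_FIFO, SCHED_RR) use strict priorities. The kernel also provides inter-process communication mechanisms such as signals, pipes, shared memory, semaphores, and sockets.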
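As a small illustration of one of these IPC mechanisms, the sketch below pushes a message through an anonymous pipe and reads it back. For brevity both ends live in one process, which only works while the message fits in the pipe buffer. In practice the two ends usually belong to a parent and a child created with fork. The function name pipe_round_trip is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send `msg` through an anonymous pipe and read it back into `out`.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t pipe_round_trip(const char *msg, char *out, size_t out_size)
{
    int fds[2];               /* fds[0] = read end, fds[1] = write end */
    size_t len = strlen(msg);
    ssize_t n;

    assert(out_size > len);   /* room for the message plus a terminator */
    if (pipe(fds) == -1)
        return -1;

    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);            /* reader sees EOF once the data is drained */

    n = read(fds[0], out, out_size - 1);
    close(fds[0]);
    if (n >= 0)
        out[n] = '\0';
    return n;
}
```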
1.3 File System
Unlike DOS, Linux does not use drive letters; instead it builds a single hierarchical tree by mounting individual file systems at directories. The Virtual File System (VFS) abstracts the underlying file systems, offering a uniform API (open, read, write, close) to user space.
VFS separates logical file system implementations from device drivers, allowing support for dozens of file systems (ext2, ext3, ext4, FAT, VFAT, NTFS, etc.).
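The uniform API is easiest to see from user space. The sketch below writes a string to a file and reads it back using only open, write, read, and close. Nothing in it depends on which file system backs the path. The function name write_then_read is an assumption made for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `data` to `path`, then read it back into `out` (NUL-terminated).
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t write_then_read(const char *path, const char *data,
                        char *out, size_t out_size)
{
    size_t len = strlen(data);
    ssize_t n;
    int fd;

    assert(out_size > 0);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    n = write(fd, data, len);
    close(fd);
    if (n != (ssize_t)len)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    n = read(fd, out, out_size - 1);
    close(fd);
    if (n >= 0)
        out[n] = '\0';
    return n;
}
```

The same calls behave identically whether the path sits on ext4, XFS, Btrfs, or an in-memory file system: VFS routes each one to the implementation registered by the mounted file system.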
1.4 Device Drivers
Device drivers run in kernel space with high privileges, providing abstract interfaces for hardware interaction. Errors in drivers can crash the entire system.
1.5 Network Interface (NET)
The network stack supports BSD sockets and the full TCP/IP suite, with protocol and driver layers handling communication.
2. Linux Shell
The shell is the user's command interpreter: it reads commands, starts the corresponding programs, and relies on system calls to do so. Common shells include bash, sh, csh, ksh, and zsh.
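At its core, running an external command comes down to three system calls: fork, exec, and wait. The following is a minimal sketch of that loop body, not how any particular shell is implemented. The name run_command is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with its arguments and wait for it to finish.
 * Returns the exit status, or -1 if the child could not be created
 * or did not exit normally.
 */
int run_command(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* only returns on failure */
        _exit(127);              /* conventional "command not found" status */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A real shell adds parsing, redirection, job control, and built-ins on top of this pattern.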
3. Linux System Files
3.1 File System Concepts
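File systems organize data on storage devices using structures such as inodes, directory entries, and block groups. A device is first formatted with a file system and then mounted into the directory tree. Linux also follows the Unix convention of presenting most resources, including devices, pipes, and sockets, as files.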
3.2 Virtual File System (VFS)
VFS provides a common abstraction layer, defining required interfaces and data structures so that different file systems can be accessed uniformly via system calls.
3.3 Unix File System
Key abstractions are files, directory entries, inodes, and mount points. Inodes store metadata (permissions, owner, size, timestamps) while directory entries map names to inodes.
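Because names live in directory entries and metadata lives in the inode, two names can refer to the same file. The sketch below checks this with link and stat. It assumes both paths are on the same file system (hard links cannot cross mount points), and the function name names_share_inode is illustrative.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create `path`, add a second name `alias` for it with link(2), and
 * report whether both names resolve to the same inode.
 * Returns 1 if they share an inode, 0 if not, -1 on error.
 */
int names_share_inode(const char *path, const char *alias)
{
    struct stat a, b;
    int fd;

    assert(path != NULL && alias != NULL);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    close(fd);

    unlink(alias);                 /* ignore failure: alias may not exist */
    if (link(path, alias) == -1)
        return -1;
    if (stat(path, &a) == -1 || stat(alias, &b) == -1)
        return -1;

    /* Same device and inode number means same file; link count is now >= 2. */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino && a.st_nlink >= 2;
}
```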
3.4 File System Characteristics
Strict organization allowing block‑level storage.
Index areas for locating file blocks.
Cache layers for hot files.
Directory‑based organization for easy management.
Kernel‑maintained structures tracking open files.
3.5 EXT Series Formats
Ext4 introduces extents, a tree‑structured representation of contiguous blocks, reducing fragmentation and improving performance for large files.
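To show why extents are compact, here is a simplified sketch of mapping a logical file block to a physical block through a sorted list of extents. The struct demo_extent is a hypothetical host-endian stand-in, not the on-disk format, and the lookup covers only a single leaf level rather than the full tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an extent (hypothetical, host-endian). */
struct demo_extent {
    uint32_t first_logical;   /* first file block covered by this extent */
    uint16_t len;             /* number of contiguous blocks */
    uint64_t first_physical;  /* first disk block backing the extent */
};

/*
 * Map a logical file block to a physical block using a sorted array of
 * non-overlapping extents. Returns 0 and stores the result on success,
 * -1 if the block falls in a hole.
 */
int extent_lookup(const struct demo_extent *ext, size_t count,
                  uint32_t logical, uint64_t *physical)
{
    size_t lo = 0, hi = count;

    assert(physical != NULL);
    /* Binary search: count the extents that start at or before `logical`. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ext[mid].first_logical <= logical)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;                       /* before the first extent */

    const struct demo_extent *e = &ext[lo - 1];
    if (logical - e->first_logical >= e->len)
        return -1;                       /* past the end of that extent */

    *physical = e->first_physical + (logical - e->first_logical);
    return 0;
}
```

Three extents such as {0, 8, 1000}, {8, 4, 5000}, and {100, 2, 9000} describe 14 blocks. A per-block pointer scheme would need 14 separate entries for the same file.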
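The on-disk inode layout (abridged):
struct ext4_inode {
    __le16 i_mode;                   /* file type and permission bits */
    __le16 i_uid;                    /* low 16 bits of owner UID */
    __le32 i_size_lo;                /* file size in bytes, low 32 bits */
    __le32 i_atime;                  /* last access time */
    __le32 i_ctime;                  /* last inode change time */
    __le32 i_mtime;                  /* last data modification time */
    __le32 i_dtime;                  /* deletion time */
    __le16 i_gid;                    /* low 16 bits of group ID */
    __le16 i_links_count;            /* number of hard links */
    __le32 i_blocks_lo;              /* block count, low 32 bits */
    __le32 i_flags;                  /* file flags */
    ...
    __le32 i_block[EXT4_N_BLOCKS];   /* block map or extent tree root */
    __le32 i_generation;             /* file version (used by NFS) */
    __le32 i_file_acl_lo;            /* extended attribute block, low 32 bits */
    __le32 i_size_high;              /* file size, high 32 bits */
    ...
};
Block allocation constants:
#define EXT4_NDIR_BLOCKS 12
#define EXT4_IND_BLOCK EXT4_NDIR_BLOCKS
#define EXT4_DIND_BLOCK (EXT4_IND_BLOCK + 1)
#define EXT4_TIND_BLOCK (EXT4_DIND_BLOCK + 1)
#define EXT4_N_BLOCKS (EXT4_TIND_BLOCK + 1)
In the classic block map, i_block therefore has 15 slots: 12 direct block pointers followed by one indirect, one double-indirect, and one triple-indirect pointer. When extents are enabled, the same 60 bytes hold the root of the extent tree instead.
Extent header and extent structures define the tree nodes used by ext4: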
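struct ext4_extent_header {
    __le16 eh_magic;        /* identifies the node as an extent node */
    __le16 eh_entries;      /* number of valid entries in this node */
    __le16 eh_max;          /* capacity of this node in entries */
    __le16 eh_depth;        /* 0 for a leaf node, > 0 for index nodes */
    __le32 eh_generation;   /* generation of the tree */
};
struct ext4_extent {        /* leaf entry: maps a run of blocks */
    __le32 ee_block;        /* first logical block covered */
    __le16 ee_len;          /* number of blocks covered */
    __le16 ee_start_hi;     /* physical start block, high 16 bits */
    __le32 ee_start_lo;     /* physical start block, low 32 bits */
};
struct ext4_extent_idx {    /* index entry: points to a lower-level node */
    __le32 ei_block;        /* covers logical blocks from this one onward */
    __le32 ei_leaf_lo;      /* block of the next-level node, low 32 bits */
    __le16 ei_leaf_hi;      /* block of the next-level node, high 16 bits */
    __u16 ei_unused;
};
3.6 Directory Storage Format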
Directories are special files containing ext4_dir_entry records that map file names to inode numbers. When the EXT4_INDEX_FL flag is set, a hashed index tree speeds up lookups.
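The linear (non-indexed) case can be sketched as a walk over variable-length records in a directory block. The record layout assumed here is inode (little-endian 32-bit), record length (little-endian 16-bit), name length (8-bit), file type (8-bit), then the name bytes. This is a simplified illustration that ignores checksums and the hashed index, and dir_block_lookup is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers from a raw block buffer. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Linear scan of one directory block.
 * Returns the inode number recorded for `name`, or 0 if it is absent.
 */
uint32_t dir_block_lookup(const uint8_t *block, size_t block_size,
                          const char *name)
{
    size_t want = strlen(name);
    size_t off = 0;

    assert(block != NULL);
    while (off + 8 <= block_size) {
        uint32_t inode = get_le32(block + off);
        uint16_t rec_len = get_le16(block + off + 4);
        uint8_t name_len = block[off + 6];

        if (rec_len < 8 || off + rec_len > block_size)
            break;                         /* malformed record: stop */
        /* inode 0 marks an unused slot */
        if (inode != 0 && name_len == want && 8u + name_len <= rec_len &&
            memcmp(block + off + 8, name, want) == 0)
            return inode;
        off += rec_len;                    /* rec_len chains to the next entry */
    }
    return 0;
}
```

The scan costs time proportional to the number of entries, which is the cost the hashed index is designed to avoid for large directories.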
3.7 ext4 File System
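Ext4 raises the maximum file system size to 1 EiB and the maximum file size to 16 TiB (with 4 KiB blocks). It offers three journaling modes (journal, ordered, writeback), lifts ext3's limit of roughly 32,000 subdirectories per directory, and supports built-in encryption, online defragmentation, and online checking through snapshot-based tools such as e2scrub. Unlike Btrfs, ext4 does not provide transparent compression.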
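The per-file limit follows from the extent format: the logical block number of an extent (ee_block) is 32 bits wide, so a file can address at most 2^32 blocks. A minimal sketch of the arithmetic, with max_file_bytes as an illustrative name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Upper bound on file size when logical block numbers are 32 bits wide:
 * 2^32 addressable blocks times the block size in bytes.
 */
uint64_t max_file_bytes(uint32_t block_size)
{
    assert(block_size != 0);
    return ((uint64_t)1 << 32) * block_size;
}
```

With 4096-byte blocks this gives 2^44 bytes, or 16 TiB. With 1024-byte blocks the bound drops to 4 TiB.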
3.8 Btrfs File System
Btrfs is a copy-on-write file system that provides transparent compression (zlib, lzo, zstd), snapshots, built-in multi-device RAID, and a maximum volume size of 16 EiB.
3.9 XFS File System
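XFS is a high-performance journaling file system that scales to multi-exabyte volumes (up to 8 EiB on 64-bit Linux), uses delayed allocation, and handles large files efficiently. It can be grown online, although shrinking is not generally supported.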
4. Linux Page Cache
4.1 ext4 File Operations
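const struct file_operations ext4_file_operations = {
    ...
    .read_iter = ext4_file_read_iter,
    .write_iter = ext4_file_write_iter,
    ...
};
In the kernel versions this walkthrough follows (the function names below changed in later releases), the read path ends up in generic_file_read_iter and the write path in __generic_file_write_iter. Both check the IOCB_DIRECT flag to choose between cached (buffered) I/O and direct I/O.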
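From user space, the choice between the two paths is made with the O_DIRECT flag at open time. The sketch below writes one aligned block either way. The 4096-byte alignment is an assumption (the real requirement depends on the device and file system), write_one_block is a hypothetical name, and some file systems reject O_DIRECT outright, so the direct case may legitimately fail.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes O_DIRECT on Linux */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0            /* headers do not expose it: stay buffered */
#endif

#define DEMO_BLOCK 4096       /* assumed alignment; real devices may differ */

/*
 * Write one aligned block to `path`, buffered or with O_DIRECT.
 * Returns bytes written, or a negative errno value on failure.
 */
long write_one_block(const char *path, int use_direct)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC | (use_direct ? O_DIRECT : 0);
    void *buf = NULL;
    long result;
    int fd;

    assert(path != NULL);
    /* O_DIRECT needs the buffer, length, and offset suitably aligned. */
    if (posix_memalign(&buf, DEMO_BLOCK, DEMO_BLOCK) != 0)
        return -ENOMEM;
    memset(buf, 'x', DEMO_BLOCK);

    fd = open(path, flags, 0600);
    if (fd == -1) {
        result = -errno;
    } else {
        ssize_t n = write(fd, buf, DEMO_BLOCK);
        result = (n == -1) ? -errno : n;
        close(fd);
    }
    free(buf);
    return result;
}
```

Direct I/O bypasses the page cache, which suits applications such as databases that maintain their own cache. Buffered I/O is the better default for most other workloads.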
4.2 Cached Write Path
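ssize_t generic_perform_write(struct file *file,
                              struct iov_iter *i, loff_t pos)
{
    struct address_space *mapping = file->f_mapping;
    const struct address_space_operations *a_ops = mapping->a_ops;
    long status = 0;
    ssize_t written = 0;
    unsigned int flags = 0;
    do {
        struct page *page;
        unsigned long offset, bytes;   /* offset into the page, bytes to write */
        size_t copied;                 /* bytes actually copied from user space */
        void *fsdata;
        offset = pos & (PAGE_SIZE - 1);
        bytes = min_t(unsigned long, PAGE_SIZE - offset, iov_iter_count(i));
        /* Let the file system prepare a page cache page for this range. */
        status = a_ops->write_begin(file, mapping, pos, bytes, flags,
                                    &page, &fsdata);
        if (unlikely(status < 0))
            break;
        copied = iov_iter_copy_from_user_atomic(page, i, offset, bytes);
        flush_dcache_page(page);
        /* Hand the page back; the file system marks it dirty. */
        status = a_ops->write_end(file, mapping, pos, bytes, copied,
                                  page, fsdata);
        if (unlikely(status < 0))
            break;
        copied = status;
        iov_iter_advance(i, copied);
        pos += copied;
        written += copied;
        balance_dirty_pages_ratelimited(mapping);   /* may throttle the writer */
    } while (iov_iter_count(i));
    return written ? written : status;
}
This is an abridged excerpt; retry and fault-handling logic is omitted. For ext4, the write_begin step starts a journal transaction according to the mount's data mode (journal, ordered, or writeback) and obtains a page cache page via grab_cache_page_write_begin. Data is copied from user space with iov_iter_copy_from_user_atomic, and write_end marks the page dirty. balance_dirty_pages_ratelimited then checks the dirty-page thresholds: it kicks off background writeback and, if the process is dirtying pages faster than they can be written back, throttles it.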
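One consequence for applications is that a successful write only guarantees the data reached the page cache. When durability matters, the usual approach is to follow the write with fsync. A minimal sketch, with write_and_flush as an illustrative name:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `len` bytes to `path` through the page cache, then block until
 * the kernel has flushed them to the storage device.
 * Returns 0 on success, -1 on error.
 */
int write_and_flush(const char *path, const void *data, size_t len)
{
    const char *p = data;
    int fd;

    assert(path != NULL && (data != NULL || len == 0));
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    while (len > 0) {                 /* write() may accept fewer bytes */
        ssize_t n = write(fd, p, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    /* At this point the data may exist only as dirty pages in memory. */
    if (fsync(fd) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```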
4.3 Cached Read Path
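static ssize_t generic_file_buffered_read(struct kiocb *iocb,
                                          struct iov_iter *iter,
                                          ssize_t written)
{
    struct file *filp = iocb->ki_filp;
    struct address_space *mapping = filp->f_mapping;
    struct file_ra_state *ra = &filp->f_ra;   /* per-file readahead state */
    pgoff_t index, last_index;                /* current and final page index */
    unsigned long offset, nr;                 /* offset in page, bytes to copy */
    size_t ret;
    int error = 0;
    /* ... index, last_index and offset are derived from iocb->ki_pos ... */
    for (;;) {
        struct page *page = find_get_page(mapping, index);
        if (!page) {
            if (iocb->ki_flags & IOCB_NOWAIT)
                goto would_block;
            /* Cache miss: read this page and its neighbours synchronously. */
            page_cache_sync_readahead(mapping, ra, filp, index,
                                      last_index - index);
            page = find_get_page(mapping, index);
            if (unlikely(page == NULL))
                goto no_cached_page;
        }
        /* Page carries the readahead marker: start the next batch early. */
        if (PageReadahead(page))
            page_cache_async_readahead(mapping, ra, filp, page,
                                       index, last_index - index);
        /* ... wait until the page is up to date and compute nr ... */
        ret = copy_page_to_iter(page, offset, nr, iter);
        written += ret;
        /* ... advance index and offset; leave the loop when iter is empty ... */
    }
    /* ... would_block and no_cached_page handling omitted ... */
    return written ? written : error;
}
This is an abridged excerpt. For each page, the function first looks in the page cache. On a miss it issues synchronous readahead and retries the lookup. If the page it finds carries the readahead marker (PageReadahead), it also starts asynchronous readahead for the next window so that later reads hit the cache. Finally, copy_page_to_iter copies the data into the user buffer.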
5. Kernel Parameter Tuning
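Most kernel parameters are exposed under /proc/sys (the sysctl interface) and can be adjusted at runtime. Typical examples are the dirty-page thresholds (vm.dirty_ratio, vm.dirty_background_ratio) and other memory management knobs. Per-device I/O scheduler settings live under /sys/block/<device>/queue instead.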
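Each tunable appears as a small text file, so reading one takes only a few lines. The sketch below parses a single integer from a path such as /proc/sys/vm/dirty_ratio. The function name read_tunable is illustrative, and changing a value (by writing to the same file) normally requires root privileges.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read a single integer tunable such as /proc/sys/vm/dirty_ratio.
 * Returns 0 and stores the value on success, -1 on failure.
 */
int read_tunable(const char *path, long *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```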
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
