
How Linux File Systems and Disk I/O Work

The article explains Linux's core storage components—inode, dentry, superblock, and logical blocks—how the Virtual File System abstracts different file systems, the classification of file systems and I/O types, disk technologies, the block layer, I/O schedulers, and practical performance metrics and monitoring tools.

Linux Tech Enthusiast

Core File System Structures

Linux manages all resources through a unified file system. Each file has an inode that stores metadata (inode number, size, permissions, timestamps, data locations) and a dentry that stores the filename, a pointer to its inode, and relationships to other directory entries. The inode uniquely identifies a file, while multiple dentries can point to the same inode (hard links).
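
As a rough illustration of the metadata an inode holds, the following minimal C sketch uses stat(2) to print a file's inode number, size, permission bits, and hard-link count. The path is only an example; any existing file works.

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;

    /* Example path; any existing file works. */
    if (stat("/etc/hostname", &st) != 0) {
        perror("stat");
        return 1;
    }

    printf("inode number : %lu\n", (unsigned long)st.st_ino);
    printf("size (bytes) : %lld\n", (long long)st.st_size);
    printf("permissions  : %o\n", (unsigned)(st.st_mode & 0777));
    printf("hard links   : %lu\n", (unsigned long)st.st_nlink);
    return 0;
}

A hard-link count greater than one means several dentries (directory entries) refer to this same inode.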

Data is stored in logical blocks, typically 4 KB (eight 512 B sectors), to avoid the inefficiency of sector‑size I/O.

During formatting, a disk is divided into three regions: the superblock (filesystem state), the inode area, and the data block area.
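
These regions are visible from user space: statvfs(3) reports the block size, block counts, and inode counts, essentially the numbers that df and df -i summarize later in this article. A minimal sketch, with the mount point hard-coded to "/" as an example:

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    struct statvfs vfs;

    /* Example mount point; replace with the filesystem of interest. */
    if (statvfs("/", &vfs) != 0) {
        perror("statvfs");
        return 1;
    }

    printf("block size   : %lu bytes\n", (unsigned long)vfs.f_bsize);
    printf("total blocks : %llu\n", (unsigned long long)vfs.f_blocks);
    printf("free blocks  : %llu\n", (unsigned long long)vfs.f_bfree);
    printf("total inodes : %llu\n", (unsigned long long)vfs.f_files);
    printf("free inodes  : %llu\n", (unsigned long long)vfs.f_ffree);
    return 0;
}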

Virtual File System (VFS)

VFS provides a uniform interface for user space and kernel subsystems, abstracting the details of underlying concrete file systems (e.g., EXT4, XFS, OverlayFS, procfs, sysfs, NFS, SMB). The VFS layer defines common data structures and operations, allowing the kernel to interact with any supported filesystem without knowing its specifics.

File systems are categorized as:

Disk‑based (e.g., EXT4, XFS)

Memory‑based (e.g., procfs, sysfs)

Network‑based (e.g., NFS, SMB, iSCSI)
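
Because the VFS presents a single interface, the same open/read system calls work whether a file lives on a disk-based filesystem or a memory-based one such as procfs. A minimal sketch reading /proc/version (backed by procfs, not by any disk) exactly as one would read a regular file:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[256];

    /* /proc/version is provided by procfs, yet the same calls apply. */
    int fd = open("/proc/version", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }
    close(fd);
    return 0;
}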

File System I/O Classification

Four dimensions are used to classify I/O:

Buffered vs. unbuffered I/O : Buffered I/O uses the standard library's user-space cache (e.g., stdio buffering) to speed up file access; unbuffered I/O bypasses it and accesses files directly through system calls.

Direct vs. non-direct I/O : Direct I/O opens the file with O_DIRECT and bypasses the kernel page cache to reach the disk directly; non-direct I/O goes through the page cache (see the sketch after this list).

Blocking vs. non-blocking I/O : Blocking I/O stalls the calling thread until the operation completes; non-blocking I/O (set with O_NONBLOCK) returns immediately, and the application obtains the result later, for example by polling.

Synchronous vs. asynchronous I/O : Synchronous I/O returns only once the whole operation has completed (e.g., with O_SYNC); asynchronous I/O returns immediately and completion is signalled later through an event notification (e.g., with O_ASYNC).
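
To make the flag-based distinctions concrete, the sketch below opens a file with O_DIRECT (bypassing the page cache; direct I/O typically requires the buffer and the I/O size to be aligned, commonly to 512 bytes or the device's logical block size) and, separately, a FIFO with O_NONBLOCK. The paths are illustrative only, and some filesystems reject O_DIRECT.

#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Direct I/O: bypass the page cache. The buffer (and the I/O size)
     * must be aligned; 512 bytes is a common requirement. */
    void *buf;
    if (posix_memalign(&buf, 512, 4096) != 0)
        return 1;
    memset(buf, 0, 4096);

    int fd = open("/var/tmp/direct-io-demo", O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open(O_DIRECT)");       /* some filesystems reject O_DIRECT */
    } else {
        if (write(fd, buf, 4096) < 0)
            perror("write");
        close(fd);
    }

    /* Non-blocking I/O: the call returns immediately instead of stalling
     * the caller; here applied to a FIFO that may have no writer yet. */
    int nfd = open("/var/tmp/demo-fifo", O_RDONLY | O_NONBLOCK);
    if (nfd < 0)
        perror("open(O_NONBLOCK)");
    else
        close(nfd);

    free(buf);
    return 0;
}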

Disk Types and Characteristics

Mechanical disks (HDD) store data on rotating platters; the smallest unit is a 512 B sector. Sequential I/O is fast, while random I/O suffers from seek and rotational latency.

Solid‑state drives (SSD) use flash cells; the smallest unit is a page (4 KB‑8 KB). SSDs outperform HDDs for both sequential and random I/O, though random writes incur erase‑and‑write overhead and garbage collection.

General Block Layer

The block layer sits between file systems and device drivers, providing a uniform block device abstraction. It queues I/O requests, merges them, and forwards them to the device layer. Linux supports four I/O schedulers (a sketch for checking a device's active scheduler follows this list):

NONE : No scheduling, used mainly in virtual machines.

NOOP : Simple FIFO queue with minimal merging, suitable for SSDs.

CFQ : Completely Fair Queuing, default on many distributions, allocates time slices per process and supports priority.

Deadline : Creates separate read/write queues, prioritizing requests with approaching deadlines, useful for high‑load workloads such as databases.
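
The scheduler in use for a given device is exposed through sysfs, with the active one shown in brackets (root can switch it by writing to the same file). A minimal sketch that prints it, assuming the device is sda:

#include <stdio.h>

int main(void)
{
    /* The active scheduler appears in brackets, e.g. "noop [deadline] cfq".
     * "sda" is an example device name. */
    FILE *f = fopen("/sys/block/sda/queue/scheduler", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    char line[256];
    if (fgets(line, sizeof(line), f))
        printf("%s", line);
    fclose(f);
    return 0;
}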

Linux I/O Stack

The I/O stack consists of three layers (top‑down):

File system layer (including VFS and concrete file systems)

General block layer (request queue and scheduler)

Device layer (hardware drivers and actual disks)

Multiple caching mechanisms—page cache, inode cache, dentry cache, and block device buffers—reduce the latency of the inherently slow storage subsystem.
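
The effect of these caches can be observed in /proc/meminfo: the Buffers field covers block device buffers and Cached covers the page cache, while the inode and dentry caches show up in the slab statistics mentioned below. A minimal sketch printing the two meminfo fields:

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    char line[128];
    while (fgets(line, sizeof(line), f)) {
        /* "Buffers:" ~ block device buffers, "Cached:" ~ page cache */
        if (strncmp(line, "Buffers:", 8) == 0 ||
            strncmp(line, "Cached:", 7) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}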

Disk Performance Metrics and Observation

Key metrics are utilization, saturation, IOPS, throughput, and response time. Utilization is the percentage of time the disk is busy servicing I/O; sustained values above roughly 80 % often indicate a bottleneck. Saturation measures how much I/O is queued beyond what the disk can handle. IOPS matters most for random-heavy workloads, while throughput matters most for sequential workloads; for example, 1,000 random 4 KB requests per second amounts to only about 4 MB/s of throughput.

Common tools:

df and df -i to view filesystem and inode usage.

cat /proc/slabinfo (or slabtop) to inspect the inode and dentry caches.

iostat -d -x 1 to display per-device utilization, IOPS, throughput, and average wait times.

For per‑process I/O, pidstat and iotop are recommended.

Interpreting iostat output: %util is the disk utilization (the percentage of time the device was busy), r/s + w/s gives IOPS, rkB/s + wkB/s gives throughput, and r_await and w_await give the average read and write response times in milliseconds. Comparing these values against baseline benchmarks (e.g., measured with fio) helps assess whether the disk is the performance limiter.
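
The raw counters behind iostat come from /proc/diskstats. As a rough, iostat-like sketch (device name hard-coded to sda as an example; sectors in diskstats are 512 B units), the program below samples the read/write completion and sector counters twice, one second apart, and derives IOPS and throughput:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read the per-device counters we need from /proc/diskstats. */
static int sample(const char *dev, unsigned long long v[4])
{
    FILE *f = fopen("/proc/diskstats", "r");
    if (!f)
        return -1;

    char line[512], name[64];
    unsigned long long rd, rd_sec, wr, wr_sec, skip;
    int found = -1;
    while (fgets(line, sizeof(line), f)) {
        /* fields: major minor name reads merged sectors ms writes merged sectors ... */
        if (sscanf(line, "%llu %llu %63s %llu %llu %llu %llu %llu %llu %llu",
                   &skip, &skip, name, &rd, &skip, &rd_sec, &skip,
                   &wr, &skip, &wr_sec) == 10 &&
            strcmp(name, dev) == 0) {
            v[0] = rd; v[1] = rd_sec; v[2] = wr; v[3] = wr_sec;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

int main(void)
{
    const char *dev = "sda";             /* example device name */
    unsigned long long a[4], b[4];

    if (sample(dev, a) != 0) return 1;
    sleep(1);                            /* 1-second interval */
    if (sample(dev, b) != 0) return 1;

    /* Sectors in /proc/diskstats are 512-byte units. */
    printf("r/s   : %llu\n", b[0] - a[0]);
    printf("w/s   : %llu\n", b[2] - a[2]);
    printf("rkB/s : %llu\n", (b[1] - a[1]) * 512 / 1024);
    printf("wkB/s : %llu\n", (b[3] - a[3]) * 512 / 1024);
    return 0;
}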
