Why IO Performance Bottlenecks Cripple Modern Systems and How to Overcome Them
This article explains how IO performance becomes the bottleneck in modern systems, detailing the layered IO architecture, file system and buffer cache roles, various IO models, key metrics such as IOPS and latency, and the impact of storage technologies like LVM, RAID, and SAN/NAS solutions.
IO System Layers
IO performance is critical; after optimizing CPU and memory, the bottleneck often moves to the database and finally to IO. The IO stack can be viewed as three layers: Disk (storage), Volume Management (VM), and File System. Disk is the raw space, VM partitions that space like fences, and the file system builds structures (buildings) on the partitions to store data.
File System – Data Placement
File System solves space‑management problems: it decides where and how data is stored and retrieved.
Buffer Cache handles buffering. For reads it caches frequently accessed data; for writes it buffers data so that many small writes can be combined into larger ones before hitting the disk.
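The write-combining behaviour described above can be sketched in a few lines. This is only an illustration of the idea, not how any kernel implements its buffer cache; the class name CoalescingWriter and the 4096-byte threshold are assumptions chosen for the example.

```python
import io


class CoalescingWriter:
    """Buffers small writes and passes them to the sink in larger chunks."""

    def __init__(self, sink, capacity=4096):
        self.sink = sink            # any object with a write(bytes) method
        self.capacity = capacity    # flush threshold in bytes (illustrative value)
        self.pending = bytearray()  # data accepted but not yet written to the sink
        self.flushes = 0            # number of writes that actually reached the sink

    def write(self, data):
        self.pending += data
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self):
        if self.pending:
            self.sink.write(bytes(self.pending))
            self.pending.clear()
            self.flushes += 1


def demo():
    """Issue 1000 writes of 64 bytes each and count the writes that reach the sink."""
    sink = io.BytesIO()
    writer = CoalescingWriter(sink, capacity=4096)
    for _ in range(1000):
        writer.write(b"x" * 64)
    writer.flush()  # push out whatever is still buffered
    return writer.flushes, len(sink.getvalue())
```

With these numbers, demo() turns 1000 small writes into 16 larger ones while delivering the same 64000 bytes. The trade-off is that buffered data is lost if the system fails before a flush.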
IO Model
IO requests pass through two main phases. The first is the waiting‑for‑resource phase, in which the request may block until the required resource (disk, memory, file) becomes available. The second is the resource‑use phase, in which the actual data transfer occurs.
During the waiting phase, IO can be blocking (the request stays blocked until data arrives or a timeout) or non‑blocking (the request returns immediately with a “resource not available” status). During the use phase, IO can be synchronous (the application blocks until the operation completes) or asynchronous (the request returns immediately and the OS completes the operation in the background).
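The difference between blocking and non‑blocking behaviour in the waiting phase can be seen with an ordinary pipe. The sketch below assumes a Unix‑like system; try_read and demo are hypothetical helper names, while os.pipe, os.set_blocking, os.read, and os.write are standard library calls.

```python
import os


def try_read(fd, nbytes=1024):
    """Return the available data, or None if the read would have blocked."""
    try:
        return os.read(fd, nbytes)
    except BlockingIOError:
        # Non-blocking mode: the kernel reports "nothing available" instead of waiting
        return None


def demo():
    read_end, write_end = os.pipe()
    os.set_blocking(read_end, False)  # switch the read end to non-blocking mode
    try:
        before = try_read(read_end)   # pipe is empty, so this returns None at once
        os.write(write_end, b"ready")
        after = try_read(read_end)    # data is now available
        return before, after
    finally:
        os.close(read_end)
        os.close(write_end)
```

Had the read end stayed in its default blocking mode, the first read would have waited until another thread or process wrote to the pipe.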
Key Metrics
IOPS – number of IO operations per second; crucial for random‑access workloads such as OLTP databases.
IO Response Time – time from the kernel issuing an IO request to receiving the response; for a spinning disk it includes any time spent waiting in the queue plus seek time, rotational latency, and transfer time.
Throughput – total amount of data transferred per unit time; more important for sequential or large‑block workloads.
IO Chunk Size – size of a single IO request; small chunks test IOPS capability, large chunks test throughput.
Queue Depth – number of IO requests that can be queued for a single device; higher depth can improve IOPS up to a point.
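The metrics above are linked by simple arithmetic. The following back‑of‑the‑envelope sketch assumes a spinning disk with an average rotational delay of half a revolution; the function names and sample figures are illustrative, not measurements.

```python
def throughput_mb_s(iops, chunk_kb):
    """Throughput is the number of operations per second times the size of each."""
    return iops * chunk_kb / 1024


def hdd_service_time_ms(seek_ms, rpm, chunk_kb, transfer_mb_s):
    """Seek time + average rotational latency + transfer time for one request."""
    rotational_ms = 0.5 * 60_000 / rpm  # half a revolution on average
    transfer_ms = chunk_kb / 1024 / transfer_mb_s * 1000
    return seek_ms + rotational_ms + transfer_ms


def max_iops(latency_ms, queue_depth=1):
    """Upper bound from Little's law: requests in flight divided by latency.

    Assumes the device really can service queue_depth requests concurrently,
    which holds only up to a point on real hardware.
    """
    return queue_depth * 1000 / latency_ms
```

For example, hdd_service_time_ms(4, 15000, 8, 100) is about 6.08 ms, which caps a single such disk at roughly 165 IOPS with one outstanding request. At 8 KB per request that is only a little over 1 MB/s, which is why small random IO is limited by IOPS rather than by throughput.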
File System Structure (Linux Ext2/Ext3 Example)
The Linux VFS (Virtual File System) abstracts different concrete file systems behind a uniform interface, so the rest of the kernel and user programs can treat them alike. The core on‑disk structures of Ext2/Ext3 include:
Boot Block – stores the boot loader.
Super Block – global file‑system parameters (name, status, block size, total blocks).
Inode – metadata for each file (type, permissions, size, timestamps) and pointers to data blocks (direct, indirect, double‑indirect, triple‑indirect).
Directory – a special file mapping filenames to inode numbers.
Data Block – actual file content; the block size is a multiple of the disk sector size (commonly 1 KB, 2 KB, or 4 KB).
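The inode's pointer scheme determines how many data blocks a single file can address. The sketch below assumes the classic Ext2 layout of 12 direct pointers plus one indirect, one double‑indirect, and one triple‑indirect pointer, with 4‑byte block numbers. It computes only the limit imposed by the pointer structure; other on‑disk fields can cap real file sizes lower.

```python
def max_file_blocks(block_size, direct=12, pointer_size=4):
    """Blocks addressable through direct and 1-, 2-, 3-level indirect pointers."""
    per_block = block_size // pointer_size  # pointers that fit in one block
    return direct + per_block + per_block**2 + per_block**3


def max_file_bytes(block_size):
    """Pointer-structure limit on file size, in bytes."""
    return max_file_blocks(block_size) * block_size


def pointer_level(block_index, block_size, direct=12, pointer_size=4):
    """Name the pointer level that holds a file's logical block (0-based)."""
    per_block = block_size // pointer_size
    spans = [
        ("direct", direct),
        ("indirect", per_block),
        ("double-indirect", per_block**2),
        ("triple-indirect", per_block**3),
    ]
    for name, span in spans:
        if block_index < span:
            return name
        block_index -= span
    raise ValueError("block index beyond what the inode can address")
```

With 1 KB blocks, max_file_bytes(1024) is 17247252480 bytes, about 16 GiB. Small files need only the direct pointers, so reading them costs no extra lookups, while very large files pay up to three additional block reads to locate a data block.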
Buffer vs. Direct I/O
Buffers improve performance for sequential or large‑volume workloads (e.g., NFS, FTP) by caching data in memory. However, for random‑access workloads whose data is rarely re‑read from the cache or is already cached by the application (e.g., databases), buffering adds an extra memory copy for little benefit, and using Direct I/O (bypassing the buffer cache) is often preferable.
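On Linux, Direct I/O is requested with the O_DIRECT open flag, which generally requires the buffer, file offset, and length to be aligned to the device block size. The sketch below is a minimal illustration under those assumptions: read_first_block is a hypothetical helper, the 4096‑byte alignment is an assumed value, and the function falls back to buffered IO where O_DIRECT is unavailable or rejected.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on device and file system


def read_first_block(path, direct=True):
    """Read up to BLOCK bytes from the start of path, bypassing the cache if possible."""
    flags = os.O_RDONLY
    o_direct = getattr(os, "O_DIRECT", 0)  # Linux-specific flag; 0 where it is missing
    if direct and o_direct:
        flags |= o_direct
    buf = mmap.mmap(-1, BLOCK)  # anonymous mapping, page-aligned as O_DIRECT expects
    try:
        fd = os.open(path, flags)
        try:
            n = os.readv(fd, [buf])  # read straight into the aligned buffer
        finally:
            os.close(fd)
        return bytes(buf[:n])
    except OSError:
        if direct:
            # Some file systems reject O_DIRECT; retry through the buffer cache
            return read_first_block(path, direct=False)
        raise
    finally:
        buf.close()
```

Databases that use Direct I/O typically pair it with their own cache, so the data is held in memory once rather than twice.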
Volume Management (LVM)
LVM sits between the OS and physical disks, providing flexible disk management:
Physical Volume (PV) – a physical disk or LUN that LVM can manage.
Volume Group (VG) – a pool of one or more PVs.
Logical Volume (LV) – a virtual block device allocated from a VG; presented to the OS as a regular disk.
Physical Extent (PE) and Logical Extent (LE) – the smallest allocation units for PVs and LVs respectively.
LVM supports features such as dynamic expansion, striping (parallel access), mirroring (redundancy), and snapshots (point‑in‑time copies).
Striping, Mirroring, Snapshot
Striping distributes data across multiple disks to increase parallelism and throughput. Mirroring writes identical data to two disks, improving read speed and reliability at the cost of double storage. Snapshot captures the state of a volume at a specific moment, useful for backups.
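The address arithmetic behind striping is a round‑robin mapping from logical blocks to disks. The sketch below shows the general idea rather than the layout of any particular volume manager; stripe_location and its parameters are hypothetical names.

```python
def stripe_location(logical_block, num_disks, stripe_blocks=1):
    """Map a logical block number to (disk index, block number on that disk).

    stripe_blocks is the stripe unit: how many consecutive logical blocks
    are placed on one disk before moving on to the next.
    """
    stripe_index, offset = divmod(logical_block, stripe_blocks)
    disk = stripe_index % num_disks   # stripe units rotate across the disks
    row = stripe_index // num_disks   # number of full rotations completed
    return disk, row * stripe_blocks + offset
```

With three disks and a stripe unit of one block, logical blocks 0, 1, 2 land on disks 0, 1, 2 and block 3 wraps around to disk 0. A large sequential read therefore keeps all three disks busy at once.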
RAID Levels
RAID combines multiple disks to improve performance and/or reliability:
RAID 0 – striping only; high performance, no redundancy.
RAID 1 – mirroring; redundancy with a 2× storage penalty.
RAID 5 – block‑level striping with distributed parity; balances capacity, performance, and fault tolerance.
RAID 6 – like RAID 5 but with two parity blocks; tolerates two simultaneous disk failures.
RAID 10 (striped mirrors) – combines RAID 0 and RAID 1; high performance and redundancy, requires at least four disks.
RAID 50 – striping across multiple RAID 5 sets; higher capacity and fault tolerance for large arrays.
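The capacity cost of each level, and the way parity allows a lost block to be rebuilt, can be sketched as follows. The code assumes equal‑sized disks, two‑way mirroring for RAID 1 and RAID 10, and simple XOR parity as used by RAID 5 (the second parity of RAID 6 uses a different code and is not shown). The function names are illustrative.

```python
from functools import reduce


def usable_disks(level, n, raid5_sets=2):
    """Disks' worth of usable capacity for n equal-sized disks at a RAID level."""
    rules = {
        "0": (2, n),                # striping only, no redundancy
        "1": (2, n // 2),           # two-way mirrors
        "5": (3, n - 1),            # one disk's worth of parity
        "6": (4, n - 2),            # two disks' worth of parity
        "10": (4, n // 2),          # striped two-way mirrors
        "50": (6, n - raid5_sets),  # one parity disk's worth per RAID 5 set
    }
    minimum, usable = rules[level]
    if n < minimum:
        raise ValueError(f"RAID {level} needs at least {minimum} disks")
    return usable


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks; yields the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

Because XOR is its own inverse, combining the parity block with every surviving data block reproduces the missing one. This is why RAID 5 tolerates exactly one failed disk per stripe.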
Storage Types: DAS, SAN, NAS
DAS (Direct‑Attached Storage) connects disks directly to a host via interfaces such as PATA, SATA, or SAS. It offers low latency but no sharing capability.
SAN (Storage Area Network) provides block‑level access over a dedicated network (typically Fibre Channel or iSCSI). It enables multiple hosts to share storage and offers high performance and scalability.
NAS (Network‑Attached Storage) offers file‑level access using protocols like SMB or NFS. It operates at a higher abstraction layer, sharing files rather than raw blocks.
SAN over Ethernet
Traditional Fibre Channel SANs can be extended over IP networks using FCIP. Modern IP‑based SANs use iSCSI, which encapsulates SCSI commands in TCP/IP packets, allowing block storage over standard Ethernet.
SCSI Stack
SCSI defines a generic interface for block devices. The stack consists of a high‑level OS interface, a middle translation layer (protocol gateway), and a low‑level driver that talks directly to the hardware.
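The following listing shows a logical volume exposed under /dev both as a block device (b, lv) and as a raw character device (c, rlv):
# ls -l /dev/*lv
brw------- 1 root system 22, 2 May 15 2007 lv
crw------- 2 root system 22, 2 May 15 2007 rlv
Conclusion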
Understanding the IO stack, file‑system internals, and storage technologies (LVM, RAID, DAS/SAN/NAS) is essential for diagnosing performance bottlenecks and designing resilient, high‑throughput systems.
MaGe Linux Operations