How Do Disks Work? A Deep Dive into HDD, SSD, and Ext4 Filesystem Architecture
This article explains the physical and logical operation of mechanical hard drives and solid‑state drives, details the Ext4 filesystem structures such as superblocks, inodes, block groups, flexible and meta block groups, and outlines allocation strategies and link types.
1 How Disks Work
Mechanical disks, also called hard disk drives (HDDs), consist of multiple platters with data stored on both sides. Each surface is divided into concentric circles called tracks, a head for each surface reads and writes data, and a cylinder is the set of tracks at the same radius across all platters. Outer cylinders have higher throughput because the linear speed under the head is greater there. Each track is divided into sectors, traditionally 512 bytes each; the first sector on cylinder 0, head 0 is the first one read at boot.
Solid‑State Drives (SSD) use electronic components, lack moving parts, and provide superior performance for both sequential and random I/O compared to mechanical disks.
2 Overview of Disk Logical Structure
The smallest read/write unit is a sector, but operating on such small units is inefficient. Filesystems in the Ext family therefore group sectors into logical blocks (commonly 4 KB, i.e., eight contiguous 512-byte sectors). The first block group leaves its first 1 KB unused so that a boot sector can be installed there. During formatting, the disk is divided into three kinds of regions: the superblock, the inode table, and the data blocks.
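The sector-to-block arithmetic above can be sketched in a few lines of C. This is a minimal illustration, not filesystem code: the function names are hypothetical, and the 512-byte sector size is taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u /* bytes per sector, as assumed in the text */

/* Number of sectors that make up one logical block. */
uint32_t sectors_per_block(uint32_t block_size)
{
    /* A block must be a whole number of sectors. */
    assert(block_size >= SECTOR_SIZE && block_size % SECTOR_SIZE == 0);
    return block_size / SECTOR_SIZE;
}

/* First sector of a logical block, counted from the start of the filesystem. */
uint64_t block_to_first_sector(uint64_t block, uint32_t block_size)
{
    return block * sectors_per_block(block_size);
}
```

With 4 KB blocks, `sectors_per_block(4096)` gives 8, so logical block 3 starts at sector 24.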
Inode : Stores file metadata (inode number, size, permissions, timestamps, data locations). Each file has a corresponding inode, which occupies disk space.
Data Block : Holds the actual file data. Directory blocks contain entries mapping filenames to inodes, with “.” for the current directory and “..” for the parent.
Superblock : Core structure storing filesystem metadata such as free/used block counts, block size, filesystem state, timestamps, and a magic number. Typically only the superblock in block group 0 is used; backups exist in other groups.
Group Descriptor Table (GDT) : Records the status of each block group, including free block and inode counts, and is backed up similarly to the superblock.
Inode Bitmap : Binary map indicating which inodes are free or allocated.
Data Block Bitmap : Binary map indicating which data blocks are free or allocated; each block group can contain up to 32 768 logical blocks (4 KB each), i.e., up to 128 MB.
Inode List : Contains all inodes within a block group.
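The 128 MB figure follows from the bitmap itself: one bitmap block holds block_size × 8 bits, and each bit tracks one block. The sketch below works through that arithmetic and shows how a bit is tested and set. The function names are hypothetical; the bit order (bit i in byte i / 8, position i % 8) is the little-endian layout that Ext bitmaps use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bitmap block has block_size * 8 bits, one bit per block in the group. */
uint32_t blocks_per_group(uint32_t block_size)
{
    return block_size * 8u;
}

/* Size in bytes of the space covered by one block group. */
uint64_t group_size_bytes(uint32_t block_size)
{
    return (uint64_t)blocks_per_group(block_size) * block_size;
}

/* Returns 1 if entry `index` is marked allocated in the bitmap, else 0. */
int bitmap_is_allocated(const uint8_t *bitmap, uint32_t index)
{
    assert(bitmap != NULL);
    return (bitmap[index / 8u] >> (index % 8u)) & 1u;
}

/* Marks entry `index` as allocated. */
void bitmap_mark_allocated(uint8_t *bitmap, uint32_t index)
{
    assert(bitmap != NULL);
    bitmap[index / 8u] |= (uint8_t)(1u << (index % 8u));
}
```

For 4 KB blocks, `blocks_per_group(4096)` is 32768 and `group_size_bytes(4096)` is 134217728 bytes, i.e., 128 MB. The same two bitmap helpers apply to the inode bitmap, where each bit tracks one inode instead of one block.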
Important Data Backup
Without the sparse_super feature, every block group holds a copy of both the superblock and the GDT. With sparse_super enabled, these copies are kept only in block groups 0 and 1 and in groups whose index is a power of 3, 5, or 7 (e.g., 3, 5, 7, 9, 25, 27, 49). While extra superblock copies are cheap, duplicating the GDT in every group wastes space. The GDT also caps the filesystem size: all descriptors must fit in the first block group, so with 128 MB block groups and 64-byte group descriptors there can be at most 2^27 / 64 = 2^21 block groups, which limits an Ext4 filesystem to about 256 TB.
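The sparse_super placement rule (groups 0 and 1, plus powers of 3, 5, and 7) is easy to express as a predicate. The following is a minimal sketch with hypothetical function names; it mirrors the rule, not any particular kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if n is an exact power of base (base^0 = 1 counts), else 0. */
static int is_power_of(uint32_t n, uint32_t base)
{
    uint64_t p = 1;

    assert(base >= 2); /* base 1 would never terminate */
    while (p < n)
        p *= base;
    return p == n;
}

/* Returns 1 if a block group keeps superblock/GDT backups under sparse_super. */
int group_has_backup(uint32_t group)
{
    if (group <= 1)
        return 1; /* groups 0 and 1 always qualify */
    if (group % 2 == 0)
        return 0; /* powers of odd bases are never even */
    return is_power_of(group, 3) || is_power_of(group, 5) ||
           is_power_of(group, 7);
}
```

Calling `group_has_backup` for groups 0 through 9 returns 1 for 0, 1, 3, 5, 7, and 9, and 0 for the rest.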
Flexible Block Groups
Ext4 introduces flexible block groups (the flex_bg feature), which tie several consecutive block groups together into one larger logical group. The first block group of a flex group holds the block bitmaps, inode bitmaps, and inode tables of every group in it, leaving the remaining groups almost entirely to data blocks. This has two benefits:
Aggregates metadata, which reduces seek time and speeds up loading.
Keeps large files as contiguous as possible by allowing larger allocation requests.
flex_bg does not change where superblock and GDT backups live; they remain at the start of their block groups. The number of block groups per flex group is 2^s_log_groups_per_flex, where s_log_groups_per_flex is a field of ext4_super_block.
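Because the flex group size is a power of two, mapping a block group to its flex group is a shift. The sketch below illustrates this; the function names are hypothetical, and the value 4 used in the usage note is only an example, not a claim about any particular filesystem.

```c
#include <assert.h>
#include <stdint.h>

/* Number of block groups in one flex group: 2^s_log_groups_per_flex. */
uint32_t groups_per_flex(uint8_t log_groups_per_flex)
{
    assert(log_groups_per_flex < 32);
    return 1u << log_groups_per_flex;
}

/* Index of the flex group that contains a given block group. */
uint32_t flex_group_of(uint32_t group, uint8_t log_groups_per_flex)
{
    assert(log_groups_per_flex < 32);
    return group >> log_groups_per_flex;
}

/* First block group of that flex group, where the packed metadata lives. */
uint32_t flex_first_group(uint32_t group, uint8_t log_groups_per_flex)
{
    assert(log_groups_per_flex < 32);
    return (group >> log_groups_per_flex) << log_groups_per_flex;
}
```

With `s_log_groups_per_flex` equal to 4, each flex group spans 16 block groups, so block group 35 belongs to flex group 2, whose metadata is packed into block group 32.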
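Meta Block Groups
Instead of one descriptor table covering every block group, Ext4's meta_bg feature partitions the filesystem into meta block groups. Each meta block group is a run of block groups whose descriptors fit in a single block, which is 64 block groups (8 GB of disk) with 4 KB blocks and 64-byte descriptors, and it stores descriptors only for its own groups. This reduces the size of each descriptor table and the amount of data that must be backed up.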
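The numbers in this section can be checked with a short calculation. The sketch below assumes 4 KB blocks and 64-byte descriptors in its usage note; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes covered by one block group: (block_size * 8) blocks of block_size. */
uint64_t group_bytes(uint32_t block_size)
{
    return (uint64_t)block_size * 8u * block_size;
}

/* Descriptors that fit in one block = block groups per meta block group. */
uint32_t groups_per_meta_group(uint32_t block_size, uint32_t desc_size)
{
    assert(desc_size > 0 && block_size % desc_size == 0);
    return block_size / desc_size;
}

/* Disk space spanned by one meta block group. */
uint64_t meta_group_bytes(uint32_t block_size, uint32_t desc_size)
{
    return groups_per_meta_group(block_size, desc_size) *
           group_bytes(block_size);
}

/* Size cap when every descriptor must fit inside a single block group. */
uint64_t max_fs_bytes_without_meta_bg(uint32_t block_size, uint32_t desc_size)
{
    uint64_t per_group = group_bytes(block_size);
    uint64_t max_groups;

    assert(desc_size > 0);
    max_groups = per_group / desc_size;
    return max_groups * per_group;
}
```

For a block size of 4096 and a descriptor size of 64, this yields 64 block groups per meta block group, 8 GB per meta block group, and a ceiling of 2^48 bytes (256 TB) when descriptors are confined to one block group.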
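To address large filesystems, Ext4 block counts are 64-bit values, which the superblock stores as separate low and high 32-bit halves:
<code>struct ext4_super_block {
...
	__le32 s_blocks_count_lo;      /* Blocks count, low 32 bits */
	__le32 s_r_blocks_count_lo;    /* Reserved blocks count, low 32 bits */
	__le32 s_free_blocks_count_lo; /* Free blocks count, low 32 bits */
...
	__le32 s_blocks_count_hi;      /* Blocks count, high 32 bits */
	__le32 s_r_blocks_count_hi;    /* Reserved blocks count, high 32 bits */
	__le32 s_free_blocks_count_hi; /* Free blocks count, high 32 bits */
...
};</code>
3 Data Block and Inode Allocation Strategies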
On mechanical disks, keeping related data blocks close reduces head movement and speeds up I/O. On SSDs, data locality increases the amount of data transferred per request, reducing the number of I/O operations. Ext4 employs several strategies to maintain locality and minimize fragmentation:
Multi‑block allocation: when a file is first created, the allocator speculatively reserves 8 KB of disk space (two 4 KB blocks) on the assumption that it will be written soon, so the data can be written as a contiguous extent.
Delayed allocation: postpones block placement decisions until dirty buffers are flushed to disk.
Prefer placing a file’s data blocks and its inode in the same block group to reduce seek time.
Prefer placing inodes of files within the same directory in the same block group, assuming directory contents are related.
When creating a directory in the root, the inode allocator selects the least‑used block group, distributing directories across the disk.
If fragmentation occurs anyway, the e4defrag tool can be used to defragment files.
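The rule for new top-level directories (pick the least-used block group) can be illustrated with a toy selector. This is a deliberately simplified sketch, not the kernel's allocator, which weighs more factors. The `group_usage` struct, the function name, and the tie-breaking order (most free inodes, then most free blocks, then lowest index) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group counters, similar in spirit to what the GDT records. */
struct group_usage {
    uint32_t free_inodes;
    uint32_t free_blocks;
};

/*
 * Returns the index of the least-used group: most free inodes first,
 * then most free blocks; the lowest index wins any remaining tie.
 */
size_t pick_least_used_group(const struct group_usage *groups, size_t count)
{
    size_t best = 0;

    assert(groups != NULL && count > 0);
    for (size_t i = 1; i < count; i++) {
        if (groups[i].free_inodes > groups[best].free_inodes ||
            (groups[i].free_inodes == groups[best].free_inodes &&
             groups[i].free_blocks > groups[best].free_blocks))
            best = i;
    }
    return best;
}
```

Spreading top-level directories this way leaves each one room to grow, so the files later created inside it can stay in the same block group as their directory.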
4 Hard Links and Soft Links
Hard Links :
Multiple filenames point to the same inode (identical inode numbers).
Provide alternative names for the same file.
Cannot be created for directories.
Cannot span across different filesystems/partitions.
Each additional hard link increments the inode’s link count.
Soft Links (Symbolic Links) :
Use a separate inode; the link’s inode number differs from the target’s.
The link’s data is the pathname of the target file (a string), and its size equals the string length.
Can be created for directories.
Can cross filesystem/partition boundaries.
Creating a soft link does not increase the target inode’s link count.
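The differences listed above can be observed directly with the POSIX calls link(), symlink(), stat(), and lstat(). The sketch below creates a scratch directory under /tmp, makes one hard link and one soft link to an empty file, and records what the inodes report. The `link_report` struct, the function name, and the file names are hypothetical, and the code assumes /tmp is writable and supports both link types.

```c
#define _XOPEN_SOURCE 700 /* for mkdtemp(), lstat(), symlink() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

struct link_report {
    int hard_shares_inode; /* 1 if the hard link has the target's inode */
    int soft_shares_inode; /* 1 if the symlink itself has the target's inode */
    long target_nlink;     /* target's link count after both links exist */
    long soft_size;        /* st_size of the symlink itself */
    long target_path_len;  /* length of the path stored in the symlink */
};

/* Fills *out and returns 0 on success, or returns -1 on any error. */
int probe_links(struct link_report *out)
{
    char dir[] = "/tmp/linkdemoXXXXXX";
    char target[64], hard[64], soft[64];
    struct stat st_target, st_hard, st_soft;
    int fd, rc = -1;

    assert(out != NULL);
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(target, sizeof target, "%s/target", dir);
    snprintf(hard, sizeof hard, "%s/hard", dir);
    snprintf(soft, sizeof soft, "%s/soft", dir);

    fd = open(target, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd >= 0) {
        close(fd);
        /* lstat() inspects the symlink itself; stat() would follow it. */
        if (link(target, hard) == 0 && symlink(target, soft) == 0 &&
            stat(target, &st_target) == 0 && stat(hard, &st_hard) == 0 &&
            lstat(soft, &st_soft) == 0) {
            out->hard_shares_inode = st_hard.st_ino == st_target.st_ino &&
                                     st_hard.st_dev == st_target.st_dev;
            out->soft_shares_inode = st_soft.st_ino == st_target.st_ino;
            out->target_nlink = (long)st_target.st_nlink;
            out->soft_size = (long)st_soft.st_size;
            out->target_path_len = (long)strlen(target);
            rc = 0;
        }
    }
    /* Best-effort cleanup of everything created above. */
    unlink(soft);
    unlink(hard);
    unlink(target);
    rmdir(dir);
    return rc;
}
```

On a typical Linux filesystem, the report shows the hard link sharing the target's inode, the soft link using a different inode, a link count of 2 on the target, and a soft link size equal to the length of the stored path.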
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.