XFS Deep Dive: Layout, Inode Management, and Read/Write Operations
This article analyzes the XFS filesystem implementation in the Linux kernel, covering its on‑disk layout, superblock and allocation‑group structures, inode and free‑space B+ trees, operation sets (iops, fops, aops), file creation, write and read paths, logging, block layer interactions, and useful XFS utilities.
1. XFS Layout
The description below follows the official XFS on-disk layout documentation. The first four sectors of each allocation group (AG) hold its header structures. The first sector is a superblock: the primary superblock in AG 0 describes the whole filesystem, while the copies in the other AGs are secondary superblocks used only when the primary is damaged. The second sector (AGF) manages free space using two B+ trees, one indexed by block number and one by block count. The third sector (AGI) stores inode information; inodes are allocated in chunks of 64. The fourth sector (AGFL) holds a free list of block pointers reserved for the AG's own internal structures.
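To make the header layout concrete, here is a minimal Python sketch that computes the byte offset of the four header sectors in a given AG. The geometry used in the demo (4 KiB blocks, 512-byte sectors, 65536 blocks per AG) is an illustrative assumption; on a real filesystem these values come from the superblock.

```python
from typing import Dict

# Order of the four header sectors at the start of every AG.
AG_HEADERS = ("SB", "AGF", "AGI", "AGFL")


def ag_header_offsets(agno: int, ag_blocks: int, block_size: int,
                      sector_size: int) -> Dict[str, int]:
    """Return the byte offset of each header sector in allocation group `agno`."""
    ag_start = agno * ag_blocks * block_size  # byte offset where this AG begins
    return {name: ag_start + i * sector_size
            for i, name in enumerate(AG_HEADERS)}


if __name__ == "__main__":
    # Assumed geometry, for illustration only.
    for agno in range(2):
        print(agno, ag_header_offsets(agno, 65536, 4096, 512))
```

The function name and parameters are hypothetical helpers for this article, not part of any XFS tool.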
2. XFS Operations
XFS defines three operation sets: iops for inode metadata (size, timestamps, etc.), fops for file-level reads and writes that go through the page cache, and aops for moving pages between the page cache and the disk. These are illustrated through the open, read, and write calls.
3. Inode Operations and File Creation
Inode operations handle inode attribute updates, lookup, linking, and file creation. A directory inode's operation table, dir_inode->i_op, is xfs_dir_inode_operations, or xfs_dir_ci_inode_operations on a case-insensitive filesystem. Both are found in fs/xfs/xfs_iops.c, which also contains the ACL callbacks and the code that assigns the iops/fops/aops sets to an inode.
File creation checks quota availability, reserves blocks and log space, then allocates an inode via the following steps:
xfs_trans_alloc: reserve blocks and log space for a transaction.
xfs_trans_reserve_quota: reserve quota based on user/group/project limits.
xfs_dir_ialloc: allocate the inode.
xfs_trans_commit: commit the transaction to the log.
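The ordering of these four steps can be sketched with a toy in-memory model. Everything here (the ToyFs class, its fields, the starting inode number) is hypothetical and only mirrors the sequence described above; the kernel's transaction code is far more involved.

```python
class ReservationError(Exception):
    """Raised when a reservation step cannot be satisfied."""


class ToyFs:
    """Hypothetical in-memory model; not the kernel's data structures."""

    def __init__(self, free_blocks: int, quota_blocks: int) -> None:
        self.free_blocks = free_blocks
        self.quota_blocks = quota_blocks
        self.next_ino = 128        # arbitrary starting inode number
        self.log = []              # committed transactions, oldest first
        self.names = {}            # file name -> inode number

    def create(self, name: str, blocks: int = 1) -> int:
        # Step 1 (xfs_trans_alloc analogue): reserve blocks up front.
        if blocks > self.free_blocks:
            raise ReservationError("no space")
        self.free_blocks -= blocks
        # Step 2 (xfs_trans_reserve_quota analogue): charge the quota.
        if blocks > self.quota_blocks:
            self.free_blocks += blocks  # cancel: undo the block reservation
            raise ReservationError("quota exceeded")
        self.quota_blocks -= blocks
        # Step 3 (xfs_dir_ialloc analogue): pick an inode number.
        ino = self.next_ino
        self.next_ino += 1
        # Step 4 (xfs_trans_commit analogue): record the change in the log.
        self.log.append(("create", name, ino))
        self.names[name] = ino
        return ino
```

Reserving before allocating means a failure in a later step only has to release reservations; nothing is recorded in the log until the commit step.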
Disk-level inode allocation involves reading the AGI structure with xfs_ialloc_read_agi, locating the inode B+-tree root, and using xfs_ialloc_ag_alloc to allocate new inodes, handling both contiguous and sparse allocations via xfs_inobt_insert and xfs_inobt_insert_sprec.
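Because inodes come in chunks of 64, the free state of a chunk fits in a single 64-bit mask. The sketch below models one inode B+-tree record that way and hands out the lowest free inode. The class and field names are illustrative assumptions loosely modeled on such a record, and the real allocator applies additional placement heuristics.

```python
from dataclasses import dataclass

INODES_PER_CHUNK = 64
ALL_FREE = (1 << INODES_PER_CHUNK) - 1


@dataclass
class InobtRecord:
    start_ino: int                 # first inode number in the chunk
    free_mask: int = ALL_FREE      # bit i set => inode start_ino + i is free

    @property
    def free_count(self) -> int:
        return bin(self.free_mask).count("1")

    def alloc(self) -> int:
        """Allocate the lowest-numbered free inode in this chunk."""
        if self.free_mask == 0:
            raise RuntimeError("chunk is full")
        # Isolate the lowest set bit, then convert it to a bit index.
        bit = (self.free_mask & -self.free_mask).bit_length() - 1
        self.free_mask &= ~(1 << bit)  # mark the inode as in use
        return self.start_ino + bit
```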
4. XFS Write Operations
Memory-level writes use struct file_operations xfs_file_operations, while disk-level writes use struct address_space_operations xfs_address_space_operations. Data is first written to the page cache; the resulting dirty pages are flushed to disk by sync/fsync or by background writeback. XFS supports three write paths: buffered I/O (through the page cache), direct I/O (bypassing the cache), and DAX write (for persistent memory devices).
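From user space, the buffered path looks like the following sketch: a write that lands in the page cache, followed by an fsync that forces the dirty pages out. It uses only standard os calls and behaves the same on any filesystem, so it illustrates the write-then-flush sequence rather than anything XFS-specific. The function name durable_write is a hypothetical helper.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Buffered write followed by fsync; returns the number of bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        # At this point the data may exist only as dirty pages in the page cache.
        os.fsync(fd)  # ask the kernel to flush this file's dirty pages to disk
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(durable_write(os.path.join(d, "demo.bin"), b"hello xfs"))
```

Opening the file with O_DIRECT instead would request the direct I/O path, but that imposes buffer and offset alignment requirements that are not shown here.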
XFS manages file extents with B+-tree structures; each extent records logical file offset, physical block address, length, and status. The mapping is built by xfs_iread_extents and xfs_iext_lookup_extent, and actual I/O is performed by the iomap module via iomap_write_actor.
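The four fields of an extent are enough to translate a file block into a disk block. The sketch below keeps extents in a sorted list and uses binary search in place of a B+ tree. It is a simplified stand-in for the lookup idea, not the kernel's in-core extent tree, and it returns None for a hole, which is a simplification chosen for this example.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    file_off: int     # logical start, in filesystem blocks
    disk_block: int   # physical start block
    length: int       # number of blocks
    unwritten: bool   # status: preallocated but not yet written


def lookup_extent(extents: List[Extent], file_block: int) -> Optional[Extent]:
    """Find the extent covering `file_block`; `extents` is sorted by file_off."""
    starts = [e.file_off for e in extents]
    i = bisect_right(starts, file_block) - 1  # last extent starting at or before
    if i >= 0 and file_block < extents[i].file_off + extents[i].length:
        return extents[i]
    return None  # file_block falls in a hole


def to_disk_block(extents: List[Extent], file_block: int) -> Optional[int]:
    """Translate a file block number into a disk block number."""
    e = lookup_extent(extents, file_block)
    return None if e is None else e.disk_block + (file_block - e.file_off)
```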
5. XFS Read Operations
Read operations start with xfs_file_iomap_begin, which obtains the iomap covering the current file position. The physical disk address, in bytes, is then calculated as iomap->addr + pos - iomap->offset.
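A worked example of this formula, as a small Python model. The Iomap tuple mirrors only the three fields the formula needs (all in bytes), and the 512-byte sector shift used to derive a sector number is an assumption of this sketch.

```python
from typing import NamedTuple

SECTOR_SHIFT = 9  # assumes 512-byte sectors


class Iomap(NamedTuple):
    addr: int     # disk byte address of the start of the mapping
    offset: int   # file byte offset of the start of the mapping
    length: int   # mapping length in bytes


def disk_addr(m: Iomap, pos: int) -> int:
    """Apply addr + pos - offset for a file position inside the mapping."""
    if not (m.offset <= pos < m.offset + m.length):
        raise ValueError("pos outside mapping")
    return m.addr + pos - m.offset


def disk_sector(m: Iomap, pos: int) -> int:
    """Convert the byte address into a sector number."""
    return disk_addr(m, pos) >> SECTOR_SHIFT
```

For a mapping that starts at disk byte 1048576 and file offset 8192, file position 8704 resolves to disk byte 1049088, which is sector 2049.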
6. Xlog Mechanism
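XFS uses a journaling log (xlog) to keep filesystem metadata consistent; log records describing a change are written before the modified metadata blocks are updated in place, so an interrupted update can be replayed from the log after a crash.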
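The log-before-update ordering can be shown with a toy journal. This is a generic write-ahead logging sketch under simplified assumptions (whole-block records, no checksums, no log wrap), and every name in it is hypothetical; it does not describe the xlog record format.

```python
from typing import Dict, List


class ToyJournal:
    """Hypothetical write-ahead log model, not the xlog implementation."""

    def __init__(self) -> None:
        self.log: List[Dict[int, bytes]] = []   # records assumed durable
        self.disk: Dict[int, bytes] = {}        # blocks updated in place
        self.pending: Dict[int, bytes] = {}     # logged, not yet written back

    def commit(self, changes: Dict[int, bytes]) -> None:
        self.log.append(dict(changes))   # 1. the log record is written first
        self.pending.update(changes)     # 2. in-place update is deferred

    def checkpoint(self) -> None:
        self.disk.update(self.pending)   # write logged changes back in place
        self.pending.clear()

    def crash(self) -> None:
        self.pending.clear()             # in-memory state is lost

    def recover(self) -> None:
        for record in self.log:          # replay in commit order
            self.disk.update(record)
```

A change that was committed but never checkpointed survives a crash, because recovery replays it from the log.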
7. Block Layer
The block layer handles I/O submission and completion. After a device finishes an I/O, a hardware interrupt triggers do_IRQ, which may raise a soft IRQ to process the request completion (e.g., blk_done_softirq).
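The split between the hardware interrupt and the soft IRQ can be modeled as a two-stage queue: the interrupt handler only records that a request finished, and the deferred stage runs the completion callbacks. The class below is a hypothetical illustration of that deferral pattern, not kernel code.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class CompletionQueue:
    """Hypothetical model of deferring completion work out of the hard IRQ."""

    def __init__(self) -> None:
        self._done: Deque[Tuple[int, Callable[[int], None]]] = deque()
        self.softirq_pending = False

    def hard_irq(self, request_id: int, callback: Callable[[int], None]) -> None:
        # Keep the interrupt path short: queue the work and raise the flag.
        self._done.append((request_id, callback))
        self.softirq_pending = True

    def run_softirq(self) -> int:
        # Drain the queue and run each completion callback in arrival order.
        handled = 0
        while self._done:
            request_id, callback = self._done.popleft()
            callback(request_id)
            handled += 1
        self.softirq_pending = False
        return handled
```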
8. XFS Tools
Common XFS utilities include:
mkfs.xfs: format a device with XFS.
xfs_fsr: defragment a mounted XFS filesystem.
xfs_bmap: display block mapping of a file.
xfs_info: show filesystem information.
xfs_admin: modify filesystem parameters (requires unmounted filesystem).
xfs_copy: parallel copy of an XFS filesystem.
xfs_metadump/xfs_mdrestore: dump and restore filesystem metadata.
xfs_db: interactive debugging tool for XFS structures.
These commands allow administrators to create, inspect, maintain, and troubleshoot XFS filesystems.
Tencent Architect