Understanding Sparse Files and Inode Block Indexing in Linux File Systems

The article explains why a 100 GB file can be copied in under a second by examining the difference between logical file size and physical block usage, demonstrating sparse file behavior, inode structure, direct and indirect block indexing, and how these mechanisms affect copy performance on Linux.

Top Architect

When a colleague used cp to copy a 100 GB file and the operation finished in less than a second, the unexpected speed prompted an investigation into the file system.

Running ls -lh showed the file size as 100 GB, while du -sh ./test.txt reported only 2 MB, indicating a discrepancy between logical size and actual disk usage.

The stat ./test.txt output revealed a Size of 107374182400 bytes (100 GB) but only 4096 blocks allocated. Because stat counts Blocks in 512-byte units, 4096 × 512 bytes = 2 MB, which matches the du figure: Size reflects the logical length, while Blocks reflects the physical space actually occupied.
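The same comparison can be scripted. The following is a minimal sketch, assuming Linux and Python's standard os.stat: st_size is the logical size that ls -l prints, and st_blocks counts allocated 512-byte units, the physical usage that du reports. The helper name disk_usage_report is hypothetical.

```python
import os
import sys


def disk_usage_report(path):
    """Return (logical_bytes, allocated_bytes) for the file at `path`.

    logical_bytes is the Size field of stat (what ls -l shows);
    allocated_bytes is the physical space in use (what du is based on).
    """
    st = os.stat(path)
    logical = st.st_size
    allocated = st.st_blocks * 512  # on Linux, st_blocks counts 512-byte units
    return logical, allocated


if __name__ == "__main__":
    # Report each path given on the command line.
    for path in sys.argv[1:]:
        logical, allocated = disk_usage_report(path)
        print(f"{path}: logical={logical} bytes, allocated={allocated} bytes")
```

For the file in the example, the two numbers would correspond to the 107374182400-byte Size and the 4096 × 512-byte Blocks reported by stat.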

This difference arises because the file is a sparse file: the file system records the logical length in the inode but allocates physical blocks only for regions that actually contain data. The unwritten regions, called holes, consume no disk space and read back as zeros.
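A sparse file is easy to produce by extending a file without writing to the new range. This sketch assumes Linux and a file system that supports holes (ext4, XFS, Btrfs and tmpfs do); make_sparse_file is a hypothetical helper, and 100 MiB is an arbitrary illustrative size.

```python
import os
import tempfile


def make_sparse_file(path, logical_size, data=b"hello"):
    """Write `data` at offset 0, then extend the file to `logical_size`
    without writing the extra range, which leaves it as a hole."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        # Extending via ftruncate allocates no data blocks on
        # file systems that support holes.
        f.truncate(logical_size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.bin")
        make_sparse_file(p, 100 * 1024**2)             # 100 MiB logical size
        st = os.stat(p)
        print("logical bytes:  ", st.st_size)
        print("allocated bytes:", st.st_blocks * 512)  # st_blocks is in 512-byte units
        with open(p, "rb") as f:
            f.seek(50 * 1024**2)                       # somewhere inside the hole
            print(f.read(8))                           # holes read back as zero bytes
```

On such a file system the allocated figure should stay at a few kilobytes while the logical size is the full 100 MiB.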

File systems manage storage by dividing the disk into fixed‑size blocks (typically 4 KB) and using an inode to store each file's metadata and a list of block pointers. In the classic ext2/ext3 layout, the inode contains 15 pointers: the first 12 are direct pointers, the 13th is a single‑indirect pointer, the 14th a double‑indirect pointer, and the 15th a triple‑indirect pointer.
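To make the 15-slot layout concrete, here is a simplified sketch that maps a logical block number to the chain of indices a lookup would follow: first the inode slot, then one index inside each indirect block. It assumes 4 KB blocks and 4-byte pointers, and block_to_path is an illustrative name, not the kernel's implementation.

```python
NDIR = 12                      # inode slots 0-11 hold direct pointers
IND, DIND, TIND = 12, 13, 14   # slots for single-, double-, triple-indirect pointers
PTRS_PER_BLOCK = 4096 // 4     # 4 KB indirect block / 4-byte pointer = 1024


def block_to_path(n, ptrs=PTRS_PER_BLOCK):
    """Return the indices followed to reach logical block n:
    the inode slot first, then one index per indirect level."""
    if n < 0:
        raise ValueError("negative block number")
    if n < NDIR:                                   # direct pointer
        return [n]
    n -= NDIR
    if n < ptrs:                                   # via the single-indirect block
        return [IND, n]
    n -= ptrs
    if n < ptrs ** 2:                              # via the double-indirect tree
        return [DIND, n // ptrs, n % ptrs]
    n -= ptrs ** 2
    if n < ptrs ** 3:                              # via the triple-indirect tree
        return [TIND, n // ptrs ** 2, (n // ptrs) % ptrs, n % ptrs]
    raise ValueError("block number beyond triple-indirect range")
```

For example, block 12 is the first block reached through the single-indirect pointer and block 12 + 1024 the first reached through the double-indirect pointer, so those calls return [12, 0] and [13, 0, 0].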

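With 4 KB blocks, the 12 direct pointers address 12 × 4 KB = 48 KB. An indirect block is itself a 4 KB block filled with 4-byte pointers, so it holds 4096 / 4 = 1024 of them: the single‑indirect pointer therefore covers 1024 × 4 KB = 4 MB, the double‑indirect pointer adds 1024 × 1024 × 4 KB = 4 GB, and the triple‑indirect pointer adds 1024³ × 4 KB = 4 TB. Together these give roughly 4 TB of addressable space, the ceiling of the classic ext2‑style pointer scheme with 4 KB blocks; the actual maximum file size can be lower because of other on-disk limits.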
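The capacity arithmetic can be checked with a few lines of code. This sketch assumes the classic layout of 12 direct pointers plus single-, double- and triple-indirect pointers, with a configurable block size and 4-byte pointers; the function name addressable_bytes is hypothetical.

```python
def addressable_bytes(block_size=4096, ptr_size=4, ndir=12):
    """Bytes reachable through each pointer level of a classic
    ext2-style inode, plus their total."""
    p = block_size // ptr_size  # pointers per indirect block
    levels = {
        "direct": ndir * block_size,
        "single": p * block_size,
        "double": p ** 2 * block_size,
        "triple": p ** 3 * block_size,
    }
    # Right-hand side is evaluated before "total" is inserted.
    levels["total"] = sum(levels.values())
    return levels


if __name__ == "__main__":
    for name, size in addressable_bytes().items():
        print(f"{name:>6}: {size:,} bytes")
```

With the defaults, the levels come out to 48 KiB, 4 MiB, 4 GiB and 4 TiB, matching the figures in the text; the total is dominated by the triple-indirect level.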

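Copying a sparse file is fast because the holes never have to be transferred. cp creates a new file with its own inode and copies only the regions that contain data: GNU cp, whose default is --sparse=auto, detects holes in a sparse source and recreates them in the destination instead of writing zeros, and recent versions can locate the data regions directly (for example with lseek's SEEK_DATA and SEEK_HOLE) so the holes are not even read. Only about 2 MB of real data is moved, so the operation completes almost instantly.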
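The hole-skipping idea can be illustrated with a small copy routine. This is a minimal sketch of the technique, not GNU cp's actual implementation: it assumes Linux, where lseek supports SEEK_DATA and SEEK_HOLE (exposed in Python as os.SEEK_DATA and os.SEEK_HOLE), and sparse_copy is a hypothetical name.

```python
import errno
import os


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst, transferring only data regions and leaving holes as holes.

    On a file system without hole reporting, the whole file is treated as
    data and this degrades to an ordinary copy.
    """
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(src).st_size
            offset = 0
            while offset < size:
                try:
                    data_start = os.lseek(src, offset, os.SEEK_DATA)
                except OSError as e:
                    if e.errno == errno.ENXIO:  # no data past offset: the rest is a hole
                        break
                    raise
                data_end = os.lseek(src, data_start, os.SEEK_HOLE)
                pos = data_start
                while pos < data_end:  # copy only this data extent
                    buf = os.pread(src, min(chunk, data_end - pos), pos)
                    if not buf:
                        break
                    # Writing at pos leaves the skipped range before it unallocated.
                    os.pwrite(dst, buf, pos)
                    pos += len(buf)
                offset = data_end
            # Set the logical size; a trailing hole stays unallocated.
            os.ftruncate(dst, size)
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

The design choice is to ask the kernel where the data is rather than scanning for runs of zeros, so the cost scales with the allocated data instead of the logical size.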

The article also outlines the write and read paths: creating a file allocates an inode; writing data allocates blocks, stores the data in them, and records them in the inode's block list; reading fetches the inode first and then follows its block pointers to reconstruct the file, returning zeros for any range that has no allocated block.
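The create, write and read steps can be modeled with a toy in-memory file system. Everything here is hypothetical and heavily simplified: the inode's pointer tree is replaced by a plain dictionary from logical block index to physical block number, and free-space management is a counter. It does, however, show blocks being allocated only where data is written and reads of unallocated ranges returning zeros.

```python
BLOCK_SIZE = 4096


class ToyInode:
    """Hypothetical inode: logical size plus logical-to-physical block map
    (a stand-in for the direct/indirect pointer tree)."""
    def __init__(self):
        self.size = 0
        self.blocks = {}  # logical block index -> physical block number


class ToyFS:
    """Toy in-memory file system illustrating create / write / read."""
    def __init__(self):
        self.disk = {}      # physical block number -> bytearray
        self.next_free = 0
        self.inodes = {}    # file name -> ToyInode

    def create(self, name):
        self.inodes[name] = ToyInode()  # creating a file allocates an inode
        return self.inodes[name]

    def _alloc(self):
        pbn = self.next_free
        self.next_free += 1
        self.disk[pbn] = bytearray(BLOCK_SIZE)
        return pbn

    def write(self, name, offset, data):
        ino = self.inodes[name]
        pos = 0
        while pos < len(data):
            lbn, within = divmod(offset + pos, BLOCK_SIZE)
            if lbn not in ino.blocks:  # allocate only blocks that are written
                ino.blocks[lbn] = self._alloc()
            n = min(BLOCK_SIZE - within, len(data) - pos)
            self.disk[ino.blocks[lbn]][within:within + n] = data[pos:pos + n]
            pos += n
        ino.size = max(ino.size, offset + len(data))  # logical size lives in the inode

    def read(self, name, offset, length):
        ino = self.inodes[name]  # look up the inode first
        end = offset + max(0, min(length, ino.size - offset))
        out = bytearray()
        pos = offset
        while pos < end:
            lbn, within = divmod(pos, BLOCK_SIZE)
            n = min(BLOCK_SIZE - within, end - pos)
            pbn = ino.blocks.get(lbn)  # follow the block pointer
            if pbn is None:
                out += bytes(n)  # hole: nothing allocated, reads as zeros
            else:
                out += self.disk[pbn][within:within + n]
            pos += n
        return bytes(out)
```

Writing a few bytes at offset 0 and a few bytes far beyond it leaves a file with a large logical size but only two allocated blocks, which is the sparse-file situation in miniature.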

In summary, the fast cp result is explained by the sparse nature of the file, the distinction between logical size and physical block allocation, and the hierarchical inode indexing scheme that efficiently manages large files without allocating unnecessary disk space.
