
Understanding Sparse Files and Multi‑Level Inode Indexing in Linux File Systems

The article explains why copying a seemingly 100 GB file with the cp command finishes instantly by analyzing file size versus allocated blocks, sparse file concepts, inode structures, direct and indirect block indexing, and how Linux file systems manage storage space efficiently.

Top Architect

A colleague copied a file that appeared to be 100 GB in size, yet the command time cp ./test.txt ./test.txt.cp completed in less than a second, which prompted an investigation.

Running ls -lh confirmed a file size of 100 GB, but du -sh ./test.txt reported only 2 MB of actual disk usage. The output of stat ./test.txt showed Size = 107374182400 bytes (100 GB) and Blocks = 4096. stat counts blocks in 512-byte units, so 4096 blocks amount to 2 MB. This is the difference between logical size and physical allocation.
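The same comparison can be made programmatically. The following is a minimal Python sketch, not taken from the original article; the helper name size_report is hypothetical, and it assumes a Unix-like system where st_blocks is reported in 512-byte units.

```python
import os


def size_report(path):
    """Return (logical_bytes, allocated_bytes) for the file at path."""
    st = os.stat(path)
    # st_size is the logical length of the file.
    # st_blocks counts 512-byte units actually allocated on disk,
    # independent of the file system's own block size.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    import sys

    logical, allocated = size_report(sys.argv[1])
    print(f"logical size:   {logical} bytes")
    print(f"allocated size: {allocated} bytes")
```

With the values quoted above, the allocated size works out to 4096 × 512 = 2,097,152 bytes, far below the logical size of 107,374,182,400 bytes.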

The article then introduces the file system as a storage container, comparing it to a luggage storage service: file names are recorded, metadata (like an ID tag) points to the data, and the physical disk provides the space.

Space management is discussed: storing a large file as a single contiguous block is impractical, so the file system divides the disk into fixed‑size blocks (typically 4 KB). An inode stores metadata and an array of block pointers. Direct pointers handle small files, while single, double, and triple indirect pointers allow the system to address larger files.
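To make the indexing scheme concrete, here is a hedged Python sketch of how a byte offset could be mapped to a pointer level. It assumes an ext2-style layout with 12 direct pointers and 4-byte block pointers (so 1,024 pointers per 4 KB indirect block); the function name locate and the constants are illustrative, not a real file system API.

```python
BLOCK_SIZE = 4096                            # bytes per block (from the article)
POINTER_SIZE = 4                             # assumption: 4-byte block pointers
N_DIRECT = 12                                # assumption: 12 direct pointers
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024 pointers per indirect block


def locate(byte_offset):
    """Return (level, indices) for the block that holds byte_offset.

    indices are the slots to follow at each level of the lookup,
    counted from the start of that level's pointer range.
    """
    n = byte_offset // BLOCK_SIZE  # logical block number within the file
    if n < N_DIRECT:
        return "direct", (n,)
    n -= N_DIRECT
    if n < PTRS_PER_BLOCK:
        return "single", (n,)
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return "double", divmod(n, PTRS_PER_BLOCK)
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        i, rest = divmod(n, PTRS_PER_BLOCK ** 2)
        j, k = divmod(rest, PTRS_PER_BLOCK)
        return "triple", (i, j, k)
    raise ValueError("offset is beyond what this scheme can address")
```

Small files never leave the direct branch, so they need no extra index blocks; each further level costs one more block read per lookup.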

For example, with 4 KB blocks and 4-byte block pointers (1,024 pointers per indirect block), 12 direct pointers cover 48 KB, one single-indirect block covers 4 MB, one double-indirect block covers 4 GB, and one triple-indirect block covers 4 TB, so the maximum file size supported by this scheme is roughly 4 TB.
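The arithmetic behind these figures can be checked with a short Python sketch. The 4-byte pointer size is an assumption inferred from the numbers (4 MB per single-indirect block implies 1,024 pointers per 4 KB block); the function name level_capacities is hypothetical.

```python
def level_capacities(block_size=4096, pointer_size=4, n_direct=12):
    """Bytes addressable through each pointer level of the inode."""
    ptrs = block_size // pointer_size  # pointers that fit in one indirect block
    return {
        "direct": n_direct * block_size,    # 12 * 4 KB
        "single": ptrs * block_size,        # 1024 * 4 KB
        "double": ptrs ** 2 * block_size,   # 1024^2 * 4 KB
        "triple": ptrs ** 3 * block_size,   # 1024^3 * 4 KB
    }


if __name__ == "__main__":
    caps = level_capacities()
    for level, size in caps.items():
        print(f"{level:>6}: {size} bytes")
    print(f" total: {sum(caps.values())} bytes")
```

The triple-indirect level dominates the total, which is why the overall limit is quoted as roughly 4 TB.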

A sparse file is then described: its logical size (as reported by stat) can be huge, but only the blocks that actually contain data are allocated on disk. In the example, a file reports a size of 1 TB + 4 KB but occupies only 8 KB physically, because the region between its first and last 4 KB was never written and so has no blocks allocated.
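A file like this can be produced by writing, seeking far ahead, and writing again. The Python sketch below is an illustration under assumptions: the helper name make_sparse and its parameters are hypothetical, and whether the skipped region is really left unallocated depends on the underlying file system.

```python
import os


def make_sparse(path, second_offset, chunk=b"x" * 4096):
    """Write chunk at offset 0 and again at second_offset, leaving a hole between.

    Returns (logical_bytes, allocated_bytes) for the resulting file.
    """
    with open(path, "wb") as f:
        f.write(chunk)          # first data block at the start of the file
        f.seek(second_offset)   # jump ahead without writing anything
        f.write(chunk)          # second data block after the hole
    st = os.stat(path)
    # Logical size is second_offset + len(chunk); the allocated size
    # covers only the written blocks on file systems that support holes.
    return st.st_size, st.st_blocks * 512
```

Calling make_sparse("demo.bin", 1024 ** 4) would mirror the article's 1 TB + 4 KB example, provided the file system permits files of that size.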

When such a sparse file is copied, cp detects the holes and skips them: it creates a new file with the same logical size but writes only the blocks that contain data, which explains the near‑instant copy time.
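The idea of a hole-aware copy can be sketched in Python with lseek and the SEEK_DATA / SEEK_HOLE flags available on Linux. This is an illustration of the technique, not a description of how cp itself is implemented; the function name sparse_copy is hypothetical.

```python
import errno
import os


def sparse_copy(src, dst, bufsize=1 << 20):
    """Copy src to dst, writing only data extents and leaving holes unwritten."""
    src_fd = os.open(src, os.O_RDONLY)
    try:
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(src_fd).st_size
            offset = 0
            while offset < size:
                try:
                    # Find the next region that contains data.
                    start = os.lseek(src_fd, offset, os.SEEK_DATA)
                except OSError as exc:
                    if exc.errno == errno.ENXIO:
                        break  # only a trailing hole remains
                    raise
                # Find where that data region ends (next hole or end of file).
                end = os.lseek(src_fd, start, os.SEEK_HOLE)
                pos = start
                while pos < end:
                    chunk = os.pread(src_fd, min(bufsize, end - pos), pos)
                    if not chunk:
                        break  # source shrank while copying
                    view = memoryview(chunk)
                    while view:
                        written = os.pwrite(dst_fd, view, pos)
                        pos += written
                        view = view[written:]
                offset = end
            # Set the logical size so trailing holes are preserved.
            os.ftruncate(dst_fd, size)
        finally:
            os.close(dst_fd)
    finally:
        os.close(src_fd)
```

On file systems that do not report holes, the kernel treats the whole file as one data region, so the sketch degrades to an ordinary full copy.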

The article concludes that this efficiency comes from block‑based allocation, inode indexing, and allocating physical blocks only when data is actually written, which lets Linux file systems handle large logical files without consuming proportional disk space.

Tags: Linux, storage, file system, inode, sparse file, cp command