Why Does cp Copy a 100 GB Sparse File Instantly? Understanding Inodes and Block Indexing
A colleague copied a 100 GB file with the cp command and it finished in under a second, prompting an investigation that reveals the difference between a file's logical size and its physical block usage, the role of inodes, direct and indirect block indexing, and how sparse files make such rapid copies possible.
A colleague used cp to copy a file that appeared to be 100 GB in size, yet the operation completed in less than a second. The file's listing (ls -lh) showed a size of 100 GB, while du -sh reported only 2 MB, and stat displayed a size of 107374182400 bytes with 4096 blocks (2 MB).
File Size vs. Physical Blocks
The Size field in stat reflects the logical file size that applications see, whereas the Blocks field indicates the actual disk space allocated (each block is 512 bytes). In this case the file occupies only 2 MB of physical space.
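The same comparison can be made programmatically. The following is a minimal sketch, assuming a Linux system where st_blocks is reported in 512-byte units; the helper name logical_and_allocated is hypothetical and not part of the original investigation.

```python
import os


def logical_and_allocated(path):
    """Return (logical size in bytes, allocated bytes) for a file.

    Assumption: st_blocks counts 512-byte units, as it does on Linux,
    independent of the filesystem's own block size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


# Arithmetic from the stat output above: 4096 blocks of 512 bytes each.
ALLOCATED_BYTES = 4096 * 512  # 2 MiB, matching what du -sh reported
```

Calling logical_and_allocated on the file in question would be expected to return a first value near 100 GB and a second value near 2 MB, which is the gap between ls -lh and du -sh.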
How a File System Works
A file system is essentially a container for digital data, analogous to a luggage storage service: the file name is the label, the inode is the index card, the file content is the luggage, and the disk is the storage room. The inode stores metadata and a list of blocks that hold the file’s data.
Inode Structure and Multi‑Level Indexing
An inode contains a Size attribute and a Block array. The array has 15 entries: the first 12 are direct indexes (each points directly to a data block), entry 13 is a single‑indirect index, entry 14 a double‑indirect index, and entry 15 a triple‑indirect index. With a typical 4 KB block size and 4‑byte block pointers, one index block holds 4096 / 4 = 1024 pointers, so the addressing capacity is:
Direct indexes: 12 × 4 KB = 48 KB
Single‑indirect: 1024 × 4 KB = 4 MB
Double‑indirect: 1024² × 4 KB = 4 GB
Triple‑indirect: 1024³ × 4 KB = 4 TB
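Thus a filesystem like ext2 can address files up to roughly 4 TB with this hierarchical indexing scheme, with the triple‑indirect level contributing almost all of that capacity. Real implementations may impose lower limits for other reasons.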
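The capacity figures above follow from simple arithmetic, sketched below. The 4-byte pointer size is an assumption (it is what yields 1024 pointers per 4 KB block), and the function name capacity_by_level is made up for illustration.

```python
BLOCK_SIZE = 4096    # bytes per data block (the "typical 4 KB" case)
POINTER_SIZE = 4     # assumption: 32-bit block numbers
DIRECT_SLOTS = 12    # direct entries in the inode's block array


def capacity_by_level(block_size=BLOCK_SIZE,
                      pointer_size=POINTER_SIZE,
                      direct=DIRECT_SLOTS):
    """Return the bytes addressable through each kind of index entry."""
    ptrs = block_size // pointer_size  # pointers held by one index block
    return {
        "direct": direct * block_size,
        "single": ptrs * block_size,
        "double": ptrs ** 2 * block_size,
        "triple": ptrs ** 3 * block_size,
    }


def max_addressable(**kwargs):
    """Total bytes addressable by all 15 entries together."""
    return sum(capacity_by_level(**kwargs).values())
```

Summing the four levels gives slightly more than 4 TB, since the direct, single, and double levels add about 4 GB on top of the triple-indirect range.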
Performance Implications
For files that fit within the direct indexes, only two disk reads are needed (inode + data block), assuming nothing is already cached. Data reached through the triple‑indirect index may require up to five reads: the inode, three levels of index blocks, and the data block itself.
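To make the read counts concrete, here is a rough sketch that maps a byte offset to the number of reads needed to reach it. It assumes 4 KB blocks, 4-byte pointers, and no caching; disk_reads is a hypothetical helper, not a real filesystem API.

```python
def disk_reads(offset, block_size=4096, pointer_size=4, direct=12):
    """Count uncached reads needed to fetch the block holding `offset`.

    The result is 1 read for the inode, plus one per index level
    traversed, plus 1 for the data block.
    """
    ptrs = block_size // pointer_size
    index = offset // block_size  # logical block number within the file
    limit = direct                # first block number NOT covered so far
    for level in range(4):        # 0 = direct, 1..3 = indirect levels
        if index < limit:
            return 2 + level
        limit += ptrs ** (level + 1)
    raise ValueError("offset is beyond the triple-indirect range")
```

Under these assumptions the first 48 KB of a file costs two reads per block, and the cost rises by one read each time the offset crosses into the next index level.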
Why cp Is So Fast: Sparse Files
The examined file is a sparse file: its logical size is 100 GB, but only about 2 MB of blocks are actually allocated. As a more extreme illustration, a file with a logical size of 1 TB + 4 KB in which only two 4 KB regions contain data occupies just 8 KB on disk. The filesystem allocates blocks only for the regions that hold data; the empty “holes” consume no physical space and read back as zeros. When cp copies such a file, it can skip the holes and copy only the allocated blocks, so the operation finishes almost instantly.
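The idea of copying only the allocated regions can be sketched with the SEEK_DATA and SEEK_HOLE options of lseek, which Python exposes on Linux as os.SEEK_DATA and os.SEEK_HOLE. This is an illustrative sketch of the technique, not a description of how cp is implemented internally; the function name sparse_copy is hypothetical.

```python
import errno
import os


def sparse_copy(src, dst, chunk=1 << 20):
    """Copy only the data extents of src to dst, leaving holes unwritten."""
    fin = os.open(src, os.O_RDONLY)
    fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(fin).st_size
        offset = 0
        while offset < size:
            try:
                # Jump to the next region that actually holds data.
                start = os.lseek(fin, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a hole remains
                    break
                raise
            # The data region ends where the next hole begins.
            end = os.lseek(fin, start, os.SEEK_HOLE)
            pos = start
            while pos < end:
                buf = os.pread(fin, min(chunk, end - pos), pos)
                if not buf:
                    break
                pos += os.pwrite(fout, buf, pos)
            offset = end
        # Restore the logical size; a trailing hole stays unallocated.
        os.ftruncate(fout, size)
    finally:
        os.close(fin)
        os.close(fout)
```

On a filesystem that does not track holes, SEEK_DATA typically reports the whole file as one data region, so the sketch degrades to an ordinary full copy rather than failing.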
Key Takeaways
File size shown by stat is a logical attribute; physical space usage depends on allocated blocks.
Inodes store metadata and block pointers, using direct and multi‑level indirect indexing to address large files efficiently.
Sparse files separate logical size from physical allocation, enabling rapid copying with cp because only the real data blocks are processed.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
