Why cp Copies a 100GB File Instantly: Sparse Files & Inode Indexing Explained
A colleague was amazed when the cp command copied a 100 GB file in less than a second, prompting an investigation that reveals the difference between logical file size and physical block usage, the role of inodes, direct and indirect block indexing, and how sparse files make such copies appear instantaneous.
Why cp Appears So Fast
A colleague used cp to copy a 100 GB file and was surprised that the operation finished in under a second. ls -lh showed the file as 100 GB, but du -sh reported only 2 MB of actual disk usage.
sh-4.4# time cp ./test.txt ./test.txt.cp
real 0m0.107s
user 0m0.008s
sys 0m0.085s
A typical SATA hard drive writes at about 150 MB/s, so copying 100 GB should take around 11 minutes. The discrepancy led to deeper analysis.
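The 11-minute figure is just size divided by throughput. This sketch assumes 1 GB = 1024 MB and the 150 MB/s rate quoted above, and it ignores read time and seeks, so a real same-disk copy would take longer, not shorter.

```python
# Rough estimate of how long a real 100 GB copy "should" take.
# Assumptions (illustrative only): 1 GB = 1024 MB, a sustained 150 MB/s
# write rate, and no time charged for reading the source or seeking.

def copy_time_seconds(size_mb: float, throughput_mb_s: float) -> float:
    """Seconds needed to move size_mb megabytes at throughput_mb_s."""
    return size_mb / throughput_mb_s

size_mb = 100 * 1024  # 100 GB expressed in MB
seconds = copy_time_seconds(size_mb, 150)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} min")  # 683 s, about 11.4 min
```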
Analyzing the File with stat
File: ./test.txt
Size: 107374182400 Blocks: 4096 IO Block: 4096 regular file
Device: 78h/120d Inode: 3148347 Links: 1
Access: (0644/-rw-r--r--) Uid: (0/ root) Gid: (0/ root)
Access: 2021-03-13 12:22:00.888871000 +0000
Modify: 2021-03-13 12:22:46.562243000 +0000
Change: 2021-03-13 12:22:46.562243000 +0000
Birth: -
The Size field (107374182400 bytes, exactly 100 GiB) is the logical file size. The Blocks field counts 512-byte units, not the 4096-byte IO Block size, so 4096 × 512 B = 2 MiB is the physical space actually allocated.
Key Points
Size is the logical size, the figure ls -l reports and most users see.
Blocks reflects the disk space actually allocated, which is what du measures.
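To see both numbers side by side, you can create a sparse file yourself. This minimal Python sketch writes 4 KB of data, extends the file with truncate (which allocates nothing), and reads st_size and st_blocks, the fields stat prints as Size and Blocks. The file name and the 1 GiB size are arbitrary, and it assumes a Linux filesystem that supports holes (ext4, XFS, tmpfs and similar).

```python
import os
import tempfile

def logical_and_allocated(path: str) -> tuple[int, int]:
    """Return (logical size, allocated bytes) for path.

    st_size is the Size field (what ls -l shows); st_blocks is the Blocks
    field, counted in 512-byte units on Linux (what du is based on).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

LOGICAL_SIZE = 1024 ** 3  # 1 GiB: an arbitrary size chosen for this demo

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sparse.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"x" * 4096)       # 4 KB of real data at offset 0
        f.truncate(LOGICAL_SIZE)   # extend without writing: the rest is a hole
    logical, allocated = logical_and_allocated(path)
    print(f"Size: {logical} bytes, allocated: {allocated} bytes")
```

On a hole-aware filesystem the allocated figure stays in the kilobyte range while Size reports the full gibibyte, the same gap ls -lh and du -sh showed for test.txt.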
File System Analogy
Think of a file system as a luggage storage service: you register a name (file name), receive a tag (metadata/index), and the storage room (disk) holds the physical items. The tag lets staff locate the luggage, just as an inode maps a file name to its data blocks.
Space Management in a File System
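Storing each file as one contiguous chunk would force the file system to reserve space for the whole logical size, even for ranges that were never written. Instead, the disk is divided into fixed-size blocks (commonly 4 KB), and a file's inode records pointers only to the blocks that actually hold data. A range that was never written has no block behind it; such a range is called a hole, and reading it returns zeros.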
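The idea can be modeled with a toy block map. The ToyInode class below is hypothetical and far simpler than a real inode: it only records which logical blocks have a physical block behind them, so a write far past the start of the file grows the logical size without allocating anything for the gap.

```python
BLOCK_SIZE = 4096  # assumed block size (the common 4 KB case)

class ToyInode:
    """A hypothetical, heavily simplified inode.

    It maps logical block numbers to physical block numbers. Only blocks that
    have been written get an entry; a logical block with no entry is a hole
    that takes no disk space and reads back as zeros.
    """

    def __init__(self) -> None:
        self.size = 0                        # logical size in bytes
        self.block_map: dict[int, int] = {}  # logical block -> physical block
        self._next_free = 0                  # trivial bump allocator for physical blocks

    def write_block(self, logical_block: int) -> None:
        """Record a write to one logical block, allocating it if needed."""
        if logical_block not in self.block_map:
            self.block_map[logical_block] = self._next_free
            self._next_free += 1
        self.size = max(self.size, (logical_block + 1) * BLOCK_SIZE)

    def allocated_bytes(self) -> int:
        return len(self.block_map) * BLOCK_SIZE

inode = ToyInode()
inode.write_block(0)
inode.write_block(1_000_000)  # about 4 GB further on; every block in between is a hole
print(inode.size, inode.allocated_bytes())  # 4096004096 8192
```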
Inode Structure and Multi‑Level Indexing
In ext2 and ext3, an inode holds 15 block pointers. Assuming 4 KB blocks and 4-byte block numbers, one indirect block holds 4096 / 4 = 1024 entries, which gives:
First 12 pointers: direct indexes, each pointing straight at a data block (12 × 4 KB = 48 KB).
13th pointer: single indirect, pointing to a block of 1024 block numbers (adds 1024 × 4 KB = 4 MB).
14th pointer: double indirect, adding one more level of pointer blocks (adds 1024² × 4 KB = 4 GB).
15th pointer: triple indirect, adding a third level (adds 1024³ × 4 KB = 4 TB).
Thus, with 4 KB blocks, the pointer hierarchy of an ext2 inode can map roughly 4 TB of file data.
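The per-level capacities follow directly from the block size and the pointer width. This sketch simply multiplies them out under the same assumptions (4 KB blocks, 4-byte block numbers).

```python
BLOCK_SIZE = 4096                   # assumed 4 KB blocks
PTR_SIZE = 4                        # assumed 4-byte (32-bit) block numbers
PER_BLOCK = BLOCK_SIZE // PTR_SIZE  # 1024 entries per indirect block

# Data blocks reachable through each part of the 15-pointer array.
levels = {
    "12 direct pointers": 12,
    "single indirect": PER_BLOCK,
    "double indirect": PER_BLOCK ** 2,
    "triple indirect": PER_BLOCK ** 3,
}

total = 0
for name, blocks in levels.items():
    total += blocks * BLOCK_SIZE
    print(f"{name}: {blocks * BLOCK_SIZE:,} bytes")
print(f"total: {total:,} bytes (about {total / 2**40:.2f} TiB)")
```

The triple-indirect level dominates: it alone contributes 4 TiB, while the other three levels together add only about 4 GiB.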
Why Sparse Files Copy Quickly
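A sparse file has a large logical size but only a few allocated blocks; everything else is holes. Modern GNU cp, with its default --sparse=auto behavior, detects those holes and skips them, so it reads and writes only the allocated data and leaves matching holes in the copy. That is why the operation finishes almost instantly despite the huge reported size.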
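The hole-skipping technique can be sketched with lseek's SEEK_DATA and SEEK_HOLE whence values (Linux 3.1+ and some other Unix systems). This is an illustrative re-implementation of the idea, not cp's actual code; sparse_copy and the 16 MiB demo file are hypothetical.

```python
import errno
import os
import tempfile

def sparse_copy(src: str, dst: str, chunk: int = 1 << 20) -> None:
    """Copy src to dst, touching only the regions that hold data.

    Uses lseek() with SEEK_DATA / SEEK_HOLE. Holes are never read; the final
    ftruncate() restores the logical size, so skipped ranges stay holes in dst.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        in_fd, out_fd = fin.fileno(), fout.fileno()
        size = os.fstat(in_fd).st_size
        offset = 0
        while offset < size:
            try:
                data_start = os.lseek(in_fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # no data after offset: trailing hole
                    break
                raise
            hole_start = os.lseek(in_fd, data_start, os.SEEK_HOLE)
            pos = data_start
            while pos < hole_start:  # copy [data_start, hole_start) in chunks
                buf = os.pread(in_fd, min(chunk, hole_start - pos), pos)
                if not buf:
                    break
                os.pwrite(out_fd, buf, pos)
                pos += len(buf)
            offset = hole_start
        os.ftruncate(out_fd, size)

# Demo: 4 bytes at the start, 4 bytes 16 MiB later, a hole in between.
with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "src.bin"), os.path.join(d, "dst.bin")
    with open(src, "wb") as f:
        f.write(b"head")
        f.seek(16 * 1024 * 1024)
        f.write(b"tail")
    sparse_copy(src, dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same = a.read() == b.read()
    src_size, dst_size = os.path.getsize(src), os.path.getsize(dst)
    print(same, src_size, dst_size)
```

The work done scales with the amount of real data, not with the logical size. That is the property that lets a 100 GB sparse file copy in about a tenth of a second.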
As a further illustration, consider a file whose logical size is 1 TB + 4 KB but in which only two 4 KB blocks contain data: its data blocks occupy just 8 KB on disk, plus any indirect blocks needed to index them.
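To see which pointer level such a block needs, the sketch below classifies a logical block number against the 12 / single / double / triple layout. The source does not say where the two data blocks sit; placing them at offset 0 and offset 1 TB is an assumption suggested by the 1 TB + 4 KB size, and index_level is a hypothetical helper.

```python
BLOCK_SIZE = 4096               # assumed 4 KB blocks
PER_BLOCK = BLOCK_SIZE // 4     # 1024 four-byte block numbers per indirect block

def index_level(logical_block: int) -> str:
    """Return which part of an ext2-style 15-pointer inode maps logical_block."""
    if logical_block < 12:
        return "direct"
    logical_block -= 12
    if logical_block < PER_BLOCK:
        return "single indirect"
    logical_block -= PER_BLOCK
    if logical_block < PER_BLOCK ** 2:
        return "double indirect"
    logical_block -= PER_BLOCK ** 2
    if logical_block < PER_BLOCK ** 3:
        return "triple indirect"
    raise ValueError("offset is beyond what the 15 pointers can map")

TB = 1024 ** 4
# Hypothetical placement of the two data blocks: offset 0 and offset 1 TB.
for offset in (0, TB):
    block = offset // BLOCK_SIZE
    print(f"offset {offset}: logical block {block} -> {index_level(block)}")
```

Under that assumption, the block at 1 TB (logical block 268435456) falls in the triple-indirect range. A block-mapped file system such as ext2 would therefore also allocate one triple-, one double-, and one single-indirect block to reach it, so the real footprint is a few kilobytes above 8 KB.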
Conclusion
The speed of cp on seemingly huge files is explained by the distinction between logical size and physical block allocation, the inode’s role in mapping data, and the use of sparse files that allocate space only where data exists.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Java High-Performance Architecture
Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.
