Why Does cp Copy a 100 GB Sparse File in Under a Second?
This article explains why copying a seemingly 100 GB file with the cp command finishes almost instantly by exploring sparse files, the difference between file size and physical block usage, inode structures, multi‑level block indexing, and how modern file systems efficiently manage storage.
Analyzing the File
A colleague used cp to copy a 100 GB file and was surprised that the operation completed in less than a second.
Verification with ls -lh showed the file size was indeed 100 GB:
sh-4.4# ls -lh
-rw-r--r-- 1 root root 100G Mar 6 12:22 test.txt
Timing the copy with time cp ./test.txt ./test.txt.cp produced:
real 0m0.107s
user 0m0.008s
sys 0m0.085s
On a typical SATA HDD with a maximum write speed of about 150 MB/s, copying 100 GB should take around 11 minutes, yet the observed copy finished in about a tenth of a second.
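The 11-minute estimate can be checked with a quick calculation. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes 1 MB = 1024 × 1024 bytes and that every byte of the file really has to be written at the full 150 MB/s.

```python
def naive_copy_seconds(size_bytes, throughput_mb_per_s):
    """Time to write size_bytes if every byte hits the disk at a constant rate."""
    bytes_per_second = throughput_mb_per_s * 1024 * 1024  # assumes binary megabytes
    return size_bytes / bytes_per_second


# 107374182400 is the Size value reported by stat for test.txt (100 GB).
seconds = naive_copy_seconds(107374182400, 150)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} minutes")
```

The gap between this estimate and the measured 0.107 s is what the rest of the article explains.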
Further inspection with du -sh ./test.txt reported only 2 MB of actual data:
sh-4.4# du -sh ./test.txt
2.0M ./test.txt
The stat command revealed the file's metadata:
sh-4.4# stat ./test.txt
File: ./test.txt
Size: 107374182400 Blocks: 4096 IO Block: 4096 regular file
Device: 78h/120d Inode: 3148347 Links: 1
Access: (0644/-rw-r--r--) Uid: (0/ root) Gid: (0/ root)
Access: 2021-03-13 12:22:00.888871000 +0000
Modify: 2021-03-13 12:22:46.562243000 +0000
Change: 2021-03-13 12:22:46.562243000 +0000
Birth: -
Key points from the stat output:
Size represents the logical file size (what users see).
Blocks indicates the actual physical space used (each block is 512 bytes, so 4096 blocks equal 2 MB).
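The same two numbers can be read programmatically. The sketch below assumes a Unix-like system where Python's os.stat exposes st_blocks in 512-byte units (as on Linux); size_report is an illustrative helper name, not something from the original article.

```python
import os


def size_report(path):
    """Return (logical_bytes, allocated_bytes) for a file.

    logical_bytes corresponds to the Size field of stat, and
    allocated_bytes to Blocks multiplied by 512.
    """
    st = os.stat(path)
    # st_blocks counts 512-byte units, independent of the 4 KB IO block size.
    return st.st_size, st.st_blocks * 512
```

For the file in this article, size_report("./test.txt") would be expected to return 107374182400 for the logical size and 4096 × 512 = 2097152 for the allocated size, matching the stat output above.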
File System Basics
A file system is essentially a container for storing data, analogous to a luggage storage service where files are the luggage, the disk is the storage room, and metadata (inode) acts as the identification tag.
When storing data, the system records the file name, creates an index (inode), and allocates fixed‑size blocks (typically 4 KB) on the disk.
Space Management
To manage large files, the file system divides the disk into discrete blocks and uses an inode to keep track of which blocks belong to a file. Direct indexing stores up to 12 block numbers (≈48 KB). Single, double, and triple indirect indexing extend this capability to megabytes, gigabytes, and terabytes respectively.
For example, in an ext2‑like file system:
Direct index: 12 blocks → up to 48 KB.
Single indirect: 1024 block pointers → up to 4 MB.
Double indirect: 1024 × 1024 pointers → up to 4 GB.
Triple indirect: 1024³ pointers → up to 4 TB.
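The capacities in this list follow directly from the block size and the number of pointers that fit in one block. The sketch below reproduces the arithmetic under the assumptions used in the text (4 KB blocks, 12 direct pointers) plus one assumption of its own: 4-byte block numbers, which is what gives 1024 pointers per block. The function names are illustrative only.

```python
BLOCK_SIZE = 4096        # bytes per block, as in the text
POINTER_SIZE = 4         # bytes per block number (assumption: 32-bit pointers)
DIRECT_POINTERS = 12     # direct slots in the inode
PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024 pointers fit in one index block


def index_capacities():
    """Bytes addressable through each indexing level on its own."""
    return {
        "direct": DIRECT_POINTERS * BLOCK_SIZE,      # 48 KB
        "single indirect": PER_BLOCK * BLOCK_SIZE,   # 4 MB
        "double indirect": PER_BLOCK**2 * BLOCK_SIZE,  # 4 GB
        "triple indirect": PER_BLOCK**3 * BLOCK_SIZE,  # 4 TB
    }


def level_for_block(n):
    """Return which indexing level holds the pointer for logical block n."""
    levels = [
        ("direct", DIRECT_POINTERS),
        ("single indirect", PER_BLOCK),
        ("double indirect", PER_BLOCK**2),
        ("triple indirect", PER_BLOCK**3),
    ]
    for name, count in levels:
        if n < count:
            return name
        n -= count  # skip past the blocks covered by this level
    raise ValueError("block index is beyond the triple indirect range")
```

Under these assumptions, logical block 12 is the first one that needs the single indirect block, and block 12 + 1024 is the first one that needs the double indirect tree.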
Why Is cp So Fast?
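The observed file is a sparse file: its logical size is 100 GB, but only about 2 MB of blocks are allocated. The effect can be far more extreme. A file with a logical size of 1 TB + 4 KB in which only two 4 KB regions were ever written holds just 8 KB of actual data. Unwritten regions (holes) do not allocate physical blocks and read back as zeros, so such a file occupies only 8 KB on disk.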
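A file with two written regions separated by a hole can be created by seeking past the end before writing. The sketch below is an illustration, not the procedure used for the file in this article: write_two_regions is a hypothetical helper, and placing the first region at offset 0 is an assumption. Whether the hole really stays unallocated depends on the file system.

```python
import os


def write_two_regions(path, far_offset, chunk=4096):
    """Write one chunk at offset 0 and one at far_offset, leaving a hole between.

    Returns (logical_bytes, allocated_bytes) as reported by os.stat.
    """
    with open(path, "wb") as f:
        f.write(b"a" * chunk)   # first region at offset 0
        f.seek(far_offset)      # jump ahead without writing anything
        f.write(b"b" * chunk)   # second region; the skipped range becomes a hole
    st = os.stat(path)
    # st_blocks is in 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512
```

Calling it with far_offset set to 1 TB (1024**4) gives the 1 TB + 4 KB logical size described above; a smaller offset such as 1 GB shows the same behavior and is safer on file systems with a low maximum file size.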
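When cp copies such a file, it only has to create the new file's metadata and copy the regions that actually hold data. The holes are skipped instead of being written out as zeros, so the copy completes almost instantly.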
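One way a copy tool can skip holes on Linux is to ask the kernel where the data is, using lseek with SEEK_DATA and SEEK_HOLE. The following is a minimal sketch of that idea, not cp's actual implementation; sparse_copy is an illustrative name, and the code assumes a platform where Python exposes os.SEEK_DATA, os.SEEK_HOLE, os.pread, and os.pwrite.

```python
import errno
import os


def sparse_copy(src, dst, bufsize=1 << 20):
    """Copy src to dst, reading only data extents and leaving holes as holes."""
    fd_in = os.open(src, os.O_RDONLY)
    fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(fd_in).st_size
        offset = 0
        while offset < size:
            try:
                # Find the start of the next region that contains data.
                start = os.lseek(fd_in, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # no data after offset: the rest of the file is a hole
                raise
            # Find where that data region ends (next hole, or end of file).
            end = os.lseek(fd_in, start, os.SEEK_HOLE)
            pos = start
            while pos < end:
                chunk = os.pread(fd_in, min(bufsize, end - pos), pos)
                if not chunk:
                    break
                # Advance by what was actually written; any remainder is re-read.
                pos += os.pwrite(fd_out, chunk, pos)
            offset = end
        # Extend dst to the full logical size so a trailing hole is preserved.
        os.ftruncate(fd_out, size)
    finally:
        os.close(fd_in)
        os.close(fd_out)
```

On a file system that does not report holes, SEEK_DATA and SEEK_HOLE treat the whole file as one data region, so the function degrades to an ordinary full copy rather than failing.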
Summary
The key insights are:
File size (inode attribute) reflects logical size, not physical space.
Physical space is determined by the number of allocated blocks; unwritten holes consume no space.
Sparse files exploit this by having large logical sizes with minimal actual storage, making operations like cp appear extremely fast.
Understanding block allocation, inode indexing, and sparse file behavior is essential for grasping how modern file systems efficiently manage storage.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
