
Why cp Can Copy a 100 GB File Instantly: Sparse Files and File System Mechanics

This article explains why the Linux cp command appears to copy a 100 GB file in less than a second by exploring sparse files, the distinction between file size and allocated blocks, inode structure, multi‑level block indexing, and how these concepts enable fast copying of seemingly huge files.

IT Architects Alliance

A colleague was surprised that copying a 100 GB file with cp finished in under a second. According to ls -lh, the file really was 100 GB:

```
# ls -lh
-rw-r--r-- 1 root root 100G Mar 6 12:22 test.txt
```

However, du -sh reported only 2 MB of actual disk usage:

```
# du -sh ./test.txt
2.0M    ./test.txt
```
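The gap between the two numbers comes from two different inode fields. ls -l prints the logical size (st_size). du sums the allocated space, which Linux reports in st_blocks as a count of 512-byte units. Below is a minimal Python sketch of the two measurements. It assumes a Linux/Unix system where os.stat exposes st_blocks, and the helper names are hypothetical:

```python
import os


def logical_size(path: str) -> int:
    """What ls -l shows: the file length recorded in the inode."""
    return os.stat(path).st_size


def allocated_size(path: str) -> int:
    """What du reports (before -h rounding): space actually allocated.

    st_blocks is counted in 512-byte units on Linux, regardless of the
    file system's own block size (the "IO Block" value in stat output).
    """
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(f"{p}: size={logical_size(p)} allocated={allocated_size(p)}")
```

For the test.txt above, these helpers should agree with ls and du respectively. GNU du can also print the logical size with its --apparent-size option.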

The stat command showed both numbers at once. The file's logical size is 107374182400 bytes, yet it occupies only 4096 blocks (2 MB) on disk. Note that stat reports Blocks in 512-byte units, not in the 4096-byte IO Block size:

```
# stat ./test.txt
  File: ./test.txt
  Size: 107374182400  Blocks: 4096  IO Block: 4096  regular file
...
```
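The stat numbers can be worked through directly. Assuming the Linux convention that Blocks is counted in 512-byte units, the arithmetic shows how small the allocated fraction is:

$$
\begin{aligned}
\text{Size} &= 107\,374\,182\,400\ \text{B} = 100 \times 2^{30}\ \text{B} = 100\ \text{GiB} \\
\text{Allocated} &= 4096 \times 512\ \text{B} = 2^{21}\ \text{B} = 2\ \text{MiB} \\
\frac{\text{Allocated}}{\text{Size}} &= \frac{2^{21}}{100 \times 2^{30}} = \frac{1}{51\,200} \approx 0.002\%
\end{aligned}
$$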

This discrepancy arises because the file is a sparse file. In a sparse file, the logical size recorded in the inode can be much larger than the space actually allocated. Regions that were never written (holes) consume no disk blocks, and reading them returns zeros.
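A sparse file can be produced by extending a file past its end without writing the gap. The sketch below uses a hypothetical Python helper and assumes a Linux file system that supports holes (ext4, XFS, Btrfs and tmpfs all do). It seeks past the end, writes one 4 KiB chunk, and shows that the skipped region reads back as zeros:

```python
import os


def make_sparse(path: str, logical_size: int, payload: bytes = b"x" * 4096) -> None:
    """Create a file of logical_size bytes whose only data is payload at the end.

    Seeking past end-of-file and then writing leaves everything before the
    payload as a hole: the length is recorded in the inode, but no data
    blocks are allocated for the skipped range.
    """
    if len(payload) > logical_size:
        raise ValueError("payload larger than requested size")
    with open(path, "wb") as f:
        f.seek(logical_size - len(payload))  # jump over the region we never write
        f.write(payload)


def read_at(path: str, offset: int, n: int) -> bytes:
    """Read n bytes at offset; inside a hole the kernel returns zero-filled bytes."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(n)
```

From a shell, truncate -s 100G test.txt gives a similar result: a 100 GB file with no data blocks at all.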

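File systems store data in fixed-size blocks and keep each file's metadata, including its size and the locations of its data blocks, in an inode. In the classic Unix layout used by ext2 and ext3, the inode holds 15 block pointers: 12 direct pointers, followed by one single-indirect, one double-indirect and one triple-indirect pointer, each pointing to a block of further pointers. This multi-level indexing lets the file system address very large logical files while allocating blocks only for regions that contain data. A region that was never written simply has a zero pointer, so no data block is allocated for it, and for a large enough gap no indirect block is allocated either. (ext4 uses extent trees by default instead, but the principle is the same: ranges with no mapping are holes.)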
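To see how the pointer levels divide up a file, the sketch below maps a logical block number to the slots followed to reach it. It also computes the largest file the scheme can describe. It assumes the classic ext2/ext3-style layout: 12 direct pointers, 4 KiB blocks and 4-byte block numbers, which gives 1024 pointers per indirect block. The function names are illustrative, not a real file-system API. A hole corresponds to a zero entry in any slot along the path.

```python
BLOCK_SIZE = 4096                       # bytes per file-system block (assumed)
POINTER_SIZE = 4                        # bytes per block number (assumed, ext2/ext3 style)
N_DIRECT = 12                           # direct pointers in the inode
PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # pointers held by one indirect block: 1024


def locate(block_no: int) -> tuple:
    """Return (level, indices...) describing how logical block block_no is reached."""
    if block_no < 0:
        raise ValueError("block number must be non-negative")
    if block_no < N_DIRECT:
        return ("direct", block_no)
    block_no -= N_DIRECT
    if block_no < PER_BLOCK:
        return ("single", block_no)
    block_no -= PER_BLOCK
    if block_no < PER_BLOCK ** 2:
        # index in the double-indirect block, then in the single-indirect block
        return ("double", block_no // PER_BLOCK, block_no % PER_BLOCK)
    block_no -= PER_BLOCK ** 2
    if block_no < PER_BLOCK ** 3:
        return (
            "triple",
            block_no // PER_BLOCK ** 2,
            (block_no // PER_BLOCK) % PER_BLOCK,
            block_no % PER_BLOCK,
        )
    raise ValueError("block number beyond the triple-indirect range")


def max_file_size() -> int:
    """Largest logical size the 12 + single + double + triple scheme can address."""
    blocks = N_DIRECT + PER_BLOCK + PER_BLOCK ** 2 + PER_BLOCK ** 3
    return blocks * BLOCK_SIZE
```

With these assumptions, max_file_size() comes to 4402345721856 bytes (about 4 TiB). The last 4 KiB block of the 100 GB test.txt is logical block 26214399, which falls in the triple-indirect range. Real file systems impose other limits too, so this figure is the pointer scheme's ceiling rather than a guaranteed maximum.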

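When cp copies a sparse file, it does not duplicate the inode. It creates a new file with the same logical size, copies only the regions that actually contain data, and leaves the empty regions as holes in the destination. By default, GNU cp detects sparse sources and makes the destination sparse as well (see its --sparse option). Since only about 2 MB has to be read and written, the copy completes almost instantly regardless of the file's logical size.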
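A hole-skipping copy can be sketched in Python using lseek with SEEK_DATA and SEEK_HOLE, which Linux provides for locating data regions. This is an illustrative sketch, not how GNU cp is implemented. It assumes a Unix system where os.SEEK_DATA, os.SEEK_HOLE, os.pread and os.pwrite are available. File systems without hole tracking typically report the whole file as one data region, in which case the sketch degrades to a plain full copy.

```python
import errno
import os


def sparse_copy(src: str, dst: str, chunk: int = 1 << 20) -> int:
    """Copy src to dst, reading and writing only the data regions.

    Holes in src are skipped, so they stay unallocated in dst; the final
    ftruncate restores the full logical size (including a trailing hole).
    Returns the number of bytes actually written.
    """
    written = 0
    fin = os.open(src, os.O_RDONLY)
    try:
        fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(fin).st_size
            offset = 0
            while offset < size:
                try:
                    start = os.lseek(fin, offset, os.SEEK_DATA)  # next data byte
                except OSError as exc:
                    if exc.errno == errno.ENXIO:  # only a hole remains
                        break
                    raise
                end = os.lseek(fin, start, os.SEEK_HOLE)  # end of this data run
                pos = start
                while pos < end:
                    buf = os.pread(fin, min(chunk, end - pos), pos)
                    if not buf:  # file shrank underneath us
                        break
                    view = memoryview(buf)
                    while view:
                        n = os.pwrite(fout, view, pos)  # write at the same offset
                        view = view[n:]
                        pos += n
                        written += n
                offset = end
            os.ftruncate(fout, size)  # set logical size without writing zeros
        finally:
            os.close(fout)
    finally:
        os.close(fin)
    return written
```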

Key points:
- File size is the logical length visible to users.
- Blocks indicate the actual disk space used.
- Sparse files have a large logical size but few allocated blocks.
- Inodes and multi-level block indexing (direct, single-indirect, double-indirect and triple-indirect pointers) enable efficient storage and retrieval of such files.

Understanding these mechanisms clarifies why copying a seemingly massive file can be instantaneous and highlights the importance of distinguishing logical size from physical storage usage.
