
Why Does cp Copy a 100 GB Sparse File in Under a Second?

This article explains why copying a seemingly 100 GB file with the cp command finishes almost instantly by exploring sparse files, the difference between file size and physical block usage, inode structures, multi‑level block indexing, and how modern file systems efficiently manage storage.

Programmer DD

Analyzing the File

A colleague used cp to copy a 100 GB file and was surprised that the operation completed in less than a second.

Verification with ls -lh showed the file size was indeed 100 GB:

sh-4.4# ls -lh
-rw-r--r-- 1 root root 100G Mar  6 12:22 test.txt

Timing the copy with time cp ./test.txt ./test.txt.cp produced:

real 0m0.107s
user 0m0.008s
sys 0m0.085s

On a typical SATA HDD with a sustained write speed of about 150 MB/s, copying 100 GB should take roughly 11 minutes, yet the copy finished in about 0.1 seconds.
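The 11-minute figure is simple arithmetic. A quick sketch, assuming 1 GB = 1024 MB and a constant 150 MB/s (real disks vary, so treat the result as a rough estimate):

```python
# Back-of-the-envelope estimate: time to write 100 GB at a sustained 150 MB/s.
# Assumption: 1 GB = 1024 MB, matching the binary units that ls -lh reports.
FILE_SIZE_MB = 100 * 1024      # 100 GB expressed in MB
WRITE_SPEED_MB_S = 150         # write speed assumed in the text

seconds = FILE_SIZE_MB / WRITE_SPEED_MB_S
minutes = seconds / 60

print(f"{seconds:.0f} s, about {minutes:.1f} minutes")  # 683 s, about 11.4 minutes
```

Compared with the measured 0.107 s, the gap is more than three orders of magnitude, so the copy cannot have written 100 GB of data.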

Further inspection with du -sh ./test.txt reported only 2 MB of actual data:

sh-4.4# du -sh ./test.txt
2.0M ./test.txt

The stat command revealed the file's metadata:

sh-4.4# stat ./test.txt
  File: ./test.txt
  Size: 107374182400   Blocks: 4096   IO Block: 4096   regular file
  Device: 78h/120d   Inode: 3148347   Links: 1
  Access: (0644/-rw-r--r--)  Uid: (0/ root)   Gid: (0/ root)
  Access: 2021-03-13 12:22:00.888871000 +0000
  Modify: 2021-03-13 12:22:46.562243000 +0000
  Change: 2021-03-13 12:22:46.562243000 +0000
  Birth: -

Key points from the stat output:

Size represents the logical file size (what users see).

Blocks indicates the physical space actually allocated. stat counts it in 512-byte units regardless of the file system's own block size, so 4096 blocks equal 2 MB.
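The same two numbers can be read programmatically. The sketch below first redoes the arithmetic on the values from the stat output, then shows a small helper (the name logical_and_allocated is hypothetical) that reads them for any path. It assumes a Unix-like system where st_blocks is reported in 512-byte units.

```python
import os
import tempfile

def logical_and_allocated(path):
    """Return (logical size in bytes, allocated bytes) for a file.

    Assumes st_blocks is counted in 512-byte units, as on Linux.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

# Values copied from the stat output in the article.
STAT_SIZE = 107374182400
STAT_BLOCKS = 4096

print(f"logical:   {STAT_SIZE / 1024**3:.0f} GiB")          # logical:   100 GiB
print(f"allocated: {STAT_BLOCKS * 512 / 1024**2:.0f} MiB")  # allocated: 2 MiB

# The helper works on any file; the allocated figure depends on the file system.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * 10000)
    tmp.flush()
    demo_logical, demo_allocated = logical_and_allocated(tmp.name)

print(demo_logical, demo_allocated)
```

For an ordinary file the allocated figure is usually close to the logical size, rounded up to whole blocks. A large gap between the two is the signature of a sparse file.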

File System Basics

A file system is essentially a container for storing data, analogous to a luggage storage service where files are the luggage, the disk is the storage room, and metadata (inode) acts as the identification tag.

When storing data, the system records the file name, creates an index (inode), and allocates fixed‑size blocks (typically 4 KB) on the disk.

Space Management

To manage large files, the file system divides the disk into discrete blocks and uses an inode to keep track of which blocks belong to a file. Direct indexing stores up to 12 block numbers (≈48 KB). Single, double, and triple indirect indexing extend this capability to megabytes, gigabytes, and terabytes respectively.

For example, in an ext2‑like file system with 4 KB blocks and 4‑byte block pointers (1024 pointers per block):

Direct index: 12 blocks → up to 48 KB.

Single indirect: 1024 block pointers → up to 4 MB.

Double indirect: 1024 × 1024 pointers → up to 4 GB.

Triple indirect: 1024³ pointers → up to 4 TB.
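These capacities follow directly from the block size and the pointer size. A minimal sketch of the arithmetic, assuming 4 KB blocks, 4-byte block pointers, and 12 direct pointers (the helper name human is only for illustration):

```python
BLOCK_SIZE = 4 * 1024                            # bytes per block (assumed)
POINTER_SIZE = 4                                 # bytes per block pointer (assumed)
POINTERS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 1024
DIRECT_POINTERS = 12

def human(n):
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1024 or unit == "TB":
            return f"{n:g} {unit}"
        n /= 1024

# Bytes addressable through each level of the index.
capacities = {
    "direct": DIRECT_POINTERS * BLOCK_SIZE,
    "single indirect": POINTERS_PER_BLOCK * BLOCK_SIZE,
    "double indirect": POINTERS_PER_BLOCK ** 2 * BLOCK_SIZE,
    "triple indirect": POINTERS_PER_BLOCK ** 3 * BLOCK_SIZE,
}

for level, size in capacities.items():
    print(f"{level}: {human(size)}")
# direct: 48 KB
# single indirect: 4 MB
# double indirect: 4 GB
# triple indirect: 4 TB
```

Each extra level of indirection multiplies the reachable space by the number of pointers that fit in one block, which is why three levels are enough to reach terabytes.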

Why Is cp So Fast?

The observed file is a sparse file: its logical size is 100 GB, but only about 2 MB of it has ever been written. Regions that were never written (holes) have no physical blocks allocated, so the file occupies only about 2 MB on disk, and reading from a hole simply returns zeros.
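A hole is easy to produce: seek past the end of the data and write again. The following sketch builds a scaled-down sparse file with two 4 KB data regions separated by a 64 MiB gap. The sizes and the helper name make_sparse are arbitrary choices for illustration, and the allocated figure depends on the file system.

```python
import os
import tempfile

BLOCK = 4096               # size of each written region
GAP = 64 * 1024 * 1024     # 64 MiB that is never written

def make_sparse(path, gap=GAP):
    """Write one block at offset 0 and one after a gap that is skipped over."""
    with open(path, "wb") as f:
        f.write(b"A" * BLOCK)   # first data region
        f.seek(BLOCK + gap)     # jump past the gap without writing anything
        f.write(b"B" * BLOCK)   # second data region

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.bin")
    make_sparse(path)
    st = os.stat(path)
    logical = st.st_size
    allocated = st.st_blocks * 512   # st_blocks is in 512-byte units on Linux
    with open(path, "rb") as f:
        f.seek(BLOCK)
        hole_sample = f.read(16)     # read 16 bytes from inside the hole

print("logical size   :", logical)
print("allocated bytes:", allocated)
print("hole reads as  :", hole_sample)
```

On a file system that supports sparse files, the allocated figure should be a small fraction of the logical size, while the bytes read from the hole are all zeros.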

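When cp copies such a file, it only has to recreate the metadata and copy the allocated blocks. The holes are skipped instead of being read and rewritten as 100 GB of zeros. GNU cp detects sparse source files by default (--sparse=auto), which is why the copy completes almost instantly.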
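To make the idea concrete, here is a minimal sketch of a hole-aware copy. It is not how cp is implemented. It only illustrates the technique of asking the kernel where the data extents are (lseek with SEEK_DATA and SEEK_HOLE) and copying just those ranges. The function name sparse_copy is hypothetical, and the code assumes Linux or another platform where Python exposes os.SEEK_DATA and os.SEEK_HOLE.

```python
import errno
import os
import tempfile

CHUNK = 1 << 20  # copy at most 1 MiB per read

def sparse_copy(src, dst):
    """Copy src to dst, transferring only data extents. Returns bytes copied."""
    src_fd = os.open(src, os.O_RDONLY)
    dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    copied = 0
    try:
        size = os.fstat(src_fd).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next region that actually contains data.
                start = os.lseek(src_fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break            # only a hole remains until end of file
                raise
            # Find where that data region ends (the next hole or end of file).
            end = os.lseek(src_fd, start, os.SEEK_HOLE)
            pos = start
            while pos < end:
                buf = os.pread(src_fd, min(CHUNK, end - pos), pos)
                if not buf:
                    break
                written = os.pwrite(dst_fd, buf, pos)
                pos += written
                copied += written
            offset = end
        # Set the final length so that a trailing hole is preserved.
        os.ftruncate(dst_fd, size)
    finally:
        os.close(src_fd)
        os.close(dst_fd)
    return copied

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.bin")
    dst = os.path.join(tmp, "dst.bin")
    with open(src, "wb") as f:
        f.write(b"head")
        f.seek(256 * 1024 * 1024)    # leave a hole of roughly 256 MiB
        f.write(b"tail")
    copied = sparse_copy(src, dst)
    src_size = os.path.getsize(src)
    dst_size = os.path.getsize(dst)
    with open(src, "rb") as a, open(dst, "rb") as b:
        same_head = a.read(4) == b.read(4)
        a.seek(-4, os.SEEK_END)
        b.seek(-4, os.SEEK_END)
        same_tail = a.read(4) == b.read(4)

print("logical size of copy:", dst_size)
print("bytes actually copied:", copied)
```

The number of bytes copied depends on how the file system reports its extents, but on a file system with sparse-file support it should be far smaller than the logical size. On a file system without hole reporting, the whole file appears as one data extent and the function degrades to an ordinary full copy.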

Summary

The key insights are:

File size (inode attribute) reflects logical size, not physical space.

Physical space is determined by the number of allocated blocks; unwritten holes consume no space.

Sparse files exploit this by having large logical sizes with minimal actual storage, making operations like cp appear extremely fast.

Understanding block allocation, inode indexing, and sparse file behavior is essential for grasping how modern file systems efficiently manage storage.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
