Fundamentals

Why cp Can Copy a 100 GB File Instantly: Understanding Sparse Files and Inode Block Indexing

The article explains why the Linux cp command appears to copy a 100 GB file in less than a second by exploring sparse files, the difference between logical file size and physical block allocation, and how inode direct and indirect indexing enable efficient storage and fast copying.

Architecture Digest

A colleague was surprised that copying a 100 GB file with cp finished in under a second. Running ls -lh confirmed the 100 GB size, but du -sh reported only 2 MB. The file's logical size and the disk space it actually uses did not match.

The stat output showed a Size of 107374182400 bytes (100 GB) but only 4096 Blocks. Size is the file's logical length. Blocks counts the physical space actually allocated, in 512-byte units, so 4096 × 512 bytes = 2 MB.
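The two numbers can be read side by side with a few commands. This is a minimal sketch, assuming Linux with GNU coreutils (truncate, stat -c, du --apparent-size). The file name demo.img is illustrative. It creates a 100 GiB file without writing any data, then prints the logical size next to the allocated size.

```shell
#!/bin/sh
# Compare a file's logical size (what ls -l shows) with the space actually
# allocated to it (what du shows). Assumes Linux with GNU coreutils;
# the file name demo.img is illustrative.
set -eu

f=demo.img
rm -f "$f"
truncate -s 100G "$f"          # logical size 100 GiB; no data written yet

size=$(stat -c %s "$f")        # st_size: logical length in bytes
blocks=$(stat -c %b "$f")      # st_blocks: number of allocated units
unit=$(stat -c %B "$f")        # bytes per unit counted by %b (normally 512)

echo "logical size:   $size bytes"
echo "allocated size: $((blocks * unit)) bytes"
du -h --apparent-size "$f"     # based on st_size, like ls -lh
du -h "$f"                     # based on allocated blocks (the default)

rm -f "$f"
```

For the file in the article, the same arithmetic gives 4096 × 512 = 2,097,152 bytes. That is the 2.0M that du -sh printed, compared with a Size of 107374182400 bytes, which is exactly 100 × 1024³.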

To explain this, the article likens a file system to a luggage storage service: file names are the tags, metadata is the index, and the disk is the storage room. The key component is the inode, which stores a file's metadata (size, owner, permissions, timestamps) and an array of pointers to its data blocks.

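In the classic ext2/ext3 layout, an inode combines direct and indirect block pointers. With 4 KB blocks and 4-byte block numbers, each index block holds 1024 pointers:

- The first 12 entries are direct pointers. Each points to one 4 KB data block, so together they cover up to 48 KB.
- The 13th entry is a single-indirect pointer. It points to a block holding 1024 more pointers (≈4 MB).
- The 14th entry is double-indirect (1024 × 1024 blocks, ≈4 GB).
- The 15th entry is triple-indirect (≈4 TB).

This multilevel indexing lets a small inode address very large files. Any pointer can also be left empty. As a result, regions that were never written need no data blocks, and often no index blocks either.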
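The capacities quoted above follow from simple arithmetic. The sketch below assumes the classic ext2/ext3 block map with 4 KB blocks and 4-byte block numbers, and computes what each pointer level can address. The level_of function is a hypothetical helper written for illustration, not part of any tool. It reports which inode pointer covers a given logical block number.

```shell
#!/bin/sh
# Capacity of the classic ext2/ext3-style block map: 12 direct pointers
# plus single-, double- and triple-indirect pointers. Assumes 4 KiB blocks
# and 4-byte block numbers, so one index block holds 4096 / 4 = 1024 pointers.
set -eu

block=4096
ptrs=$((block / 4))

direct=$((12 * block))                    # 48 KiB
single=$((ptrs * block))                  # 4 MiB
double=$((ptrs * ptrs * block))           # 4 GiB
triple=$((ptrs * ptrs * ptrs * block))    # 4 TiB

echo "direct:          $direct bytes"
echo "single-indirect: $single bytes"
echo "double-indirect: $double bytes"
echo "triple-indirect: $triple bytes"
echo "total:           $((direct + single + double + triple)) bytes"

# Hypothetical helper: which inode pointer covers logical block number $1?
level_of() {
  n=$1
  if [ "$n" -lt 12 ]; then echo direct; return; fi
  n=$((n - 12))
  if [ "$n" -lt "$ptrs" ]; then echo single-indirect; return; fi
  n=$((n - ptrs))
  if [ "$n" -lt $((ptrs * ptrs)) ]; then echo double-indirect; return; fi
  echo triple-indirect
}

# The last byte of the article's 100 GB file sits in logical block 26214399.
level_of $(( (107374182400 - 1) / block ))
```

The final call prints triple-indirect. Under this scheme, the end of a 100 GB file is reached only through the triple-indirect pointer. If only that tail had been written, the file system would need just the data block plus the three index blocks on the path to it.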

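A file whose large ranges were never written is a sparse file. This happens, for example, when a program seeks far past the end of a file before writing, or extends the file with truncate. Only the blocks that actually hold data are allocated. The gaps (holes) stay unallocated and read back as zeros, so the logical size is large while the physical usage stays small.

Copying such a file with cp is fast because GNU cp, with its default --sparse=auto, detects the holes and recreates them in the destination. It does not read and write gigabytes of zeros, so only the allocated data is actually copied.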
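The effect can be reproduced on a smaller scale. This is a minimal sketch, assuming Linux, GNU coreutils (truncate, dd, cp --sparse, stat -c) and a file system that supports holes, such as ext4, XFS or tmpfs. The file names sparse.img and sparse.cp are illustrative. The script builds a 256 MiB file containing only two 4 KiB chunks of random data. It then copies the file with cp --sparse=always and compares the logical and allocated sizes.

```shell
#!/bin/sh
# Build a sparse file and copy it while preserving its holes.
# Assumes Linux, GNU coreutils, and a file system with hole support
# (ext4, XFS, tmpfs, ...). File names are illustrative.
set -eu

rm -f sparse.img sparse.cp
truncate -s 256M sparse.img     # logical size 256 MiB, nothing allocated yet

# Write 4 KiB of real data at offset 0 and at offset 128 MiB
# (seek is counted in bs-sized units: 32768 * 4096 = 128 MiB).
# conv=notrunc stops dd from truncating the existing file.
dd if=/dev/urandom of=sparse.img bs=4096 count=1 conv=notrunc 2>/dev/null
dd if=/dev/urandom of=sparse.img bs=4096 count=1 seek=32768 conv=notrunc 2>/dev/null

# Ask cp to keep holes (and long runs of zeros) unallocated in the copy.
# The default, --sparse=auto, does the same when the source looks sparse.
cp --sparse=always sparse.img sparse.cp

cmp sparse.img sparse.cp && echo "contents identical"
for f in sparse.img sparse.cp; do
  echo "$f: size=$(stat -c %s "$f") bytes," \
       "allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") )) bytes"
done
```

Both files should report the same 256 MiB size and identical contents, while typically only a few kilobytes are allocated to each. Exact numbers vary by file system; XFS, for instance, may preallocate a little extra. Delete sparse.img and sparse.cp afterwards.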

Example commands from the investigation:

sh-4.4# ls -lh
-rw-r--r-- 1 root root 100G Mar 6 12:22 test.txt
sh-4.4# du -sh ./test.txt
2.0M ./test.txt
sh-4.4# time cp ./test.txt ./test.txt.cp
real 0m0.107s
user 0m0.008s
sys 0m0.085s
sh-4.4# stat ./test.txt
  File: ./test.txt
  Size: 107374182400   Blocks: 4096   IO Block: 4096   regular file
  Device: 78h/120d   Inode: 3148347   Links: 1
  Access: (0644/-rw-r--r--)  Uid: (0/ root)   Gid: (0/ root)
  Access: 2021-03-13 12:22:00.888871000 +0000
  Modify: 2021-03-13 12:22:46.562243000 +0000
  Change: 2021-03-13 12:22:46.562243000 +0000

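In short, the file's logical size (100 GB) is just a number stored in the inode. Only about 2 MB of blocks are physically allocated (Blocks: 4096 × 512 bytes), which is why the copy finishes almost instantly. This behavior is typical of sparse files on Unix-like file systems such as ext2/ext3/ext4. By default ext4 tracks blocks with extents rather than the indirect pointers described above, but it likewise leaves unwritten ranges unallocated.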

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Linux, storage, file system, inode, sparse file, cp command
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
