
Why cp Can Copy a 100 GB File Instantly: Understanding Sparse Files and Inode Block Indexing

The article explains why the Linux cp command appears to copy a 100 GB file in less than a second by exploring sparse files, the difference between logical file size and physical block allocation, and how inode direct and indirect indexing enable efficient storage and fast copying.

Architecture Digest

Oct 26, 2021


A colleague was surprised that copying a 100 GB file with cp finished in under a second. ls -lh confirmed the 100 GB size, but du -sh reported only 2 MB: the file's logical size and its actual disk usage did not match.

The stat output showed a Size of 107374182400 bytes (100 GB) and a Blocks count of 4096. stat counts Blocks in 512-byte units, so 4096 blocks is 2 MB, which matches the du result. Size reflects the file's logical length, while Blocks reflects the physical space actually allocated.
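The relationship between the two fields can be checked with plain shell arithmetic. This is a minimal sketch using the numbers from the stat output in this article; it assumes the 512-byte unit that GNU stat uses for Blocks on Linux, and the variable names are illustrative.

```shell
#!/bin/sh
# Values taken from the stat output in this article.
size_bytes=107374182400   # "Size": logical length in bytes
blocks=4096               # "Blocks": number of allocated units
unit=512                  # assumed unit size; `stat -c %B FILE` reports the real one

logical_gib=$((size_bytes / 1024 / 1024 / 1024))
allocated_bytes=$((blocks * unit))
allocated_mib=$((allocated_bytes / 1024 / 1024))

echo "logical size:   ${logical_gib} GiB"
echo "allocated size: ${allocated_mib} MiB (${allocated_bytes} bytes)"
```

Run as written, this prints a logical size of 100 GiB and an allocated size of 2 MiB (2097152 bytes), the same two figures that ls and du reported.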

To understand this, the article likens a file system to a luggage storage service: file names are like tags, metadata is the index, and the disk is the storage room. The key component is the inode, which stores file metadata and an array of block pointers.

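Classic ext2/ext3 inodes use a combination of direct and indirect block pointers. Assuming 4 KB blocks and 4-byte block pointers, the first 12 entries are direct pointers, each addressing one 4 KB block (48 KB in total). The 13th entry points to a single-indirect block that holds 1024 further pointers (≈4 MB). The 14th entry is double-indirect (1024 × 1024 blocks, ≈4 GB) and the 15th is triple-indirect (1024 × 1024 × 1024 blocks, ≈4 TB). This multilevel indexing lets a small, fixed-size inode address very large files, and a pointer that is never filled in costs no data block, so unwritten regions take up no space. (ext4 uses extent trees by default instead of this pointer map, but holes are likewise simply left unmapped.)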
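The capacities quoted above follow from simple arithmetic. The sketch below assumes 4 KB blocks and 4-byte block pointers (the classic ext2/ext3 layout); level_for_block is a hypothetical helper written for this illustration, not part of any file system tool.

```shell
#!/bin/sh
# Assumptions: 4 KB blocks and 4-byte block pointers.
block_size=4096
ptr_size=4
ptrs_per_block=$((block_size / ptr_size))        # 1024 pointers fit in one block

direct=12                                        # blocks reachable directly
single=$ptrs_per_block                           # via the 13th entry
double=$((ptrs_per_block * ptrs_per_block))      # via the 14th entry
triple=$((double * ptrs_per_block))              # via the 15th entry

# Print which pointer level covers a given logical block index (0-based).
level_for_block() {
    n=$1
    if [ "$n" -lt "$direct" ]; then
        echo direct
    elif [ "$n" -lt $((direct + single)) ]; then
        echo single-indirect
    elif [ "$n" -lt $((direct + single + double)) ]; then
        echo double-indirect
    elif [ "$n" -lt $((direct + single + double + triple)) ]; then
        echo triple-indirect
    else
        echo out-of-range
    fi
}

echo "direct covers          $((direct * block_size / 1024)) KB"
echo "single-indirect covers $((single * block_size / 1024 / 1024)) MB"
echo "double-indirect covers $((double * block_size / 1024 / 1024 / 1024)) GB"
echo "triple-indirect covers $((triple * block_size / 1024 / 1024 / 1024 / 1024)) TB"

# The byte at offset 1 GB lives in logical block 262144.
level_for_block $((1024 * 1024 * 1024 / block_size))
```

The four echo lines report 48 KB, 4 MB, 4 GB and 4 TB, and the final call prints double-indirect, because block 262144 lies beyond the first 12 + 1024 blocks.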

When a file contains large gaps of unwritten data, the file system stores it as a sparse file. Only the blocks that actually hold data are allocated; the gaps (holes) remain unallocated and read back as zeros, so physical disk usage stays small while the logical size stays large. Copying such a file with cp is fast because cp detects the holes and copies only the allocated data, not the empty space.
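A sparse file is easy to create by hand. The following is a minimal sketch, assuming GNU coreutils (truncate, dd, stat -c) on Linux and a file system that supports holes; the temporary file and the chosen sizes are illustrative, not the ones used in the original investigation.

```shell
#!/bin/sh
# Assumes GNU coreutils on Linux; the temporary file name is illustrative.
f=$(mktemp)

# Extend the file to 1 GiB without writing any data: the whole file is one hole.
truncate -s 1G "$f"

# Write 4 KB of real data at offset 512 MiB (131072 * 4096 bytes).
# conv=notrunc keeps the file at its full 1 GiB length.
dd if=/dev/urandom of="$f" bs=4096 count=1 seek=131072 conv=notrunc 2>/dev/null

logical=$(stat -c %s "$f")                                  # logical size in bytes
allocated=$(( $(stat -c %b "$f") * $(stat -c %B "$f") ))    # blocks * unit size

echo "logical:   $logical bytes"
echo "allocated: $allocated bytes"

rm -f "$f"
```

The logical size is 1073741824 bytes regardless of the file system. The allocated figure depends on the file system and on when it flushes data, but where holes are supported it should be a tiny fraction of the logical size.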

Example commands from the investigation:

sh-4.4# ls -lh
-rw-r--r-- 1 root root 100G Mar 6 12:22 test.txt

sh-4.4# du -sh ./test.txt
2.0M ./test.txt

sh-4.4# time cp ./test.txt ./test.txt.cp
real 0m0.107s
user 0m0.008s
sys 0m0.085s

sh-4.4# stat ./test.txt
  File: ./test.txt
  Size: 107374182400   Blocks: 4096   IO Block: 4096   regular file
  Device: 78h/120d   Inode: 3148347   Links: 1
  Access: (0644/-rw-r--r--)  Uid: (0/ root)   Gid: (0/ root)
  Access: 2021-03-13 12:22:00.888871000 +0000
  Modify: 2021-03-13 12:22:46.562243000 +0000
  Change: 2021-03-13 12:22:46.562243000 +0000

The conclusion is that the file's logical size (100 GB) is recorded in the inode, but only about 2 MB of physical blocks is actually allocated, so cp has very little data to copy and finishes almost instantly. This behavior is typical of sparse files on Unix-like file systems such as ext2/ext3/ext4.
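The copy itself can be reproduced on a smaller scale. This sketch assumes GNU coreutils on Linux; the file names and sizes are illustrative. It passes --sparse=always so that cp creates holes in the copy wherever the source reads as zeros, rather than relying on cp's default hole detection.

```shell
#!/bin/sh
# Assumes GNU coreutils on Linux; file names are illustrative.
src=$(mktemp)
dst="$src.cp"

truncate -s 256M "$src"                       # 256 MiB hole, no data written
# Place 5 bytes of real data at offset 1 MiB without truncating the file.
printf 'hello' | dd of="$src" bs=1 seek=1048576 conv=notrunc 2>/dev/null

# Copy, creating holes in the destination for runs of zero bytes.
cp --sparse=always "$src" "$dst"

# cmp -s exits 0 when the two files are byte-for-byte identical.
if cmp -s "$src" "$dst"; then same=yes; else same=no; fi

echo "identical:        $same"
echo "source allocated: $(du -k "$src" | cut -f1) KB of $(du -k --apparent-size "$src" | cut -f1) KB"
echo "copy allocated:   $(du -k "$dst" | cut -f1) KB of $(du -k --apparent-size "$dst" | cut -f1) KB"

copy_size=$(stat -c %s "$dst")
rm -f "$src" "$dst"
```

The copy has the same logical length and contents as the source, while du without --apparent-size shows how little space either file really occupies. Running cp with --sparse=never instead forces every zero byte to be written out, which is a simple way to see the slow case for comparison.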
