
Why cp Can Copy a 100 GB File Instantly: Understanding Sparse Files and Inode Block Indexing

The article explains why the Linux cp command appears to copy a 100 GB file in less than a second by exploring sparse files, the difference between logical file size and physical block allocation, and how inode direct and indirect indexing enable efficient storage and fast copying.

Architecture Digest

A colleague was surprised that copying a 100 GB file with cp finished in under a second. ls -lh confirmed the 100 GB size, but du -sh reported only 2 MB: the file's logical size and its actual disk usage did not match.

The stat output showed a Size of 107374182400 bytes (100 GB) and a Blocks value of 4096. stat counts Blocks in 512-byte units, so 4096 blocks is 4096 × 512 bytes = 2 MB. Size reflects the logical length of the file, while Blocks represents the physical space actually allocated.
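The arithmetic behind those two figures can be checked directly. The sketch below only reuses the numbers from the stat output in this article and assumes the Linux convention that Blocks is counted in 512-byte units. The "IO Block: 4096" field is a separate value (the preferred I/O size) and is not the unit of Blocks.

```shell
# Values copied from the stat output in this article.
size_bytes=107374182400   # "Size": logical length in bytes
blocks_512=4096           # "Blocks": allocated space in 512-byte units

allocated_bytes=$((blocks_512 * 512))
logical_gib=$((size_bytes / 1024 / 1024 / 1024))
allocated_mib=$((allocated_bytes / 1024 / 1024))

echo "logical size:   ${logical_gib} GiB"
echo "allocated size: ${allocated_mib} MiB"
```

The two results match what ls -lh (100G) and du -sh (2.0M) reported.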

To understand this, the article likens a file system to a luggage storage service: file names are the tags, metadata is the index, and the disk is the storage room. The key component is the inode, which stores a file's metadata together with an array of block pointers.
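One way to see that a file name is only a tag pointing at an inode is to give the same inode two names with a hard link. This is a minimal sketch, assuming GNU stat (for the -c format option) and a scratch directory from mktemp; the file names are hypothetical.

```shell
# Scratch directory; the file names below are hypothetical.
workdir=$(mktemp -d)
echo "hello" > "$workdir/original.txt"

# A hard link is a second name (tag) for the same inode.
ln "$workdir/original.txt" "$workdir/alias.txt"

inode_a=$(stat -c '%i' "$workdir/original.txt")   # %i = inode number
inode_b=$(stat -c '%i' "$workdir/alias.txt")
links=$(stat -c '%h' "$workdir/original.txt")     # %h = hard link count

echo "original.txt -> inode $inode_a"
echo "alias.txt    -> inode $inode_b"
echo "link count:     $links"

rm -rf "$workdir"
```

Both names should report the same inode number, and the link count should be 2, because the size and block pointers live in the inode rather than in either name.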

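In the classic ext2/ext3 layout, an inode holds 15 block pointers that combine direct and indirect indexing (ext4 uses extent trees by default, but the idea of mapping only the blocks that exist is the same). The first 12 entries are direct pointers, each pointing to a 4 KB block, so together they cover up to 48 KB. The 13th entry is a single‑indirect pointer: it points to a 4 KB block that itself holds 1024 four-byte block pointers, adding 1024 blocks (4 MB). The 14th entry is double‑indirect (1024 × 1024 blocks, 4 GB) and the 15th is triple‑indirect (1024 × 1024 × 1024 blocks, 4 TB). This multilevel indexing lets the file system address very large files, and a pointer that is never filled in costs no data block, so unwritten regions take no space.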
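The capacities above follow from two numbers: the block size and the pointer size. The sketch below reproduces them with shell arithmetic, assuming 4 KB blocks and 4-byte block pointers (the pointer size is an assumption that makes the 1024-pointers-per-block figure work out), and then checks which level would be needed to reach the last block of the 100 GB file from this article.

```shell
block_size=4096                               # bytes per block
ptr_size=4                                    # bytes per block pointer (assumed)
ptrs_per_block=$((block_size / ptr_size))     # 1024 pointers fit in one block

direct_bytes=$((12 * block_size))
single_bytes=$((ptrs_per_block * block_size))
double_bytes=$((ptrs_per_block * ptrs_per_block * block_size))
triple_bytes=$((ptrs_per_block * ptrs_per_block * ptrs_per_block * block_size))

echo "direct:          $((direct_bytes / 1024)) KiB"
echo "single-indirect: $((single_bytes / 1024 / 1024)) MiB"
echo "double-indirect: $((double_bytes / 1024 / 1024 / 1024)) GiB"
echo "triple-indirect: $((triple_bytes / 1024 / 1024 / 1024 / 1024)) TiB"

# Index of the block holding the last byte of the 100 GB file.
last_block=$(( (107374182400 - 1) / block_size ))

# First block index that each level can no longer reach.
single_end=$((12 + ptrs_per_block))
double_end=$((single_end + ptrs_per_block * ptrs_per_block))

if [ "$last_block" -lt 12 ]; then
    level=direct
elif [ "$last_block" -lt "$single_end" ]; then
    level=single-indirect
elif [ "$last_block" -lt "$double_end" ]; then
    level=double-indirect
else
    level=triple-indirect
fi
echo "block $last_block is reached through: $level"
```

Under these assumptions, the end of a 100 GB file lies beyond the 4 GB double‑indirect range, so it is addressed through the triple‑indirect pointer.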

When a file contains large regions that were never written, the file system stores it as a sparse file. Only the blocks that actually hold data are allocated; the gaps (holes) have no blocks behind them and read back as zeros. Physical disk usage therefore stays small while the logical size stays large. Copying such a file with cp is fast because cp copies only the allocated data and skips the holes.
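A sparse file is easy to create for experimentation. The sketch below assumes GNU coreutils (truncate, and stat with -c) and a file system that supports holes; the 64 MiB size and the file name are arbitrary choices for illustration, not the setup used in the article.

```shell
# Scratch directory; the file name is hypothetical.
workdir=$(mktemp -d)
sparse="$workdir/sparse.bin"

# Set the logical length to 64 MiB without writing data (one large hole),
# then append four real bytes so the file also contains some data.
truncate -s 64M "$sparse"
printf 'tail' >> "$sparse"

# %s = logical size in bytes, %b = allocated blocks, %B = bytes per %b block.
logical_bytes=$(stat -c '%s' "$sparse")
allocated_bytes=$(( $(stat -c '%b' "$sparse") * $(stat -c '%B' "$sparse") ))

echo "logical size:   $logical_bytes bytes"
echo "allocated size: $allocated_bytes bytes"

rm -rf "$workdir"
```

On a file system with sparse-file support, the allocated size should be a small fraction of the logical size, which is the same pattern the 100 GB file showed.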

Example commands from the investigation:

    sh-4.4# ls -lh
    -rw-r--r-- 1 root root 100G Mar 6 12:22 test.txt
    sh-4.4# du -sh ./test.txt
    2.0M ./test.txt
    sh-4.4# time cp ./test.txt ./test.txt.cp
    real 0m0.107s
    user 0m0.008s
    sys 0m0.085s
    sh-4.4# stat ./test.txt
    File: ./test.txt
    Size: 107374182400 Blocks: 4096 IO Block: 4096 regular file
    Device: 78h/120d Inode: 3148347 Links: 1
    Access: (0644/-rw-r--r--) Uid: (0/ root) Gid: (0/ root)
    Access: 2021-03-13 12:22:00.888871000 +0000
    Modify: 2021-03-13 12:22:46.562243000 +0000
    Change: 2021-03-13 12:22:46.562243000 +0000
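The copy step can be reproduced on a small scale. This is a sketch under assumptions, not the original investigation: it uses a 64 MiB scratch file instead of 100 GB, and it passes --sparse=always to GNU cp so that holes are recreated in the destination regardless of cp's default detection (--sparse=auto), which the plain cp in the transcript relied on.

```shell
# Scratch directory; the file names mirror the transcript but live in a temp dir.
workdir=$(mktemp -d)
src="$workdir/test.txt"
dst="$workdir/test.txt.cp"

# Build a mostly-empty source file: a 64 MiB hole followed by four data bytes.
truncate -s 64M "$src"
printf 'tail' >> "$src"

# Ask GNU cp to write holes wherever the source contains runs of zero bytes.
cp --sparse=always "$src" "$dst"

src_size=$(stat -c '%s' "$src")
dst_size=$(stat -c '%s' "$dst")
dst_allocated=$(( $(stat -c '%b' "$dst") * $(stat -c '%B' "$dst") ))

# The copy must be byte-for-byte identical, holes included (they read as zeros).
if cmp -s "$src" "$dst"; then same=yes; else same=no; fi

echo "source logical size: $src_size bytes"
echo "copy logical size:   $dst_size bytes"
echo "copy allocated size: $dst_allocated bytes"
echo "contents identical:  $same"

rm -rf "$workdir"
```

The copy keeps the full logical size and identical contents, while its allocated size should remain small on a file system that supports holes.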

The conclusion is that the file's logical size (100 GB) is recorded in the inode, but only about 2 MB of physical blocks is actually allocated, as both du and the stat Blocks count show. cp has very little data to copy, so the operation finishes almost immediately. This behavior is typical of sparse files on Unix‑like file systems such as ext2/ext3/ext4.
