
Why cp Copies a 100 GB File in <1 s: The Sparse File Secret

This article explains why the Linux cp command can duplicate a seemingly 100 GB file in less than a second by revealing how sparse files work, the role of inodes and block allocation, the different cp --sparse modes, and the underlying filesystem mechanisms such as fiemap, extent copying, and hole punching.

Liangxu Linux

Background: cp and the surprising speed

The cp command is one of the most frequently used utilities in Linux. A user observed that copying a 100 GiB file finished in under a second, far faster than the theoretical transfer time of a mechanical SATA disk (≈150 MiB/s, which works out to roughly 11 minutes).
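The 11-minute figure is plain throughput arithmetic, assuming a sustained sequential rate of 150 MiB/s:

$$
t = \frac{100\ \text{GiB}}{150\ \text{MiB/s}} = \frac{102\,400\ \text{MiB}}{150\ \text{MiB/s}} \approx 683\ \text{s} \approx 11.4\ \text{min}
$$

Finishing in under a second would require more than 100 GiB/s, so cp cannot be moving all of the file's bytes; it must be skipping most of them.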

Understanding file size vs. physical space

Linux reports two size metrics for every file: Size (the logical file length in bytes) and Blocks (the storage actually allocated, counted in 512-byte units). A 100 GiB file can occupy only a few megabytes on disk if most of its content is unallocated empty space; such a file is called a sparse file.
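To see both numbers for a real file, a small Go helper can read them from stat. This is a sketch that assumes Linux, where syscall.Stat_t exposes st_blocks in 512-byte units; the name spaceUsage is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// spaceUsage returns a file's logical length (stat's Size) and the bytes
// actually allocated to it (stat's Blocks times 512). Linux-only: it relies
// on syscall.Stat_t, whose Blocks field counts 512-byte units.
func spaceUsage(path string) (logical, allocated int64, err error) {
	fi, err := os.Stat(path)
	if err != nil {
		return 0, 0, err
	}
	st, ok := fi.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, 0, fmt.Errorf("%s: no raw stat data", path)
	}
	return fi.Size(), st.Blocks * 512, nil
}

func main() {
	for _, p := range os.Args[1:] {
		logical, allocated, err := spaceUsage(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s: size=%d allocated=%d\n", p, logical, allocated)
	}
}
```

Passing file paths as arguments prints both figures; for a sparse file the allocated figure is far below the size.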

File system fundamentals

A file system stores data in fixed-size blocks (commonly 4 KiB). Metadata about a file is kept in an inode, which records attributes such as mode, UID and timestamps, plus (in ext2/ext3) an array i_block[15] that points to the blocks holding the file's data.

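The first 12 entries of i_block (indexes 0 to 11) are direct pointers, which cover up to 48 KiB with 4 KiB blocks. Entry 12 points to a single-indirect block, entry 13 to a double-indirect block, and entry 14 to a triple-indirect block. With 4 KiB blocks and 4-byte block numbers, each indirect block holds 1,024 pointers, so this multilevel index can address roughly 4 TiB per file, although ext2/ext3 cap a file at 2 TiB because of other on-disk limits. ext4 retains this block map only for compatibility; by default it stores an extent tree in i_block instead.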
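To make the capacity figures concrete, here is a small calculator for the classic block map. It assumes 4-byte block numbers, so one indirect block holds blockSize/4 pointers; the names pointersPerBlock, maxMappedBytes and mappingLevel are hypothetical.

```go
package main

import "fmt"

// Hypothetical helpers modelling the classic ext2/ext3 block map.
const directSlots = 12 // i_block[0..11]

// pointersPerBlock assumes 4-byte block numbers.
func pointersPerBlock(blockSize uint64) uint64 { return blockSize / 4 }

// maxMappedBytes is the size reachable through the direct slots plus the
// single-, double- and triple-indirect trees (i_block[12..14]).
func maxMappedBytes(blockSize uint64) uint64 {
	p := pointersPerBlock(blockSize)
	return (directSlots + p + p*p + p*p*p) * blockSize
}

// mappingLevel reports which part of i_block resolves logical block lblk:
// 0 direct, 1 single-, 2 double-, 3 triple-indirect, -1 beyond the map.
func mappingLevel(lblk, blockSize uint64) int {
	p := pointersPerBlock(blockSize)
	for level, n := range []uint64{directSlots, p, p * p, p * p * p} {
		if lblk < n {
			return level
		}
		lblk -= n
	}
	return -1
}

func main() {
	const bs = 4096
	fmt.Println(directSlots * bs)   // 49152 bytes (48 KiB) via direct slots
	fmt.Println(maxMappedBytes(bs)) // 4402345721856 bytes, about 4 TiB
	fmt.Println(mappingLevel(11, bs), mappingLevel(12, bs), mappingLevel(12+1024, bs)) // 0 1 2
}
```

With 1 KiB blocks the same formula gives 17,247,252,480 bytes (about 16 GiB), matching the limit usually quoted for ext2 with that block size. In this scheme a hole is simply a slot or indirect pointer that holds 0, so no data block is ever allocated for it.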

Space management structures

Each block can be free or used; a bitmap (a bit array) records this status (0 = free, 1 = used).

When a file is created, an inode is allocated; when data is written, blocks are allocated and their numbers are stored in the inode’s index structures.
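The bookkeeping just described can be modelled in a few lines of Go. This is a toy, not ext4's allocator: blockBitmap, alloc and the flat first-fit search are hypothetical simplifications (real filesystems keep per-group bitmaps and allocate with locality heuristics).

```go
package main

import (
	"errors"
	"fmt"
)

var errNoSpace = errors.New("no free blocks")

// blockBitmap is a toy block bitmap: bit i is 1 when block i is used and 0
// when it is free. This sketch simply takes the first free block.
type blockBitmap struct {
	bits    []uint64
	nblocks int
}

func newBlockBitmap(nblocks int) *blockBitmap {
	return &blockBitmap{bits: make([]uint64, (nblocks+63)/64), nblocks: nblocks}
}

func (b *blockBitmap) used(i int) bool { return b.bits[i/64]&(1<<(uint(i)%64)) != 0 }
func (b *blockBitmap) set(i int)       { b.bits[i/64] |= 1 << (uint(i) % 64) }
func (b *blockBitmap) clear(i int)     { b.bits[i/64] &^= 1 << (uint(i) % 64) }

// alloc marks the first free block as used and returns its number.
func (b *blockBitmap) alloc() (int, error) {
	for i := 0; i < b.nblocks; i++ {
		if !b.used(i) {
			b.set(i)
			return i, nil
		}
	}
	return -1, errNoSpace
}

func main() {
	bm := newBlockBitmap(8)
	var iblock []int // stands in for the block numbers recorded in the inode
	for k := 0; k < 3; k++ {
		n, err := bm.alloc()
		if err != nil {
			panic(err)
		}
		iblock = append(iblock, n)
	}
	bm.clear(iblock[1]) // free block 1 again
	n, _ := bm.alloc()
	fmt.Println(iblock, n) // [0 1 2] 1
}
```

Freeing a block only flips its bit back to 0, which is why punching a hole can return space without moving any other data.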

Sparse files

A sparse file has a logical size larger than the space covered by its allocated blocks. Unallocated regions (holes) read back as zeroes but consume no physical storage. On Linux, a sparse file can be created with truncate -s (or by seeking past end-of-file before writing), and space inside an existing file can be released with fallocate -p (punch hole). For example:

truncate -s 100G test.txt

After creation, stat shows Size: 107374182400 Blocks: 0, confirming that the entire file is a hole.
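The same effect as truncate -s can be produced from Go with os.File.Truncate. The sketch below creates a 1 GiB file, writes a single 4 KiB block into it, and reads back Size and Blocks; makeSparse, statSizeBlocks and demoSparse are hypothetical names, and the 512 MiB data offset is an arbitrary choice. On filesystems that support holes, Blocks should reflect only the written block.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// makeSparse creates path with the given logical size without writing any
// data (the effect of `truncate -s`), then writes one 4 KiB block at dataOff.
func makeSparse(path string, size, dataOff int64) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := f.Truncate(size); err != nil { // extends the file with a hole
		return err
	}
	block := make([]byte, 4096)
	for i := range block {
		block[i] = 'x'
	}
	if _, err := f.WriteAt(block, dataOff); err != nil {
		return err
	}
	return f.Sync() // flush so the allocation is settled before stat
}

// statSizeBlocks returns stat's Size and Blocks (512-byte units).
func statSizeBlocks(path string) (size, blocks int64, err error) {
	fi, err := os.Stat(path)
	if err != nil {
		return 0, 0, err
	}
	return fi.Size(), fi.Sys().(*syscall.Stat_t).Blocks, nil
}

// demoSparse builds a 1 GiB file with 4 KiB of data at 512 MiB in a
// temporary directory and returns its Size and Blocks.
func demoSparse() (size, blocks int64, err error) {
	dir, err := os.MkdirTemp("", "sparse")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	p := filepath.Join(dir, "test.txt")
	if err := makeSparse(p, 1<<30, 512<<20); err != nil {
		return 0, 0, err
	}
	return statSizeBlocks(p)
}

func main() {
	size, blocks, err := demoSparse()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Size: %d Blocks: %d\n", size, blocks)
}
```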

APIs for sparse handling

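fallocate(fd, mode, offset, len) pre-allocates space when mode = 0, or punches a hole when mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. PUNCH_HOLE must always be combined with KEEP_SIZE, so punching never changes the file length.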

In Go, these are wrapped as:

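import (
    "os"
    "syscall"
)

// FALLOC_FL_* values from <linux/falloc.h>.
const fallocKeepSize, fallocPunchHole = 0x01, 0x02

// PreAllocate reserves real blocks for bytes [0, sizeInBytes) (mode 0).
func PreAllocate(f *os.File, sizeInBytes int) error {
    return syscall.Fallocate(int(f.Fd()), 0, 0, int64(sizeInBytes))
}

// PunchHole deallocates [offset, offset+size); the length is kept (KEEP_SIZE).
func PunchHole(file *os.File, offset, size int64) error {
    err := syscall.Fallocate(int(file.Fd()), fallocPunchHole|fallocKeepSize, offset, size)
    if err == syscall.ENOSYS || err == syscall.EOPNOTSUPP {
        return syscall.EPERM // hole punching unsupported by the kernel or filesystem
    }
    return err
}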
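A quick way to watch hole punching work is to write a fully allocated file and punch part of it. This self-contained sketch repeats the fallocate call inline rather than importing the helpers above; punchDemo and blocksOf are hypothetical names, and the 64 KiB and 32 KiB sizes are arbitrary.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Flag values from <linux/falloc.h>.
const (
	fallocFlKeepSize  = 0x01
	fallocFlPunchHole = 0x02
)

// blocksOf returns st_blocks (512-byte units) for an open file.
func blocksOf(f *os.File) (int64, error) {
	fi, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return fi.Sys().(*syscall.Stat_t).Blocks, nil
}

// punchDemo writes 64 KiB of non-zero data to a temporary file, punches a
// hole over the first 32 KiB, and reports the size and block counts.
func punchDemo() (size, before, after int64, err error) {
	f, err := os.CreateTemp("", "punch")
	if err != nil {
		return 0, 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	data := make([]byte, 64<<10)
	for i := range data {
		data[i] = 0xAB
	}
	if _, err := f.Write(data); err != nil {
		return 0, 0, 0, err
	}
	if err := f.Sync(); err != nil {
		return 0, 0, 0, err
	}
	if before, err = blocksOf(f); err != nil {
		return 0, 0, 0, err
	}
	// PUNCH_HOLE must be combined with KEEP_SIZE, so the length stays 64 KiB.
	err = syscall.Fallocate(int(f.Fd()), fallocFlPunchHole|fallocFlKeepSize, 0, 32<<10)
	if err != nil {
		return 0, 0, 0, err // e.g. EOPNOTSUPP on filesystems without hole support
	}
	if after, err = blocksOf(f); err != nil {
		return 0, 0, 0, err
	}
	fi, err := f.Stat()
	if err != nil {
		return 0, 0, 0, err
	}
	return fi.Size(), before, after, nil
}

func main() {
	size, before, after, err := punchDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("size=%d blocks before=%d after=%d\n", size, before, after)
}
```

On a filesystem that supports hole punching, the block count should roughly halve while the size stays at 65536 bytes.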

How cp treats sparse files

The cp utility has a --sparse option with three modes:

auto (default): detects sparse source files via stat (blocks < size/512) and copies only the allocated data, preserving holes.

always: regardless of the source, any all-zero block is turned into a hole in the destination, minimizing space.

never: copies every byte, creating a fully allocated target file.

cp source code walk‑through

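In coreutils, the option is parsed into an enum with the values SPARSE_NEVER, SPARSE_AUTO and SPARSE_ALWAYS, and the main copy routine eventually calls copy_reg for regular files. This walk-through follows the coreutils 8.x sources; newer releases locate holes with lseek(SEEK_DATA/SEEK_HOLE) instead of the FIEMAP ioctl.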

Inside copy_reg:

It determines whether the source is probably sparse via is_probably_sparse(), which checks ST_NBLOCKS(sb) < sb->st_size / ST_NBLOCKSIZE.

If --sparse=always or auto with a sparse source, make_holes is set true.

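When the source looks sparse, copy_reg first tries extent_copy, which uses the FIEMAP ioctl to obtain the exact layout of data extents and holes and then copies only the data extents. With --sparse=never it still writes zeroes for the holes, using the extent map only to read efficiently.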

If the extent scan is unavailable, and for sources that do not look sparse, cp uses sparse_copy, which reads the file in buffers and, when make_holes is true, treats each all-zero buffer (checked by is_nul()) as a hole instead of writing it.
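The zero-detection path can be sketched in Go. This is a simplified model of the behaviour described above, not a port of coreutils: isProbablySparse, isNul, sparseCopy and demoCopy are hypothetical names, holes are made only by seeking past zero chunks in a freshly created destination, and the 4 KiB chunk size is an arbitrary choice.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"syscall"
)

// isProbablySparse mirrors the heuristic above: fewer 512-byte units are
// allocated than the logical size would need.
func isProbablySparse(fi os.FileInfo) bool {
	st, ok := fi.Sys().(*syscall.Stat_t)
	return ok && fi.Mode().IsRegular() && st.Blocks < fi.Size()/512
}

// isNul reports whether buf holds only zero bytes.
func isNul(buf []byte) bool { return len(bytes.TrimLeft(buf, "\x00")) == 0 }

// sparseCopy copies src to dst in chunk-sized pieces. With makeHoles set, an
// all-zero piece is skipped by seeking, which leaves a hole. dst must be a
// new, empty regular file.
func sparseCopy(dst, src *os.File, chunk int, makeHoles bool) error {
	buf := make([]byte, chunk)
	var total int64
	for {
		n, err := io.ReadFull(src, buf)
		if n > 0 {
			if makeHoles && isNul(buf[:n]) {
				if _, serr := dst.Seek(int64(n), io.SeekCurrent); serr != nil {
					return serr
				}
			} else if _, werr := dst.Write(buf[:n]); werr != nil {
				return werr
			}
			total += int64(n)
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break
		}
		if err != nil {
			return err
		}
	}
	// Seeking past a trailing zero piece does not extend the file, so set the length.
	return dst.Truncate(total)
}

// demoCopy copies data|1 MiB of zeros|data and reports whether the copy is
// byte-identical and looks sparse.
func demoCopy() (same, sparse bool, err error) {
	dir, err := os.MkdirTemp("", "spcopy")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)
	content := append(append(bytes.Repeat([]byte{'x'}, 4096), make([]byte, 1<<20)...), bytes.Repeat([]byte{'x'}, 4096)...)
	srcPath, dstPath := filepath.Join(dir, "src"), filepath.Join(dir, "dst")
	if err := os.WriteFile(srcPath, content, 0o644); err != nil {
		return false, false, err
	}
	src, err := os.Open(srcPath)
	if err != nil {
		return false, false, err
	}
	defer src.Close()
	dst, err := os.Create(dstPath)
	if err != nil {
		return false, false, err
	}
	defer dst.Close()
	if err := sparseCopy(dst, src, 4096, true); err != nil {
		return false, false, err
	}
	got, err := os.ReadFile(dstPath)
	if err != nil {
		return false, false, err
	}
	fi, err := dst.Stat()
	if err != nil {
		return false, false, err
	}
	return bytes.Equal(got, content), isProbablySparse(fi), nil
}

func main() {
	same, sparse, err := demoCopy()
	fmt.Println(same, sparse, err)
}
```

The final Truncate matters: seeking past a trailing all-zero chunk moves the file offset but does not change the file length.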

Key helper functions

is_probably_sparse(const struct stat *sb): returns true when st_blocks is less than st_size/512.

extent_copy(): calls extent_scan_read() (which invokes ioctl(FS_IOC_FIEMAP)) to retrieve a list of extents, then copies each extent and optionally recreates the holes between them.

sparse_copy(): reads data sequentially; when a buffer is all zero and holes are being made, it skips that range in the destination (seeking past it, or punching a hole) instead of writing it.
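The FIEMAP call that extent_scan_read() relies on can also be issued directly. The Go sketch below hard-codes the ioctl number and struct layouts from <linux/fs.h> and <linux/fiemap.h>, assumes a little-endian CPU (such as x86-64 or arm64), and makes a single call with room for a fixed number of extents, whereas a real tool would keep calling until it sees an extent flagged FIEMAP_EXTENT_LAST. dataExtents and demoFiemap are hypothetical names. Some filesystems do not implement FIEMAP and return an error.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"syscall"
	"unsafe"
)

// Values from <linux/fs.h> and <linux/fiemap.h>.
const (
	fsIocFiemap      = 0xC020660B // _IOWR('f', 11, struct fiemap)
	fiemapHeaderSize = 32         // struct fiemap up to fm_extents[]
	fiemapExtentSize = 56         // struct fiemap_extent
	fiemapFlagSync   = 0x1        // FIEMAP_FLAG_SYNC: flush dirty data first
)

// Extent is one mapped (allocated) byte range of a file.
type Extent struct{ Logical, Length uint64 }

// dataExtents issues a single FS_IOC_FIEMAP call for the whole file and
// returns at most maxExtents extents. Assumes a little-endian CPU.
func dataExtents(f *os.File, maxExtents int) ([]Extent, error) {
	buf := make([]byte, fiemapHeaderSize+maxExtents*fiemapExtentSize)
	le := binary.LittleEndian
	le.PutUint64(buf[0:], 0)                   // fm_start
	le.PutUint64(buf[8:], ^uint64(0))          // fm_length: to end of file
	le.PutUint32(buf[16:], fiemapFlagSync)     // fm_flags
	le.PutUint32(buf[24:], uint32(maxExtents)) // fm_extent_count
	_, _, errno := syscall.Syscall(syscall.SYS_IOCTL, f.Fd(), fsIocFiemap,
		uintptr(unsafe.Pointer(&buf[0])))
	if errno != 0 {
		return nil, errno
	}
	n := int(le.Uint32(buf[20:])) // fm_mapped_extents
	out := make([]Extent, 0, n)
	for i := 0; i < n; i++ {
		e := buf[fiemapHeaderSize+i*fiemapExtentSize:]
		out = append(out, Extent{Logical: le.Uint64(e[0:]), Length: le.Uint64(e[16:])})
	}
	return out, nil
}

// demoFiemap maps a 1 GiB sparse file holding 4 KiB of data at 1 MiB.
func demoFiemap() ([]Extent, error) {
	f, err := os.CreateTemp("", "fiemap")
	if err != nil {
		return nil, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	if err := f.Truncate(1 << 30); err != nil {
		return nil, err
	}
	// Zero bytes still count as data once they are written.
	if _, err := f.WriteAt(make([]byte, 4096), 1<<20); err != nil {
		return nil, err
	}
	return dataExtents(f, 32)
}

func main() {
	exts, err := demoFiemap()
	if err != nil {
		fmt.Println("FIEMAP failed (not supported on every filesystem):", err)
		return
	}
	for _, e := range exts {
		fmt.Printf("data at offset %d, length %d\n", e.Logical, e.Length)
	}
}
```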

Experimental verification

A step-by-step experiment creates a 1 GiB sparse file, writes two 4 KiB regions into it (one with real data, one filled with zeroes), and then copies it with each of the three --sparse modes. The stat results are:

test.txt.auto: Size = 1 GiB, Blocks = 16 (8 KiB). The default auto mode preserved the holes but kept the all-zero region as data.

test.txt.always: Size = 1 GiB, Blocks = 8 (4 KiB). The always mode also turned the all-zero region into a hole.

test.txt.never: Size = 1 GiB, Blocks ≈ 2,097,160 (≈1 GiB). The never mode wrote every byte.
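The experiment can be reproduced with a short Go program that shells out to GNU cp. The offsets of the two 4 KiB regions (start and middle of the file) are assumptions, since they are not given above, and runExperiment is a hypothetical name; the exact Blocks values depend on the filesystem and the coreutils version.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"syscall"
)

var modes = []string{"auto", "always", "never"}

// runExperiment builds a sparse file of the given size holding one 4 KiB block
// of real data and one 4 KiB block of zeros, copies it with each --sparse mode
// using GNU cp, and returns stat's Blocks (512-byte units) for every copy.
func runExperiment(size int64) (map[string]int64, error) {
	dir, err := os.MkdirTemp("", "cpsparse")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	src := filepath.Join(dir, "test.txt")
	f, err := os.Create(src)
	if err != nil {
		return nil, err
	}
	data := make([]byte, 4096)
	for i := range data {
		data[i] = 'a'
	}
	// The offsets (start and middle of the file) are arbitrary choices.
	_, err1 := f.WriteAt(data, 0)
	_, err2 := f.WriteAt(make([]byte, 4096), size/2)
	err3 := f.Truncate(size)
	err4 := f.Close()
	for _, e := range []error{err1, err2, err3, err4} {
		if e != nil {
			return nil, e
		}
	}
	out := make(map[string]int64)
	for _, mode := range modes {
		dst := src + "." + mode
		if msg, err := exec.Command("cp", "--sparse="+mode, src, dst).CombinedOutput(); err != nil {
			return nil, fmt.Errorf("cp --sparse=%s: %v: %s", mode, err, msg)
		}
		fi, err := os.Stat(dst)
		if err != nil {
			return nil, err
		}
		out[mode] = fi.Sys().(*syscall.Stat_t).Blocks
	}
	return out, nil
}

func main() {
	res, err := runExperiment(1 << 30) // 1 GiB, as in the experiment
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, m := range modes {
		fmt.Printf("test.txt.%s Blocks=%d\n", m, res[m])
	}
}
```

With the 1 GiB size, the never copy occupies about 1 GiB of real disk space, so a smaller size is gentler for a quick test.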

Take‑aways

File size and physical storage are distinct; sparse files decouple them.

Linux filesystems use inodes, block indexing, and bitmap tables to manage space.

Holes are implemented by the filesystem itself: applications create them with truncate or a fallocate punch-hole and discover them with the FIEMAP ioctl.

The cp --sparse option leverages this information to avoid copying empty regions, dramatically reducing I/O.

Choosing --sparse=always yields the smallest destination, while --sparse=never produces a fully allocated copy at the cost of space and time; all three modes produce byte-identical contents.

Understanding these mechanisms is essential for performance‑critical scripts, backup tools, and cloud‑storage cost optimisation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Liangxu Linux

Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge (fundamentals, applications and tools) plus Git, databases, Raspberry Pi and more.
