Why Does cp Copy a 100 GB File Instantly? Unveiling Sparse Files and Inode Indexing

A colleague copied a 100 GB file in under a second, prompting an investigation that reveals how Linux file systems distinguish logical file size from physical storage using inodes, block allocation, direct and indirect indexing, and sparse file techniques that make such copies appear instantaneous.

Open Source Linux

Analysis of the File

A colleague used the cp command to copy a 100 GB file and was surprised that the operation finished in less than a second. The ls -lh command confirmed the 100 GB size, but du -sh reported only 2.0M, revealing a gap between the file's logical size and its actual disk usage.

sh-4.4# ls -lh
-rw-r--r-- 1 root root 100G Mar  6 12:22 test.txt
sh-4.4# du -sh ./test.txt
2.0M ./test.txt
sh-4.4# stat ./test.txt
  File: ./test.txt
  Size: 107374182400 Blocks: 4096   IO Block: 4096 regular file

The stat output reports Size (the logical file size) as 107374182400 bytes, i.e. 100 GB, while Blocks counts the 512‑byte units actually allocated: 4096 × 512 B = 2 MB, which matches the du result.

Key Points

Size represents the logical file size visible to users.

Blocks represent the actual physical space allocated on disk.
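
The two numbers that stat prints can also be read programmatically. The following is a minimal sketch in Python, assuming a POSIX system where os.stat exposes st_blocks in 512‑byte units; the helper names size_report and make_sparse_file and the scratch file are illustrative and not part of the original investigation.

```python
import os
import tempfile

def size_report(path):
    """Return (logical_bytes, allocated_bytes) for a file."""
    st = os.stat(path)
    # st_size is the logical length; st_blocks counts 512-byte units
    # that are actually allocated, independent of the file system block size.
    return st.st_size, st.st_blocks * 512

def make_sparse_file(path, logical_size):
    """Create a file of the given logical size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(logical_size)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.bin")
        make_sparse_file(path, 1 << 30)  # 1 GiB logical size
        logical, allocated = size_report(path)
        print(f"logical={logical} allocated={allocated}")
```

On a file system that supports sparse files, the allocated figure should come out far smaller than the logical one, mirroring the ls versus du discrepancy above.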

File System Basics

A file system organizes data on disk, much like a luggage storage service: the file name is the name written on the luggage tag, the metadata is the tag itself, the file contents are the luggage, and the disk is the storage room.

Space Management

Data is stored in fixed‑size blocks (commonly 4 KB). To keep track of which blocks belong to a file, the file system uses an inode that stores metadata and an array of block pointers.

Inode/Block Concept

In a simple inode with 15 block pointers:

First 12 pointers are direct indexes (store block numbers directly).

13th pointer is a single‑indirect index (points to a block that contains more block numbers).

14th pointer is a double‑indirect index.

15th pointer is a triple‑indirect index.
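
The pointer layout above can be sketched as a lookup that says which index level serves a given logical block. This is an illustration only, assuming 4 KB blocks and 4‑byte block numbers (so 1024 pointers per index block); the function locate_block is hypothetical and does not read a real inode.

```python
BLOCK_SIZE = 4096                             # bytes per block (assumed)
POINTER_SIZE = 4                              # bytes per block number (assumed)
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE   # 1024 pointers per index block
DIRECT_SLOTS = 12                             # first 12 inode pointers

def locate_block(logical_block):
    """Map a logical block number to (level, indexes within that level)."""
    if logical_block < 0:
        raise ValueError("logical block number must be non-negative")
    n = logical_block
    if n < DIRECT_SLOTS:
        return "direct", (n,)
    n -= DIRECT_SLOTS
    if n < PTRS_PER_BLOCK:
        return "single", (n,)
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        # (slot in the double-indirect block, slot in the second-level block)
        return "double", divmod(n, PTRS_PER_BLOCK)
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        outer, rest = divmod(n, PTRS_PER_BLOCK ** 2)
        middle, inner = divmod(rest, PTRS_PER_BLOCK)
        return "triple", (outer, middle, inner)
    raise ValueError("beyond the triple-indirect range")
```

For example, logical block 11 is still reached through a direct pointer, while block 12 is the first entry of the single‑indirect block.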

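With 4 KB blocks and 4‑byte block numbers, one index block holds 1024 pointers. The direct indexes address 12 × 4 KB = 48 KB of data. The single‑indirect index addresses 1024 × 4 KB = 4 MB, the double‑indirect index 1024² × 4 KB = 4 GB, and the triple‑indirect index 1024³ × 4 KB = 4 TB, giving a maximum file size of roughly 4 TB in ext2‑like systems.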
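
The capacity figures can be checked with a few lines of arithmetic. A small sketch, assuming 4 KB blocks, 4‑byte block numbers and 12 direct pointers; index_capacities is an illustrative helper, not a file system API.

```python
def index_capacities(block_size=4096, pointer_size=4, direct_slots=12):
    """Return the bytes addressable through each index level."""
    ptrs = block_size // pointer_size  # pointers that fit in one index block
    return {
        "direct": direct_slots * block_size,
        "single": ptrs * block_size,
        "double": ptrs ** 2 * block_size,
        "triple": ptrs ** 3 * block_size,
    }

if __name__ == "__main__":
    caps = index_capacities()
    for level, size in caps.items():
        print(f"{level:>6}: {size} bytes")
    print(f" total: {sum(caps.values())} bytes")
```

The total is dominated by the triple‑indirect level, which is why the maximum file size is quoted as roughly 4 TB.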

Why cp Is So Fast

The file in question is a sparse file: its logical size is 100 GB, but only about 2 MB of blocks are actually allocated, as the stat output shows. Regions that were never written (holes) have no physical blocks behind them and simply read back as zeros. cp can skip these holes and copy only the allocated data, so the operation completes almost instantly.

Key observation: the file's size field in the inode reflects the logical length, while physical space usage depends on how many blocks are actually allocated.
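
To make the idea of skipping unallocated regions concrete, here is a hedged sketch of a hole‑aware copy in Python. It assumes a platform where os.SEEK_DATA and os.SEEK_HOLE are available (for example Linux) and is not a reproduction of how cp is implemented; sparse_copy is a hypothetical helper.

```python
import errno
import os

def sparse_copy(src, dst, chunk=1 << 20):
    """Copy src to dst, skipping holes. Returns the number of bytes copied."""
    copied = 0
    fin = os.open(src, os.O_RDONLY)
    fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(fin).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next region that actually holds data.
                data_start = os.lseek(fin, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # only a trailing hole remains
                    break
                raise
            # Find where that data region ends (the next hole or end of file).
            hole_start = os.lseek(fin, data_start, os.SEEK_HOLE)
            pos = data_start
            while pos < hole_start:
                buf = os.pread(fin, min(chunk, hole_start - pos), pos)
                if not buf:
                    break
                view = memoryview(buf)
                while view:  # handle short writes
                    written = os.pwrite(fout, view, pos)
                    view = view[written:]
                    pos += written
                    copied += written
            offset = hole_start
        # Restore the logical size so trailing holes are preserved.
        os.ftruncate(fout, size)
    finally:
        os.close(fin)
        os.close(fout)
    return copied
```

The return value counts only the bytes read from data regions, so for a mostly empty file it stays close to the allocated size rather than the logical size. On a file system without hole support, the whole file is reported as data and the function degrades to an ordinary copy.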

Summary

Linux file systems separate logical file size from physical storage using inodes and block allocation. Direct and multi‑level indirect indexing allow efficient management of large files, while sparse files store only the data that exists, making operations like cp appear extremely fast.

Written by

Open Source Linux
