
How Linux Fork Uses Copy‑On‑Write to Boost Process Creation Efficiency

This article explains the fork system call in the Linux kernel, details the copy‑on‑write (COW) mechanism that underpins its efficiency, provides code examples, and explores practical scenarios and performance implications for process creation, memory usage, and concurrent server programming.

Deepin Linux

1. Fork Function Basics

In Linux, the fork system call creates a new process by duplicating the calling (parent) process. The child receives copies of the parent's open file descriptors, environment variables, and address space, but the physical memory pages are not copied up front: parent and child share them initially.

1.1 What is fork

fork works like a "clone" operation: the kernel creates a child process that inherits most attributes of the parent. The child gets its own PID while sharing the same code, data, and stack pages with the parent until a write occurs.

1.2 Using fork

In C, include <unistd.h> and use the prototype:

#include <unistd.h>
pid_t fork(void);

On success fork returns once in each process, so there are three cases to handle:

In the parent, a positive PID of the newly created child.

In the child, zero.

On error, -1 is returned in the parent, no child is created, and errno is set.

Typical usage:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid < 0) {
        /* fork failed: no child was created */
        perror("fork error");
        exit(EXIT_FAILURE);
    } else if (pid == 0) {
        /* child branch: fork returned 0 */
        printf("I am the child process, my pid is %d, parent pid is %d\n",
               (int)getpid(), (int)getppid());
    } else {
        /* parent branch: fork returned the child's PID */
        printf("I am the parent process, my pid is %d, child pid is %d\n",
               (int)getpid(), (int)pid);
    }
    return 0;
}

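Running the program prints distinct PIDs for parent and child, demonstrating that the return value distinguishes the two execution paths. Because the parent does not wait for the child, the order of the two lines is not guaranteed, and the child may report a different parent PID if the parent has already exited.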

2. Copy‑On‑Write (COW) Technique Details

2.1 What is COW

Copy‑On‑Write means that after fork the parent and child share the same physical pages, and the kernel write‑protects the private writable ones in both processes. When either process attempts to write to a shared page, the kernel allocates a new physical page, copies the original content, points the writer's page‑table entry at the copy, and marks it writable. Copying is thus deferred until it is actually needed, one page at a time.

2.2 Why use COW

Early operating systems copied the entire address space during fork, which was costly in time and memory. COW avoids this by sharing pages initially, reducing both memory consumption and process‑creation latency, especially when many child processes only read data.

2.3 Implementation Mechanism

When fork is called, the kernel creates a new process control block (PCB, task_struct in Linux) for the child and copies the parent’s page tables rather than the pages themselves, write‑protecting the private writable pages in both processes. A later write attempt by either process triggers a page fault; the kernel recognizes the fault as a COW fault, allocates a new page, copies the data, and updates the faulting process’s page‑table entry.

Example scenario:

/* Parent has a page at virtual address 0x1000 -> physical 0x80000 */
/* Child inherits the same mapping, page marked read‑only */
/* Child writes to 0x1000 */
/* Page fault occurs → kernel allocates new page 0x90000, copies data, updates child’s PTE */

After the fault, the child sees its own writable copy while the parent continues to reference the original read‑only page.

3. Dissecting the COW Process

3.1 Memory Sharing and Marking

Both processes initially share the same physical pages. The kernel clears the write permission on the shared pages in both page tables, so any write traps into the kernel as a page fault, which is what drives the fault‑based copy mechanism.

3.2 Write‑Triggered Copy

The write sequence consists of three steps:

The kernel allocates a fresh physical page for the writer.

It copies the original page’s contents into the new page.

It updates the writer’s page‑table entry to point to the new page and marks it writable.

Only the modified page is duplicated; all other pages remain shared.

3.3 Key Technical Points

Read‑only page marking – establishes the copy‑on‑write contract.

Write‑operation fault detection – the page‑fault serves as a precise trigger for copying.

On‑demand page copying – only pages that are actually written are duplicated, minimizing overhead.

4. Operations That Trigger or Bypass COW

4.1 Assignment Operations

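The first write to a shared page after fork (e.g., num = 20;, num++, or num += 5 on a variable stored in that page) triggers COW in whichever process performs it, parent or child. Later writes by that process to the same page go to its private copy and cause no further copying.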

4.2 Pointer and Array Modifications

Changing data through a pointer or modifying an array element (e.g., *ptr = 30; or arr[3] = 100;) also triggers COW, resulting in a private copy for the writing process. What gets copied is the page containing the target address, not the whole array or object.

4.3 Non‑Writing Operations

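Pure reads, address‑of operations, and condition checks do not modify the data they touch, so those pages remain shared. Note, however, that almost any running code writes to its own stack (function calls, local variables, arguments passed by value), so the current stack page is usually among the first to be copied.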

5. Real‑World Applications of COW

5.1 Server‑Side Programming

A classic server design forks a child to handle each client connection. COW lets the child start quickly with shared code and read‑only data and copy only the pages holding state it modifies, which keeps per‑connection startup cost and memory overhead low.

5.2 Daemon Creation

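Creating a daemon typically involves a double fork. After the first fork the parent exits and the child calls setsid() to start a new session with no controlling terminal; after the second fork the session leader exits, so the surviving process is not a session leader and cannot reacquire a controlling terminal. COW keeps both forks cheap, since neither intermediate process modifies much memory before exiting.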

5.3 Parallel Data Processing

In big‑data or AI workloads, a parent process can fork many workers that share the same read‑only model or configuration data. Workers only copy pages they need to modify, keeping overall memory usage low while scaling across cores.

6. COW in Various Programming Environments

6.1 OS‑Level Fork Mechanism

The kernel’s COW implementation is the foundation for fast process creation in Linux, enabling high‑throughput services such as HTTP servers.

6.2 C++ std::string (pre‑C++11)

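Older C++ standard library implementations, notably libstdc++ before GCC 5, implemented std::string with COW: multiple strings shared the same buffer until one performed a mutating operation, at which point a private copy was made. C++11 tightened the requirements on std::string in ways that effectively rule out COW, and current implementations rely on small‑string optimization and move semantics instead.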

6.3 Java CopyOnWriteArrayList

Java’s CopyOnWriteArrayList applies the same principle to a thread‑safe list. Reads access the current array without locking; writes acquire a lock, copy the array, modify the copy, and replace the reference. Because every write copies the whole array, the class suits read‑heavy workloads in which modifications are rare.
