Understanding Linux Process Creation and Termination (Part 1)
This article walks through the Linux kernel mechanisms for creating and destroying processes, covering copy-on-write, the fork/vfork/clone system calls, the common fork path in kernels 5.0 (_do_fork()) and 6.5 (kernel_clone()), the copy_process workflow, and the steps the kernel takes to wake up a new task and clean up a terminated one.
Series Overview
The series analyses Linux process management and scheduling, focusing on creation and destruction while using kernel source code and eBPF programs to capture real‑time scheduling data.
0. Background Principles
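Process creation relies on copy-on-write (COW): the child receives a copy of the parent's task_struct and its own page tables, but those page tables initially map the same physical pages as the parent's, marked read-only. The first write to such a page triggers a page fault, and the kernel copies only the affected page.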
Three system calls create processes: fork, vfork, and clone. All of them eventually call one internal function, named _do_fork() in 5.0 and kernel_clone() since 5.10.
1. System Call Entry
Linux 5.0 defines the entry points in kernel/fork.c, on top of _do_fork(). Their prototypes are:
    asmlinkage long sys_clone(unsigned long, unsigned long, int __user *,
                              int __user *, unsigned long);
    asmlinkage long sys_vfork(void);
    asmlinkage long sys_fork(void);
Linux 6.5 declares the prototypes in include/linux/syscalls.h and implements the calls in kernel/fork.c using kernel_clone() instead of _do_fork():
    asmlinkage long sys_clone(unsigned long, unsigned long, int __user *,
                              int __user *, unsigned long);
    asmlinkage long sys_vfork(void);
    asmlinkage long sys_fork(void);
2. kernel_clone() Differences
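In 6.5, kernel_clone() receives a struct kernel_clone_args that bundles all parameters, replacing the long argument list of the older _do_fork interface. It supports CLONE_PIDFD, which is mutually exclusive with CLONE_PARENT_SETTID in the legacy clone() call because both results would be returned through the same parent_tid pointer. When CONFIG_LRU_GEN is enabled, it also registers the child's address space with the multi-generational LRU memory-management code.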
3. copy_process() Workflow
The core of process creation is copy_process(), which performs:
Allocate and duplicate task_struct with dup_task_struct().
Initialise scheduling data via sched_fork().
Handle file descriptor tables with copy_files().
Copy filesystem context using copy_fs().
Duplicate or share the memory descriptor through copy_mm() (or dup_mm() when needed).
Copy namespaces via copy_namespaces().
Allocate a PID with alloc_pid().
3.1 dup_task_struct()
dup_task_struct() allocates a new task_struct on the chosen NUMA node, copies the parent's structure with arch_dup_task_struct(), allocates a kernel stack (alloc_thread_stack_node()), and writes a magic value at the stack end for overflow detection (set_task_stack_end_magic()).
    /* Simplified excerpt */
    static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
    {
        struct task_struct *tsk;
        int err;

        if (node == NUMA_NO_NODE)
            node = tsk_fork_get_node(orig);
        tsk = alloc_task_struct_node(node);
        if (!tsk)
            return NULL;
        err = arch_dup_task_struct(tsk, orig);   /* copy the parent's task_struct */
        if (err)
            goto free_tsk;
        err = alloc_thread_stack_node(tsk, node);
        if (err)
            goto free_tsk;
        set_task_stack_end_magic(tsk);           /* overflow canary */
        return tsk;
    free_tsk:
        /* error handling omitted */
        return NULL;
    }
3.2 sched_fork()
Initialises scheduling fields by calling __sched_fork(), which sets up sched_entity, inherits priority, assigns a scheduling class, and records the target CPU with set_task_cpu(). It also initialises the thread‑info preempt count.
    /* Simplified excerpt */
    int sched_fork(unsigned long clone_flags, struct task_struct *p)
    {
        __sched_fork(clone_flags, p);
        p->__state = TASK_NEW; /* not runnable until wake_up_new_task */
        return 0;
    }
3.3 copy_files() and copy_fs()
If CLONE_FILES is set, the child shares the parent’s files structure (reference count increment). Otherwise dup_fd() creates a new descriptor table. copy_fs() behaves similarly with CLONE_FS, either sharing the fs_struct or duplicating it via copy_fs_struct().
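    /* Simplified excerpt */
    static int copy_files(unsigned long clone_flags, struct task_struct *tsk, int no_files)
    {
        struct files_struct *oldf = current->files, *newf;
        int error = 0;

        if (!oldf)
            goto out;                   /* nothing to copy or share */
        if (no_files) {
            tsk->files = NULL;
            goto out;
        }
        if (clone_flags & CLONE_FILES) {
            atomic_inc(&oldf->count);   /* share: just take a reference */
            goto out;
        }
        newf = dup_fd(oldf, NR_OPEN_MAX, &error);
        if (!newf)
            goto out;                   /* error was set by dup_fd() */
        tsk->files = newf;
        error = 0;
    out:
        return error;
    }
3.4 copy_mm()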
copy_mm() handles the process address space. With CLONE_VM, the child shares the parent's mm_struct. Otherwise dup_mm() allocates a new mm_struct, copies the structure (not the pages), initialises it (mm_init()), and duplicates the memory mappings and page tables via dup_mmap().
    /* Simplified excerpt */
    static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
    {
        struct mm_struct *mm, *oldmm = current->mm;

        if (!oldmm)
            return 0; /* kernel thread */
        if (clone_flags & CLONE_VM) {
            mmget(oldmm);               /* share: take a reference */
            mm = oldmm;
        } else {
            mm = dup_mm(tsk, oldmm);    /* private copy */
            if (!mm)
                return -ENOMEM;
        }
        tsk->mm = mm;
        tsk->active_mm = mm;
        return 0;
    }
3.5 dup_mm()
Allocates a new mm_struct, copies the parent’s fields with memcpy(), initialises the descriptor, and copies the page tables.
    /* Simplified excerpt */
    static struct mm_struct *dup_mm(struct task_struct *tsk, struct mm_struct *oldmm)
    {
        struct mm_struct *mm = allocate_mm();

        if (!mm)
            return NULL;
        memcpy(mm, oldmm, sizeof(*mm));
        if (!mm_init(mm, tsk, mm->user_ns))
            goto fail;
        if (dup_mmap(mm, oldmm))
            goto free_pt;
        return mm;
    free_pt:
        mmput(mm);      /* drop the partially built address space */
    fail:
        /* remaining error handling omitted */
        return NULL;
    }
4. wake_up_new_task()
After copy_process() finishes, wake_up_new_task() changes the task state from TASK_NEW to TASK_RUNNING, selects a CPU, locks the run‑queue, activates the task, and checks whether it should pre‑empt the currently running task.
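    /* Simplified excerpt */
    void wake_up_new_task(struct task_struct *p)
    {
        struct rq_flags rf;
        struct rq *rq;

        raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
        WRITE_ONCE(p->__state, TASK_RUNNING);
        p->recent_used_cpu = task_cpu(p);
        __set_task_cpu(p, select_task_rq(p, task_cpu(p), WF_FORK));
        rq = __task_rq_lock(p, &rf);
        update_rq_clock(rq);                    /* required by ENQUEUE_NOCLOCK */
        activate_task(rq, p, ENQUEUE_NOCLOCK);
        check_preempt_curr(rq, p, WF_FORK);
        task_rq_unlock(rq, p, &rf);
    }
5. Process Termination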
Processes end voluntarily via exit() or by returning from main, or involuntarily via signals or faults. The kernel releases most resources but keeps the task_struct until the parent calls wait(). A process that has exited but has not yet been reaped by its parent is a zombie. If the parent exits first, the orphaned child is adopted by the init process (PID 1) or by a nearer ancestor registered as a subreaper, which then reaps it.