Understanding Linux PID Namespace Creation and Allocation in Containers

This article explains how Linux creates and allocates PID namespaces, how Docker containers obtain their own PID namespaces using CLONE_NEWPID, the internal kernel structures involved, and how processes view their PID inside a container.

Refining Core Development Skills

Dec 27, 2022

Hello, I'm Fei! If you run ps -ef inside a container, you will notice that the PIDs are very small. How are these PIDs allocated, and how does the kernel present them to processes inside the container?

Previously we introduced the process creation flow in "How Linux Processes Are Created"; the PID namespace is allocated during that process. This article dives into the inner workings of Docker's PID namespace.

1. Linux's Default PID Namespace

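The default set of namespaces is defined in kernel/nsproxy.c as init_nsproxy. Its pid_ns member points to init_pid_ns, the default PID namespace, which is defined in kernel/pid.c. Two important fields of init_pid_ns are level (the depth in the namespace hierarchy, 0 for the root) and pidmap (a bitmap recording which PIDs have been allocated).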

//file:include/linux/sched.h
struct task_struct {
    ...
    /* namespaces */
    struct nsproxy *nsproxy;
};

//file:kernel/nsproxy.c
struct nsproxy init_nsproxy = {
    .count = ATOMIC_INIT(1),
    .uts_ns = &init_uts_ns,
    .ipc_ns = &init_ipc_ns,
    .mnt_ns = NULL,
    .pid_ns = &init_pid_ns,
    .net_ns = &init_net,
};

//file:kernel/pid.c
struct pid_namespace init_pid_ns = {
    .kref = { .refcount = ATOMIC_INIT(2) },
    .pidmap = { [0 ... PIDMAP_ENTRIES-1] = { ATOMIC_INIT(BITS_PER_PAGE), NULL } },
    .last_pid = 0,
    .level = 0,
    .child_reaper = &init_task,
    .user_ns = &init_user_ns,
    .proc_inum = PROC_PID_INIT_INO,
};

The initial task (PID 0, the idle process) uses this default init_nsproxy. Every process that does not request a new namespace inherits these default namespaces from its parent.
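To make the roles of pidmap and last_pid concrete, here is a minimal user-space sketch of a bitmap-based PID allocator. It is not kernel code: model_pid_ns, model_alloc_pidmap, model_free_pid and the tiny MODEL_PID_MAX limit are names and values invented for illustration, and the wrap-around rule is simplified compared with the real kernel.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MODEL_PID_MAX 128 /* tiny limit for illustration only */
#define BITS_PER_WORD ((int)(sizeof(unsigned long) * CHAR_BIT))
#define MODEL_WORDS ((MODEL_PID_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical, simplified stand-in for struct pid_namespace. */
struct model_pid_ns {
    unsigned long pidmap[MODEL_WORDS]; /* bit n set => PID n is in use */
    int last_pid;                      /* most recently allocated PID */
    unsigned int level;                /* 0 for the root namespace */
};

/* Start with an empty bitmap and last_pid = 0, like a fresh namespace. */
void model_ns_init(struct model_pid_ns *ns, unsigned int level)
{
    memset(ns, 0, sizeof(*ns));
    ns->level = level;
}

/*
 * Return the first free PID after last_pid, wrapping around once.
 * PID 0 is never handed out. Returns -1 when every PID is taken.
 * Assumption: the model wraps back to PID 1; the kernel's rule differs.
 */
int model_alloc_pidmap(struct model_pid_ns *ns)
{
    for (int tried = 0; tried < MODEL_PID_MAX - 1; tried++) {
        int nr = (ns->last_pid + tried) % (MODEL_PID_MAX - 1) + 1;
        unsigned long mask = 1UL << (nr % BITS_PER_WORD);
        unsigned long *word = &ns->pidmap[nr / BITS_PER_WORD];

        if (!(*word & mask)) {
            *word |= mask;      /* mark the PID as allocated */
            ns->last_pid = nr;  /* the next search starts after it */
            return nr;
        }
    }
    return -1;
}

/* Clear the bit so the PID can be reused after the search wraps around. */
void model_free_pid(struct model_pid_ns *ns, int nr)
{
    assert(nr > 0 && nr < MODEL_PID_MAX);
    ns->pidmap[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
}
```

In this model a freshly initialized namespace hands out 1 and then 2. If PID 1 is then freed, the next allocation still returns 3, because the search continues from last_pid rather than restarting at the lowest free bit.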

2. Creating a New PID Namespace

When a process is created with the CLONE_NEWPID flag (as Docker does), the kernel creates an independent PID namespace.
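From user space, the same flag can be passed to the clone() library call. The sketch below is only an illustration of that idea, not how Docker itself is implemented: pid_seen_in_new_namespace and report_own_pid are hypothetical helper names, and the call needs CAP_SYS_ADMIN (typically root), so the function reports -1 when the kernel refuses.

```c
#define _GNU_SOURCE /* needed for clone() and CLONE_NEWPID in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Child entry point: send the PID the child sees for itself to the parent. */
static int report_own_pid(void *arg)
{
    assert(arg != NULL);
    int write_fd = *(int *)arg;
    /* Ask the kernel directly so no value cached by the C library is used. */
    pid_t self = (pid_t)syscall(SYS_getpid);

    return write(write_fd, &self, sizeof(self)) == (ssize_t)sizeof(self) ? 0 : 1;
}

/*
 * Create a child in a new PID namespace and return the PID that the child
 * observes for itself. Returns -1 if the child could not be created, which
 * is the usual outcome without CAP_SYS_ADMIN.
 */
long pid_seen_in_new_namespace(void)
{
    int fds[2];
    long seen = -1;

    if (pipe(fds) != 0)
        return -1;

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* The stack grows downward on common architectures, so pass its top. */
    pid_t child = clone(report_own_pid, stack + CHILD_STACK_SIZE,
                        CLONE_NEWPID | SIGCHLD, &fds[1]);
    close(fds[1]); /* the parent keeps only the read end */

    if (child > 0) {
        pid_t value;

        if (read(fds[0], &value, sizeof(value)) == (ssize_t)sizeof(value))
            seen = value;
        waitpid(child, NULL, 0);
    }

    close(fds[0]);
    free(stack);
    return seen;
}
```

When the call succeeds, the child is the first process in its new namespace and reports PID 1, while the value clone() returned to the parent is the same child's PID in the parent's namespace.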

The core of process creation is the copy_process function in kernel/fork.c. It copies namespaces, allocates a PID, and records it.

//file:kernel/fork.c
static struct task_struct *copy_process(...){
    ...
    // 2.1 copy namespaces
    retval = copy_namespaces(clone_flags, p);
    // 2.2 allocate pid
    pid = alloc_pid(p->nsproxy->pid_ns);
    // 2.3 record pid
    p->pid = pid_nr(pid);
    p->tgid = p->pid;
    attach_pid(p, PIDTYPE_PID, pid);
    ...
}

2.1 Constructing a New Namespace

copy_namespaces calls create_new_namespaces, which creates a fresh nsproxy and, depending on whether CLONE_NEWPID is set, either creates a new PID namespace or keeps sharing the parent's.

//file:kernel/nsproxy.c
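int copy_namespaces(unsigned long flags, struct task_struct *tsk){
    struct nsproxy *new_ns;
    ...
    /* no namespace flag set: keep sharing the parent's nsproxy */
    if (!(flags & (CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWNET)))
        return 0;
    /* otherwise build a new nsproxy (user_ns setup is elided here) */
    new_ns = create_new_namespaces(flags, tsk, user_ns, tsk->fs);
    tsk->nsproxy = new_ns;
    ...
}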

The flags of interest are:

CLONE_NEWPID – create a new PID namespace.

CLONE_NEWNS – new mount namespace.

CLONE_NEWNET – new network namespace.

CLONE_NEWUTS – new UTS (hostname) namespace.

CLONE_NEWIPC – new IPC namespace.

CLONE_NEWUSER – new user namespace.
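The early-return test in copy_namespaces can be reproduced in user space with the flag values exported by the kernel's UAPI header. This is a small sketch for illustration; needs_new_nsproxy is a hypothetical name, not a kernel function. Note that CLONE_NEWUSER is not part of the mask in the excerpt above, so on its own it does not trigger a new nsproxy in this check.

```c
#include <assert.h>
#include <linux/sched.h> /* CLONE_* flag values from the kernel UAPI headers */
#include <stdbool.h>

/*
 * Mirror of the mask test in the copy_namespaces excerpt: true when at
 * least one of the five namespace flags checked there is set.
 */
bool needs_new_nsproxy(unsigned long flags)
{
    const unsigned long ns_flags = CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
                                   CLONE_NEWPID | CLONE_NEWNET;

    return (flags & ns_flags) != 0;
}
```

A plain fork (no flags) or a thread-style clone with CLONE_VM | CLONE_FILES yields false here, which corresponds to the path where the child simply keeps its parent's nsproxy.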

2.2 Allocating the Process ID

After the namespace is ready, alloc_pid obtains a struct pid from the namespace's slab cache (pid_cachep) and reserves a free number in the pidmap bitmap at every level of the namespace hierarchy.

//file:kernel/pid.c
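struct pid *alloc_pid(struct pid_namespace *ns){
    struct pid *pid;
    struct pid_namespace *tmp;
    int i, nr;

    pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
    ...
    tmp = ns;
    pid->level = ns->level;
    /* one number per level, from this namespace up to the root */
    for (i = ns->level; i >= 0; i--) {
        nr = alloc_pidmap(tmp);
        if (nr < 0) goto out_free;
        pid->numbers[i].nr = nr;
        pid->numbers[i].ns = tmp;
        tmp = tmp->parent;
    }
    ...
    return pid;
out_free:
    ...
}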

The loop allocates a PID in the current namespace and then in each parent namespace, storing the numbers in pid->numbers. This is why a container process has multiple PID values, one per level of the namespace tree.
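The per-level numbering can be modeled in a few lines of user-space C. The sketch below is an assumption-laden simplification: model_ns, model_upid, model_pid and model_alloc_pid are invented names, the nesting depth is capped at MODEL_MAX_LEVEL, and a simple counter stands in for the pidmap bitmap.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LEVEL 8 /* arbitrary cap chosen for this sketch */

/* Hypothetical, simplified stand-in for struct pid_namespace. */
struct model_ns {
    struct model_ns *parent; /* NULL for the root namespace */
    unsigned int level;      /* 0 for the root namespace */
    int last_pid;            /* simplified: a counter instead of a bitmap */
};

/* One (number, namespace) pair, like struct upid. */
struct model_upid {
    int nr;
    struct model_ns *ns;
};

/* One process identity with a number for every level it is visible at. */
struct model_pid {
    unsigned int level;
    struct model_upid numbers[MODEL_MAX_LEVEL + 1];
};

/*
 * Fill in one number per level, walking from ns up to the root.
 * Returns 0 on success, -1 if ns is nested deeper than the model allows.
 */
int model_alloc_pid(struct model_ns *ns, struct model_pid *pid)
{
    struct model_ns *tmp = ns;

    assert(ns != NULL && pid != NULL);
    if (ns->level > MODEL_MAX_LEVEL)
        return -1;

    pid->level = ns->level;
    for (int i = (int)ns->level; i >= 0; i--) {
        assert(tmp != NULL && tmp->level == (unsigned int)i);
        pid->numbers[i].nr = ++tmp->last_pid; /* next number at this level */
        pid->numbers[i].ns = tmp;
        tmp = tmp->parent;
    }
    return 0;
}
```

With a host namespace whose last_pid is 1255 and a fresh level 1 child namespace, a single call fills numbers[1] with 1 (the container view) and numbers[0] with 1256 (the host view) for the same process.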

2.3 Recording the PID in the Task Structure

Once allocated, the PID number is stored in task_struct fields, and attach_pid links the task to its struct pid.

//file:kernel/pid.c
void attach_pid(struct task_struct *task, enum pid_type type, struct pid *pid){
    struct pid_link *link;

    /* point the task at its struct pid and add it to that pid's task list */
    link = &task->pids[type];
    link->pid = pid;
    hlist_add_head_rcu(&link->node, &pid->tasks[type]);
}

3. Viewing Container PIDs

Inside a container, the visible PID is obtained with pid_vnr, which calls pid_nr_ns with the active PID namespace of the calling task (current), which here is the container's namespace.

//file:kernel/pid.c
pid_t pid_vnr(struct pid *pid){
    return pid_nr_ns(pid, task_active_pid_ns(current));
}

pid_t pid_nr_ns(struct pid *pid, struct pid_namespace *ns){
    struct upid *upid;
    pid_t nr = 0;
    if (pid && ns->level <= pid->level) {
        upid = &pid->numbers[ns->level];
        if (upid->ns == ns)
            nr = upid->nr;
    }
    return nr;
}

Thus, when you run ps -ef inside a Docker container, the PID shown (e.g., 1 for the container's init process) is the number that the kernel resolves through pid_nr_ns for the container's PID namespace.
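The lookup logic is small enough to replay in user space. The following sketch copies the structure of pid_nr_ns onto invented types (model_ns, model_upid, model_pid, model_pid_nr_ns), limited to two levels for brevity. It is a model for reasoning about the code, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LEVELS 2 /* this sketch only models a host and one nested level */

/* Hypothetical stand-in for struct pid_namespace: only the depth matters. */
struct model_ns {
    unsigned int level;
};

struct model_upid {
    int nr;
    const struct model_ns *ns;
};

struct model_pid {
    unsigned int level;
    struct model_upid numbers[MODEL_LEVELS];
};

/*
 * Same shape as pid_nr_ns: return the number that pid has in ns,
 * or 0 when the process is not visible from that namespace.
 */
int model_pid_nr_ns(const struct model_pid *pid, const struct model_ns *ns)
{
    int nr = 0;

    assert(ns != NULL && ns->level < MODEL_LEVELS);
    if (pid && ns->level <= pid->level) {
        const struct model_upid *upid = &pid->numbers[ns->level];

        if (upid->ns == ns) /* same depth is not enough: must be this ns */
            nr = upid->nr;
    }
    return nr;
}
```

For a container process numbered 1256 on the host and 5 in its container, the function returns 1256 for the host namespace, 5 for the container namespace, and 0 for an unrelated namespace at the same depth. A host-only process also resolves to 0 when looked up from the container, which is why host processes are invisible there.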

4. Summary

A process running in a container may have PID 1256 in the root (level 0) PID namespace while appearing as PID 5 inside its level 1 container namespace. The kernel stores all of these numbers in the struct pid and selects the appropriate one based on the namespace used for the lookup.

Consequently, when the container's PID namespace is passed to the lookup function, the kernel returns the container-local PID (e.g., 5) instead of the host PID.


Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
