Fundamentals 18 min read

Why the Linux Kernel Replaced PID Bitmap with a Radix Tree and Its Performance Impact

This article explains the Linux kernel's transition from bitmap‑based PID allocation to a radix‑tree (IDR) implementation, describes the underlying data structures and code changes, and presents benchmark results showing roughly a 50% performance improvement for common utilities.

Refining Core Development Skills

Jul 11, 2023

Why the Linux Kernel Replaced PID Bitmap with a Radix Tree and Its Performance Impact

Hello everyone, I am Feige!

In the next book I will upgrade the referenced Linux kernel version to 6.10. While writing the process‑creation chapter last weekend, I discovered that the kernel has switched PID management from a bitmap to a radix‑tree, so I write this article to discuss the change.

The first time I wrote process creation I used kernel version 3.10. In that version allocated PIDs were stored in a bitmap. In versions 5.4 and 6.1 I found that PID management had already been replaced by a radix‑tree. Looking at the changelog, the switch happened after Linux 4.15.

Therefore, today I will explain why the Linux kernel replaced the bitmap with a radix‑tree and show the performance effect of this replacement.

1. Old bitmap method for managing PID

The kernel needs to allocate a PID for each process/thread.

If each used PID were stored in a traditional int variable, memory consumption would be huge. Supporting a maximum of 65,535 processes would require 65,535 × 4 bytes ≈ 260 KB.

A bitmap can compress the storage dramatically: one bit indicates whether a PID is used, so 65,535 PIDs need only 65,535 / 8 ≈ 8 KB, saving a large amount of memory.

The small memory footprint also gives excellent cache locality, making traversal very fast; therefore the kernel used a bitmap for PID management for a long time.

The core function for allocating a PID when creating a process is alloc_pid. In the 3.10 kernel it calls the pidmap interface alloc_pidmap. Its source looks like this:

<code style="padding:16px;color:#abb2bf;font-family:Operator Mono,Consolas,Monaco,Menlo,monospace;font-size:12px"><span style="color:#5c6370;font-style:italic;line-height:26px">//file:kernel/pid.c</span>
<span style="line-height:26px">struct pid *<span style="color:#61aeee;line-height:26px">alloc_pid</span>(struct pid_namespace *ns)</span>{
 ...
 <span style="color:#5c6370;font-style:italic;line-height:26px">// A process may belong to multiple namespaces, each needs a PID</span>
 <span style="color:#5c6370;font-style:italic;line-height:26px">// Actually calls alloc_pidmap to request an integer PID</span>
 tmp = ns;
 pid->level = ns->level;
 for (i = ns->level; i >= 0; i--) {
    nr = alloc_pidmap(tmp);
    pid->numbers[i].nr = nr;
    ... 
 }
 ... 
 return pid
}
</code>

As mentioned, the bitmap’s biggest advantage is memory saving, but it also has a major drawback: allocating a new PID requires scanning the entire bitmap, which is computationally expensive when many processes exist.

<code style="padding:16px;color:#abb2bf;font-family:Operator Mono,Consolas,Monaco,Menlo,monospace;font-size:12px"><span style="color:#5c6370;font-style:italic;line-height:26px">// file:kernel/pid.c</span>
<span style="line-height:26px"><span style="color:#c678dd;line-height:26px">static</span> <span style="color:#c678dd;line-height:26px">int</span> <span style="color:#61aeee;line-height:26px">alloc_pidmap</span>(struct pid_namespace *pid_ns)</span>{
 ... 
 map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
 for (i = 0; i <= max_scan; ++i) {
    for ( ; ; ) {
        if (!test_and_set_bit(offset, map->page)) {
            atomic_dec(&map->nr_free);
            set_last_pid(pid_ns, last, pid);
            return pid;
        }
        offset = find_next_offset(map, offset);
        pid = mk_pid(pid_ns, map, offset);
        ... 
    }
    ... 
 }
}
</code>

In recent years, server memory has grown to hundreds of gigabytes and container workloads have increased the number of processes dramatically. The memory‑saving advantage of a bitmap becomes less valuable, while its high CPU cost for PID allocation becomes increasingly problematic.

2. Using a radix‑tree to manage PID

In 2017 Gargi Sharma submitted a patch named “Replace PID bitmap allocation with IDR API”. This patch started replacing the bitmap with a radix‑tree and was merged into Linux 4.15. Details can be found at LWN or LKML .

Tree data structures are numerous; a radix‑tree is one of them. Its most obvious characteristic is that each level manages a 6‑bit segment, giving a fixed branching factor of 64 (2⁶) (except the root) and a fixed number of levels.

Important fields in a radix‑tree node are shift, slots and tags.

<code style="padding:16px;color:#abb2bf;font-family:Operator Mono,Consolas,Monaco,Menlo,monospace;font-size:12px"><span style="color:#5c6370;font-style:italic;line-height:26px">//file:include/linux/xarray.h</span>
<span style="line-height:26px"><span style="color:#c678dd;line-height:26px">struct</span> <span style="color:#e6c07b;line-height:26px">xa_node</span> {</span>
 ... 
 <span style="color:#c678dd;line-height:26px">unsigned</span> <span style="color:#c678dd;line-height:26px">char</span> shift;
 <span style="color:#c678dd;line-height:26px">void</span> __rcu *slots[XA_CHUNK_SIZE];
 <span style="color:#c678dd;line-height:26px">union</span> {
    <span style="color:#c678dd;line-height:26px">unsigned</span> <span style="color:#c678dd;line-height:26px">long</span> tags[XA_MAX_MARKS][XA_MARK_LONGS];
    ... 
 };
}<br/></code>

shift

indicates which 6‑bit segment the node represents. With the default radix size of 6, the lowest internal node has shift 0, the next‑to‑root level has shift 6, the level above that has shift 12, and so on, increasing by 6 each level. slots is an array of pointers to child nodes. By default XA_CHUNK_SIZE is 64, so slots[64] holds pointers to the next level; a null pointer means no child. tags records the allocation state of each slot. It is a long‑type bitmap where each bit indicates whether the corresponding slot is already allocated.

The above description is abstract; let’s use a simple example to see how a radix‑tree looks in memory.

The kernel’s radix‑tree manages 32‑bit integer IDs, but for simplicity we illustrate with a 16‑bit example.

A 16‑bit unsigned integer ranges from 0 to 65 535. Suppose the IDs 100, 1 000, 10 000, 50 000 and 60 000 have already been allocated.

First convert them to binary:

100: 0000 000001 100100

1 000: 0000 001111 101000

10 000: 0010 011100 010000

50 000: 1100 001101 010000

60 000: 1110 101001 100000

We group bits from the least‑significant side in 6‑bit chunks, so each 16‑bit number is split into three segments.

In the radix‑tree, the root stores the first segment of each number. If a segment is occupied, the corresponding slot points to a child node; otherwise it stays null. The kernel computes the segment by right‑shifting the value by the node’s shift (root shift = 12).

For the example numbers, the first segments (after converting to decimal) are 0, 0, 2, 12 and 14.

The second layer uses the middle 6 bits (shift = 6) and the third layer uses the last 6 bits (shift = 0). Expressed in decimal, the three segments are:

100: 0, 1, 36

1 000: 0, 15, 40

10 000: 2, 28, 16

50 000: 12, 13, 16

60 000: 14, 41, 32

The resulting radix‑tree structure is shown below.

Taking the integer 100 as an example, its three segments are 0, 1, 36. The first segment 0 means we store a pointer to a child node in root->slots[0]. The second segment 1 stores a pointer in that child’s slots[1]. The third segment 36 stores the final value 100 in slots[36] of the third‑level node.

Thus the radix‑tree is built.

To check whether an integer exists or to allocate a new unused ID, we only need to traverse the three levels and examine the tag bits, instead of scanning the entire bitmap, dramatically reducing computational complexity.

In the kernel the IDs are 32‑bit, so six levels of nodes are required.

In newer kernels (e.g., 6.1) alloc_pid was changed to use idr_alloc to request an unused PID:

<code style="padding:16px;color:#abb2bf;font-family:Operator Mono,Consolas,Monaco,Menlo,monospace;font-size:12px"><span style="color:#5c6370;font-style:italic;line-height:26px">//file:kernel/pid.c</span>
<span style="line-height:26px">struct pid *<span style="color:#61aeee;line-height:26px">alloc_pid</span>(struct pid_namespace *ns, ...)</span>{
 ... 
 <span style="color:#5c6370;font-style:italic;line-height:26px">// A process may belong to multiple namespaces, each needs a PID</span>
 <span style="color:#5c6370;font-style:italic;line-height:26px">// Actually calls idr_alloc to request an integer PID</span>
 tmp = ns;
 pid->level = ns->level;
 for (i = ns->level; i >= 0; i--) {
    nr = idr_alloc(&tmp->idr, NULL, tid,
                tid + 1, GFP_ATOMIC);
    ... 
    pid->numbers[i].nr = nr;
    pid->numbers[i].ns = tmp;
    tmp = tmp->parent;
 }
 ... 
}
</code>

The core allocation routine is idr_get_free, which walks the radix‑tree nodes and uses the tag and slot fields to locate a free integer ID.

<code style="padding:16px;color:#abb2bf;font-family:Operator Mono,Consolas,Monaco,Menlo,monospace;font-size:12px"><span style="color:#5c6370;font-style:italic;line-height:26px">//file:lib/radix-tree.c</span>
<span style="line-height:26px"><span style="color:#c678dd;line-height:26px">void</span> __rcu **<span style="color:#61aeee;line-height:26px">idr_get_free</span>(struct radix_tree_root *root, ...)</span>{
 ... 
 shift = radix_tree_load_root(root, &child, &maxindex);
 while (shift) {
    shift -= RADIX_TREE_MAP_SHIFT; <span style="color:#5c6370;font-style:italic;line-height:26px">// RADIX_TREE_MAP_SHIFT is 6</span>
    ... 
    <span style="color:#5c6370;font-style:italic;line-height:26px">// Traverse tag bitmap to find next available slot</span>
    offset = radix_tree_find_next_bit(node, IDR_FREE,
                        offset + 1);
    start = next_index(start, node, offset);
 }
 ... 
}
</code>

3. Performance of the radix‑tree

Now let’s look at the performance after replacing the bitmap with a radix‑tree, using experimental data provided by the patch author Gargi Sharma (source: https://lwn.net/Articles/735675/).

With 10 000 processes, the execution times of ps and pstree were measured:

<code style="padding:16px;color:#abb2bf;font-family:Operator Mono,Consolas,Monaco,Menlo,monospace;font-size:12px">ps:
 With IDR API With bitmap
real 0m1.479s 0m2.319s
user 0m0.070s 0m0.060s
sys 0m0.289s 0m0.516s
</code>

<code style="padding:16px;color:#abb2bf;font-family:Operator Mono,Consolas,Monaco,Menlo,monospace;font-size:12px">pstree:
 With IDR API With bitmap
real 0m1.024s 0m1.794s
user    0m0.348s 0m0.612s
sys 0m0.184s 0m0.264s
</code>

The results show that both commands run roughly 50 % faster when the IDR (radix‑tree) API is used.

Feel free to share this internal development knowledge with your friends and grow together!

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Linux bitmap Radix Tree PiD

Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.