
How Linux Implements Per‑CPU Variables: From Definition to Runtime Access

This article explains the concept of per‑CPU variables in the Linux kernel, how they are defined with DEFINE_PER_CPU, linked into the .data..percpu section, initialized during boot, and accessed at runtime via the GS register and macro expansions such as this_cpu_read_stable.

Liangxu Linux

Multithreaded programs often use thread‑local storage, where each thread has its own instance of a variable. The Linux kernel provides a similar mechanism called per‑CPU variables, where each CPU core gets its own copy of a variable.

Definition and Linking

Per‑CPU variables are defined in source files with the DEFINE_PER_CPU macro, which places the variable in a special ELF section named .data..percpu. At link time, the linker gathers these variables from every object file into a single .data..percpu section in the final vmlinux image.

Each per‑CPU variable is identified by its offset from the start of this section, that is, by its position within .data..percpu. Because the offsets are fixed when vmlinux is linked, the kernel can compute the runtime address for a specific CPU by adding the offset to the base address of that CPU’s per‑CPU memory block.
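The section-and-offset idea can be tried in userspace. The following is a minimal sketch, not kernel code: it assumes GCC or Clang targeting ELF with GNU ld or lld, which synthesize __start_ and __stop_ symbols for a section whose name is a valid C identifier. The macro DEMO_DEFINE_PER_CPU, the section name demo_percpu, and the variables are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of DEFINE_PER_CPU: put the variable in one named section. */
#define DEMO_DEFINE_PER_CPU(type, name) \
    __attribute__((section("demo_percpu"), used)) type name

DEMO_DEFINE_PER_CPU(long, demo_counter) = 7;
DEMO_DEFINE_PER_CPU(long, demo_irq_count) = 3;

/* Boundary symbols generated by the linker for the section above. */
extern char __start_demo_percpu[];
extern char __stop_demo_percpu[];

/* Size of the section: end symbol minus start symbol. */
static size_t demo_section_size(void)
{
    return (size_t)((uintptr_t)__stop_demo_percpu -
                    (uintptr_t)__start_demo_percpu);
}

/* Offset of a variable from the start of the section. */
#define DEMO_OFFSET_OF(var) \
    ((size_t)((uintptr_t)&(var) - (uintptr_t)__start_demo_percpu))

/* Address of a variable's copy inside a block that begins at base. */
static void *demo_ptr_in_block(void *base, size_t offset)
{
    assert(offset < demo_section_size());
    return (char *)base + offset;
}
```

Passing __start_demo_percpu as the base yields the address of the original variable; passing the start of a copied block would yield the address of that copy, which is the role each CPU's block plays in the kernel.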

Boot‑time Initialization

Once the kernel image is in memory, the boot code determines the number of CPUs and allocates a per‑CPU memory block for each one. The contents of the .data..percpu section are copied into the static area of each block, and the base address of each block becomes the GS base of the corresponding CPU.

On x86_64 the GS base is set by writing a model‑specific register (MSR_GS_BASE) rather than by loading a segment descriptor, which allows a full 64‑bit base address.
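The copy step can be modelled in a few lines of userspace C. This is a sketch under stated assumptions: the struct demo_percpu_image stands in for the contents of .data..percpu, the array demo_cpu_base stands in for the per‑CPU GS base values, and the CPU count is fixed at four. None of these names come from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NR_CPUS 4 /* assumed CPU count for this sketch */

/* Stand-in for the section image: initial values of every per-CPU variable. */
struct demo_percpu_image {
    long irq_count;
    void *current_task;
};

static const struct demo_percpu_image demo_image = {
    .irq_count = 100,
    .current_task = NULL,
};

/* Stand-in for each CPU's GS base (the kernel writes an MSR instead). */
static void *demo_cpu_base[DEMO_NR_CPUS];

static void demo_free_per_cpu_areas(void)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        free(demo_cpu_base[cpu]);
        demo_cpu_base[cpu] = NULL;
    }
}

/* Allocate one block per CPU and copy the image into it.
 * Returns 0 on success, -1 if an allocation fails. */
static int demo_setup_per_cpu_areas(void)
{
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        void *block = malloc(sizeof(demo_image));
        if (block == NULL) {
            demo_free_per_cpu_areas();
            return -1;
        }
        memcpy(block, &demo_image, sizeof(demo_image));
        demo_cpu_base[cpu] = block;
    }
    return 0;
}

/* Address of a variable for a given CPU: that CPU's base plus the offset. */
static void *demo_per_cpu_ptr(int cpu, size_t offset)
{
    assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
    assert(offset < sizeof(struct demo_percpu_image));
    return (char *)demo_cpu_base[cpu] + offset;
}
```

After setup, every CPU starts from the same initial values, and a write through one CPU's pointer leaves the other copies untouched.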

Runtime Access

To read a per‑CPU variable, the CPU adds the variable’s offset to the GS base. For example, the macro this_cpu_read_stable expands to an inline assembly statement that compiles to an instruction such as mov %gs:0x16d00, %rax, which loads current_task (the per‑CPU pointer to the currently running task) into rax. The offset 0x16d00 is specific to the build examined here.

The current macro expands to get_current(), which returns the value of the current_task per‑CPU variable, so kernel code can reach the task structure of the thread executing on that CPU with a single load.
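A GS‑relative load can also be observed from userspace on x86_64 Linux, where the GS base is normally unused and can be set with the arch_prctl system call. The sketch below is an illustration, not the kernel's macro: struct demo_block, its field offset, and all demo_ names are hypothetical, and on other platforms (or if the system call is refused) the code falls back to ordinary pointer arithmetic.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__linux__) && !defined(__ILP32__)
#include <sys/syscall.h>
#include <unistd.h>
#define DEMO_HAVE_GS 1
#define DEMO_ARCH_SET_GS 0x1001 /* same value as ARCH_SET_GS in <asm/prctl.h> */
#else
#define DEMO_HAVE_GS 0
#endif

/* Hypothetical per-CPU block; one slot plays the role of current_task. */
struct demo_block {
    uint64_t pad[4];
    uint64_t current_task;
};

#define DEMO_CURRENT_TASK_OFF offsetof(struct demo_block, current_task)

static struct demo_block *demo_gs_base; /* software copy of the base */
static int demo_gs_is_real;             /* 1 if the hardware GS base was set */

/* Point "this CPU" at a block, using the real GS base when possible. */
static void demo_set_gs_base(struct demo_block *block)
{
    demo_gs_base = block;
    demo_gs_is_real = 0;
#if DEMO_HAVE_GS
    if (syscall(SYS_arch_prctl, DEMO_ARCH_SET_GS, (unsigned long)block) == 0)
        demo_gs_is_real = 1;
#endif
}

/* Read a 64-bit value at base + offset, with one GS-relative mov if available. */
static uint64_t demo_this_cpu_read(size_t offset)
{
    uint64_t val;
#if DEMO_HAVE_GS
    if (demo_gs_is_real) {
        __asm__ volatile("movq %%gs:(%1), %0"
                         : "=r"(val)
                         : "r"(offset)
                         : "memory");
        return val;
    }
#endif
    memcpy(&val, (const char *)demo_gs_base + offset, sizeof(val));
    return val;
}
```

The caller passes only the offset; which block is read depends entirely on the current base, mirroring how the same kernel instruction yields a different current_task on each CPU.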

Macro Expansion Technique

Because the kernel relies heavily on macros, developers may want to see their full expansion. Compiling a single source file with the -save-temps=obj flag makes GCC keep its intermediate files, such as net/socket.i, which contains the pre‑processed source with all macros expanded. Opening this file and locating the desired function (e.g., get_current) reveals the inline assembly statement that the macros expand to; the generated assembly itself is in the accompanying .s file.

Alternatively, one can use a disassembler on the compiled object to verify the generated instructions, confirming that the macro ultimately produces a single mov from the per‑CPU address.
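The same technique works on a small standalone file. The sketch below uses hypothetical demo_ macros that loosely imitate the layering behind current; they are not the kernel's definitions. Compiling it with gcc -save-temps=obj -c leaves a .i file next to the object in which the macro layers are already flattened. As a complementary trick, two-level stringification exposes the expanded text at runtime.

```c
#include <assert.h>
#include <string.h>

struct demo_task {
    int pid;
};

/* Hypothetical stand-ins for a task and the current_task slot. */
static struct demo_task demo_init_task = { .pid = 0 };
static struct demo_task *demo_current_task = &demo_init_task;

/* Layered macros: each level expands into the next. */
#define demo_percpu_read(var)          (*(&(var)))
#define demo_this_cpu_read_stable(var) demo_percpu_read(var)

static inline struct demo_task *demo_get_current(void)
{
    return demo_this_cpu_read_stable(demo_current_task);
}

#define demo_current demo_get_current()

/* Two-level stringification: the argument is macro-expanded before
 * DEMO_STR turns it into a string literal. */
#define DEMO_STR(x)  #x
#define DEMO_XSTR(x) DEMO_STR(x)

/* Returns the text that the read macro expands to. */
static const char *demo_expansion_of_read(void)
{
    return DEMO_XSTR(demo_this_cpu_read_stable(demo_current_task));
}
```

The returned string no longer mentions demo_this_cpu_read_stable, only the variable and the operators the macros wrapped around it, which is the same view the .i file gives.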

Static, Reserved, and Dynamic Per‑CPU Variables

The kernel distinguishes three kinds of per‑CPU storage:

Static: Defined at compile time and placed directly into the per‑CPU section; copied to each CPU’s block during boot.

Reserved: A region set aside in each CPU’s block for loadable kernel modules; a module’s static per‑CPU variables are allocated from it when the module is loaded.

Dynamic: Allocated at runtime from a dynamic area within the per‑CPU block.

All three share the same underlying addressing mechanism, differing only in when and how their storage is allocated.
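The three areas can be pictured as consecutive ranges of offsets inside one CPU's block. The following is a simplified model with hypothetical names and sizes: it classifies an offset by area and hands out dynamic space with a plain bump allocator, which is a stand-in chosen for brevity and not the allocator the kernel uses.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified layout of one CPU's block: static, then reserved, then dynamic. */
struct demo_unit {
    size_t static_size;   /* copied from the per-CPU section at boot */
    size_t reserved_size; /* set aside for module static variables */
    size_t dyn_size;      /* handed out at runtime */
    size_t dyn_used;      /* bytes of the dynamic area already in use */
};

enum demo_area { DEMO_STATIC, DEMO_RESERVED, DEMO_DYNAMIC, DEMO_OUTSIDE };

/* Tell which area an offset from the start of the block falls into. */
static enum demo_area demo_classify(const struct demo_unit *u, size_t off)
{
    if (off < u->static_size)
        return DEMO_STATIC;
    off -= u->static_size;
    if (off < u->reserved_size)
        return DEMO_RESERVED;
    off -= u->reserved_size;
    if (off < u->dyn_size)
        return DEMO_DYNAMIC;
    return DEMO_OUTSIDE;
}

#define DEMO_ALLOC_FAILED ((size_t)-1)

/* Bump-allocate from the dynamic area; align must be a power of two.
 * Returns an offset from the start of the block, valid in every CPU's
 * block, or DEMO_ALLOC_FAILED when the area is exhausted. */
static size_t demo_alloc_dynamic(struct demo_unit *u, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t dyn_start = u->static_size + u->reserved_size;
    size_t off = (dyn_start + u->dyn_used + align - 1) & ~(align - 1);
    if (off + size > dyn_start + u->dyn_size)
        return DEMO_ALLOC_FAILED;
    u->dyn_used = off + size - dyn_start;
    return off;
}
```

Because an allocation is just an offset, a dynamically allocated variable is reached the same way as a static one: the offset is added to the base of whichever CPU's block is wanted.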

Key Symbols

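The linker script vmlinux.lds.S defines symbols such as __per_cpu_start (0 in the x86_64 SMP layout described here, because the section is linked at virtual address 0), __per_cpu_end (the end of the per‑CPU section), and __per_cpu_load (the address at which the section’s initial contents sit in the loaded kernel image). The size of the static per‑CPU area is __per_cpu_end - __per_cpu_start, about 170 KiB in the build examined here; the exact figure depends on the kernel configuration.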

Putting It All Together

During boot, the setup_per_cpu_areas function (found in arch/x86/kernel/setup_percpu.c) allocates the per‑CPU blocks, copies the static per‑CPU data into them, and records each block’s base address so that it can be loaded into the corresponding CPU’s GS base. After this initialization, any code can reach a per‑CPU variable by adding its link‑time offset to the GS base, enabling fast, lock‑free access to CPU‑local data such as current_task.

Understanding this mechanism helps kernel developers write efficient code for CPU‑local data without resorting to heavy synchronization primitives.

[Figures not preserved. Captions: Per‑CPU layout diagram; Current macro definition; Assembly generated by this_cpu_read_stable; Disassembly showing mov %gs:0x16d00, %rax; setup_per_cpu_areas function; Copying .data..percpu to per‑CPU blocks; Setting GS register via MSR; Current_task update on context switch; Per‑CPU area size calculation; Final GS register setup]