Understanding NUMA Node Detection and Memory Management in the Linux Kernel
This article explains the fundamentals of NUMA architecture, how Linux detects and represents NUMA nodes, the memory zone hierarchy, allocation policies, and practical techniques such as using numactl and taskset to bind processes for optimal performance on multi‑socket servers.
1. What Is NUMA?
NUMA (Non‑Uniform Memory Access) overcomes the scalability limits of traditional SMP systems by giving each CPU socket its own local memory, with sockets connected by point‑to‑point links such as Intel's QPI/UPI. The design combines SMP's ease of programming with much of MPP's scalability, at the cost of slower access to memory attached to a remote socket.
2. NUMA System Architecture
2.1 Evolution of Memory Management
Early SMP systems suffered from bus contention as CPU counts grew, leading to performance bottlenecks. NUMA introduces separate memory controllers per socket, reducing contention and providing faster local memory access.
2.2 Linux Kernel’s NUMA Representation
In the Linux kernel, each NUMA node is described by struct pglist_data (typedef'd as pg_data_t), which contains an array of struct zone objects such as ZONE_DMA, ZONE_DMA32, and ZONE_NORMAL. These zones partition a node's memory according to the addressing limits of different hardware.
On x86‑64, ZONE_DMA covers the low 16 MiB for legacy ISA DMA devices, ZONE_DMA32 serves devices that can only address up to 4 GiB, and ZONE_NORMAL holds the bulk of usable memory for the kernel and user space.
3. NUMA Core Techniques
3.1 Memory Allocation Policy
When allocating memory, the kernel walks a per‑node zonelist built at boot from the node distance table (ACPI SLIT on x86). Under the default ZONELIST_FALLBACK policy, it prefers the local node's highest suitable zone (normally ZONE_NORMAL) and falls back to progressively more distant nodes only when local memory is exhausted.
Flags such as __GFP_THISNODE select the ZONELIST_NOFALLBACK zonelist instead, restricting the allocation to the requesting node and failing rather than spilling to a remote one.
4. Discovering NUMA Nodes
4.1 Useful Commands
The numactl --hardware command displays the number of nodes, CPUs per node, memory size, and inter‑node distances. The /sys/devices/system/node directory contains per‑node files like cpulist and meminfo that provide real‑time topology and usage data.
4.2 Kernel Code Insight
Key kernel pieces include numa_node_id(), which returns the current CPU's node from a per‑CPU variable populated at boot from firmware tables (ACPI SRAT on x86) rather than by probing hardware on each call, and the struct pglist_data fields node_id, node_start_pfn, and node_spanned_pages, which record the node's identifier, its first page frame number, and its size in pages.
5. Practical NUMA Optimization
5.1 Binding Processes to Nodes
Running numactl --cpunodebind=0 --membind=0 <program> forces a workload onto a specific node, ensuring both CPU execution and memory allocation stay local. The taskset command binds threads to particular CPU cores, but note that it controls CPU affinity only; it places no constraint on which node memory is allocated from.
5.2 Performance Impact Example
A dual‑node server (2 sockets × 8 cores, 32 GiB RAM) running a multithreaded database query sees memory latency drop from ~200 ns to under 50 ns and bandwidth utilization rise from ~50 % to over 80 % after the process is bound to a single node with numactl, demonstrating the tangible benefit of NUMA‑aware placement.