
Understanding NUMA in Linux: Hardware Principles, ACPI Tables, and Kernel Initialization

This article explains the hardware basis of NUMA, how Linux reads ACPI SRAT and SLIT tables to discover CPU‑memory topology, the kernel functions that initialize NUMA structures, and how the memblock allocator incorporates this information to enable performance‑optimizing tools like numactl.


1. Introduction

NUMA (Non‑Uniform Memory Access) is a hardware memory architecture that can significantly affect program performance on multi‑CPU servers. By binding services to specific NUMA nodes with tools such as numactl, administrators can improve latency and throughput.

2. NUMA Overview

Modern CPUs contain integrated memory controllers, and each CPU can be connected to multiple DIMM slots. In a multi‑CPU server, CPUs are linked by high‑speed interconnects (e.g., UPI), and memory attached to a remote CPU incurs higher access latency, which is the essence of non‑uniform memory access.

Two ACPI tables describe NUMA topology:

SRAT (System Resource Affinity Table) – maps CPUs and memory regions to NUMA nodes.

SLIT (System Locality Information Table) – records the relative distances between nodes.

3. Linux Reading NUMA Information

3.1 Detecting Memory Affinity

During early boot the kernel reads firmware‑provided tables through the ACPI interface. The ACPI specification (e.g., version 6.5) defines the SRAT and SLIT tables used for NUMA discovery.

Key kernel files involved:

//file:arch/x86/kernel/setup.c
void __init setup_arch(char **cmdline_p) {
    // Save physical memory detection results
    e820__memory_setup();
    // Initialise memblock allocator
    e820__memblock_setup();
    // Initialise memory subsystem, including NUMA
    initmem_init();
}

The initmem_init function calls x86_numa_init, then numa_init, and finally acpi_numa_init, which parses the SRAT table and fills the global numa_meminfo structure.

//file:drivers/acpi/numa/srat.c
int __init acpi_numa_init(void) {
    // Parse SRAT table (CPU_AFFINITY, MEMORY_AFFINITY, etc.)
    if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
        ...
    }
    return 0;
}

The parsed data is stored as a list of (start address, end address, node id) triples in numa_meminfo.

3.2 Memblock Integration

After numa_meminfo is populated, the memblock allocator associates each memory region with its NUMA node.

//file:arch/x86/mm/numa.c
static int __init numa_register_memblks(struct numa_meminfo *mi) {
    int i, nid;

    // Tag each memblock region with its NUMA node id
    for (i = 0; i < mi->nr_blks; i++) {
        struct numa_memblk *mb = &mi->blk[i];
        memblock_set_node(mb->start, mb->end - mb->start, &memblock.memory, mb->nid);
    }
    // Allocate pglist_data for each possible node
    for_each_node_mask(nid, node_possible_map) {
        alloc_node_data(nid);
    }
    memblock_dump_all();
    return 0;
}

When the kernel boots with the memblock=debug parameter, it prints detailed region information, now annotated with the node identifier (e.g., "on node 0").

4. Summary of the Memory Identification Process

1. The BIOS provides an initial memory map via the e820 interface.

2. The kernel creates the memblock allocator to manage this memory.

3. Using ACPI, the kernel reads SRAT (and SLIT) tables, fills numa_meminfo , and updates memblock regions with node IDs.

4. With NUMA topology known, tools like numactl can bind processes to specific nodes for better performance.

5. Final Remarks

NUMA awareness is crucial on modern multi‑socket servers because accessing memory attached to a remote CPU incurs higher latency. Linux obtains this topology from firmware via ACPI, integrates it into its memory‑management structures, and exposes it to user‑space tools. However, indiscriminate NUMA binding can backfire: a bound process may hit allocation failures or reclaim pressure once its node's memory is exhausted, even while other nodes still have free memory, so careful profiling is recommended.

For deeper learning, the author offers a series of video lectures covering CPU‑memory hardware principles, kernel memory management, process management, networking, filesystems, containers, and performance monitoring.

Tags: Performance Optimization · Memory Management · Operating Systems · Linux kernel · NUMA · ACPI
Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
