Understanding Linux Kernel Memory: Nodes, Zones, Buddy System, and SLAB Allocator
This article explains how Linux 3.10 organizes memory using NUMA nodes, zones, the buddy system, and the SLAB allocator, providing commands, code examples, and visual diagrams to illustrate each layer of the kernel's efficient memory management.
1. Node Division (NUMA)
Modern servers use a NUMA architecture in which each CPU socket and its directly attached memory form a node. The dmidecode command lists CPU details and memory modules, showing which DIMM belongs to which CPU. Example output:
Processor Information // first CPU
Socket Designation: CPU1
Version: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Core Count: 8
Thread Count: 16
Processor Information // second CPU
Socket Designation: CPU2
Version: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Core Count: 8
Memory modules can be inspected similarly, revealing four DIMMs per CPU on the example machine. The numactl --hardware command displays each node's CPUs and memory size:
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65419 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
2. Zone Division
Each node is further split into zones, which are contiguous ranges of physical memory. Common zones on x86‑64 are:
ZONE_DMA – low‑address range for ISA DMA devices.
ZONE_DMA32 – for 32‑bit DMA devices, only present on 64‑bit kernels.
ZONE_NORMAL – all remaining memory managed by the kernel.
ZONE_HIGHMEM exists only on 32‑bit systems and is rarely used today.
A zone contains many pages, each typically 4 KB. The /proc/zoneinfo file shows per‑zone page statistics:
# cat /proc/zoneinfo
Node 0, zone DMA
pages free 3973
managed 3973
Node 0, zone DMA32
pages free 390390
managed 427659
Node 0, zone Normal
pages free 15021616
managed 15990165
Node 1, zone Normal
pages free 16012823
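managed 16514393
Multiplying a zone's managed page count by 4 KB gives the amount of memory the zone manages (e.g., the Node 1 Normal zone: 16514393 × 4 KB ≈ 63 GiB). The free page count converts to free memory in the same way.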
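The page-count arithmetic above can be sketched in a few lines of C. This is a minimal illustration, assuming the 4 KB page size used in this article; the helper names pages_to_bytes and pages_to_mib are hypothetical and are not kernel functions.

```c
#include <stdint.h>

/* Assumed page size: 4 KB, as used throughout this article. */
#define PAGE_SIZE_BYTES 4096ULL

/* Hypothetical helper: convert a page count from /proc/zoneinfo to bytes. */
static uint64_t pages_to_bytes(uint64_t pages)
{
    return pages * PAGE_SIZE_BYTES;
}

/* Hypothetical helper: convert a page count to whole MiB (rounded down). */
static uint64_t pages_to_mib(uint64_t pages)
{
    return pages_to_bytes(pages) >> 20;
}
```

For the Node 1 Normal zone, pages_to_mib(16514393) gives 64509 MiB of managed memory and pages_to_mib(16012823) gives 62550 MiB free, both slightly below the 65536 MB that numactl reports for the node.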
3. Buddy System for Free Page Management
The kernel represents each zone with a struct zone. Its free_area array (size MAX_ORDER = 11) holds free page lists for block sizes 4 KB, 8 KB, …, 4 MB.
//file: include/linux/mmzone.h
#define MAX_ORDER 11
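struct zone {
    struct free_area free_area[MAX_ORDER];
    ...
};
The alloc_pages(gfp_mask, order) function searches these lists to allocate a contiguous block of 2^order pages. For example, allocating an 8 KB block (order = 1) means finding two physically adjacent free pages.
struct page *alloc_pages(gfp_t gfp_mask, unsigned int order);
In the buddy system, two blocks are "buddies" when they are the same size, physically adjacent, and were produced by splitting the same larger block. When no free block of the requested order exists, a larger block is split in half; when both buddies are free again, they are merged back into the larger block.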
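The relationship between order, block size, and buddy location can be sketched in user-space C. This is an illustration only, assuming 4 KB pages and MAX_ORDER = 11 as in this article; the helper names order_to_bytes, bytes_to_order, and buddy_pfn are hypothetical. The XOR step reflects the usual buddy technique: a block of a given order and its buddy differ only in bit number order of their page frame number.

```c
#include <stddef.h>

#define PAGE_SHIFT_ASSUMED 12   /* assumption: 4 KB pages */
#define MAX_ORDER 11            /* orders 0..10, as in the article */

/* Size in bytes of one block of the given order: 4 KB << order. */
static unsigned long order_to_bytes(unsigned int order)
{
    return 1UL << (PAGE_SHIFT_ASSUMED + order);
}

/* Smallest order whose block can hold `bytes`; -1 if it exceeds 4 MB. */
static int bytes_to_order(unsigned long bytes)
{
    unsigned int order;

    for (order = 0; order < MAX_ORDER; order++)
        if (order_to_bytes(order) >= bytes)
            return (int)order;
    return -1;
}

/* Page frame number of the buddy block: flip bit `order` of pfn. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}
```

With these helpers, bytes_to_order(8192) is 1, matching the 8 KB example, and the order-1 block starting at page frame 8 has its buddy at page frame 10 (and vice versa). Together they form the order-2 block that starts at page frame 8.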
4. SLAB Allocator
While the buddy system works with whole pages, many kernel objects are much smaller. The SLAB (or SLUB) allocator sits on top of the buddy system and manages caches of objects of a fixed size.
Each kmem_cache keeps, for every NUMA node, three linked lists of slabs: partial, full, and free. A slab consists of one or more pages and stores objects of the same size.
//file: include/linux/slab_def.h
struct kmem_cache {
    struct kmem_cache_node **node;
    ...
};
//file: mm/slab.h
struct kmem_cache_node {
    struct list_head slabs_partial;
    struct list_head slabs_full;
    struct list_head slabs_free;
    ...
};
When a cache needs more memory, it calls kmem_getpages, which ultimately invokes alloc_pages_exact_node (a wrapper around __alloc_pages) to obtain whole pages from the buddy system.
//file: mm/slab.c
static void *kmem_getpages(struct kmem_cache *cachep,
                           gfp_t flags, int nodeid)
{
    ...
    flags |= cachep->allocflags;
    if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
        flags |= __GFP_RECLAIMABLE;
    page = alloc_pages_exact_node(nodeid, ...);
    ...
}
//file: include/linux/gfp.h
static inline struct page *alloc_pages_exact_node(int nid,
                        gfp_t gfp_mask, unsigned int order)
{
    return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}
Typical kernel caches (e.g., the cache of TCP socket structures) are visible via /proc/slabinfo or the slabtop command. After the cache name, the columns are the active object count, the total object count, objsize (object size in bytes), objperslab (objects per slab), and pagesperslab (pages per slab), which together give the cache's memory consumption.
# cat /proc/slabinfo | grep TCP
TCP 288 384 1984 16 8
Interpretation: each TCP slab occupies 8 pages (8 × 4 KB = 32 KB = 32,768 bytes); each object is 1984 bytes; a slab holds 16 objects (1984 × 16 = 31,744 bytes), leaving 1,024 bytes (1 KB) unused. That overhead is acceptable given the low fragmentation and high performance of the SLAB mechanism.
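The slab arithmetic above can be reproduced with a short C sketch. It assumes 4 KB pages, and the struct slab_geometry type and its helper functions are hypothetical names introduced for illustration, not kernel APIs. It also ignores any per-slab management data the allocator may store inside the slab, so the unused figure is an upper bound.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096UL  /* assumption: 4 KB pages */

/* Hypothetical container for three /proc/slabinfo columns. */
struct slab_geometry {
    unsigned long objsize;       /* bytes per object */
    unsigned long objperslab;    /* objects per slab */
    unsigned long pagesperslab;  /* pages per slab */
};

/* Total bytes occupied by one slab. */
static unsigned long slab_bytes(const struct slab_geometry *g)
{
    return g->pagesperslab * PAGE_SIZE_BYTES;
}

/* Bytes of one slab not covered by objects. */
static unsigned long slab_unused_bytes(const struct slab_geometry *g)
{
    return slab_bytes(g) - g->objsize * g->objperslab;
}

/* Bytes held by a cache with num_objs object slots (rounded up to slabs). */
static unsigned long cache_bytes(const struct slab_geometry *g,
                                 unsigned long num_objs)
{
    unsigned long slabs = (num_objs + g->objperslab - 1) / g->objperslab;

    return slabs * slab_bytes(g);
}
```

For the TCP line (objsize 1984, objperslab 16, pagesperslab 8), slab_unused_bytes returns 1024, and cache_bytes with the 384 total objects returns 786432 bytes, i.e. 24 slabs of 32 KB each.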
5. Summary
The Linux kernel manages memory in layers: NUMA nodes, zones, the buddy system, and the SLAB allocator. Nodes and zones give a hierarchical view of physical memory, the buddy system handles page‑level allocation, and the SLAB allocator carves those pages into fixed‑size objects, reducing fragmentation and speeding up allocation of small kernel objects.
