How Linux’s Buddy Allocator Powers Efficient Memory Management
This article explains the Linux kernel’s buddy memory allocation algorithm, covering its core principles, data structures, allocation and free workflows, performance advantages and drawbacks, monitoring tools, tuning parameters, and real‑world use cases such as Kubernetes, embedded IoT devices, and real‑time operating systems.
In the complex Linux kernel ecosystem, memory management is the "heart" of system stability; the buddy algorithm acts as a precise "memory steward" that dynamically splits and merges memory blocks to keep allocation efficient and fragmentation low.
1. Buddy Algorithm Overview
The algorithm mitigates external fragmentation by repeatedly halving larger free blocks until a block of the required size is obtained, and by merging free "buddy" blocks during reclamation.
2. Core Principles
2.1 What is a "buddy"?
A buddy pair consists of two blocks of equal size that are contiguous in physical memory and originate from the same larger block, similar to cutting a cake into equal pieces.
2.2 Data structures for block classification
The kernel uses an array indexed by order, where order represents block size as 2^order pages (e.g., order 0 = 1 page, order 10 = 1024 pages). Each entry holds a doubly‑linked free_list of blocks and a counter nr_free of available blocks.
2.3 Buddy management structures
#define MAX_ORDER 11
struct zone {
...
struct free_area free_area[MAX_ORDER];
...
}
struct free_area {
struct list_head free_list;
unsigned long nr_free;
};The kernel maintains a free_area array for each memory zone, allowing fast lookup of free blocks of a given order.
3. Allocation and Reclamation Workflow
3.1 Allocation process
When a request for, say, 256 pages arrives (order 8), the allocator first checks the corresponding free list. If empty, it searches higher orders (order 9, order 10). Upon finding a larger block, it splits it repeatedly until a block of the desired size is produced, inserting the remaining halves back into the appropriate free lists.
3.2 Reclamation process
When a block is freed, the allocator checks whether its buddy is also free. If so, the two are merged into a higher‑order block, and the process repeats recursively, reducing external fragmentation.
3.3 Initialization and release
During kernel boot,
start_kernel → mem_init → free_all_bootmem_node → __free_pageswalks every page and inserts it into the buddy system, establishing the initial free lists.
static inline void __free_one_page(struct page *page,
struct zone *zone, unsigned int order)
{
unsigned long page_idx;
int order_size = 1 << order;
int migratetype = get_pageblock_migratetype(page);
page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
__mod_zone_page_state(zone, NR_FREE_PAGES, order_size);
while (order < MAX_ORDER-1) {
struct page *buddy = __page_find_buddy(page, page_idx, order);
if (!page_is_buddy(page, buddy, order))
break;
list_del(&buddy->lru);
zone->free_area[order].nr_free--;
rmv_page_order(buddy);
unsigned long combined_idx = __find_combined_index(page_idx, order);
page = page + (combined_idx - page_idx);
page_idx = combined_idx;
order++;
}
set_page_order(page, order);
list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
zone->free_area[order].nr_free++;
}4. Advantages and Drawbacks
4.1 Advantages
Eliminates external fragmentation by keeping blocks in powers‑of‑two sizes.
Allocation and free are fast, using simple bit operations and linked‑list manipulations.
Strong adaptability to modern architectures, including NUMA systems where each node manages its own buddy lists.
4.2 Disadvantages
Internal fragmentation can be significant because allocations are rounded up to the nearest power of two.
Strict buddy‑pair requirements may leave free memory unusable for large contiguous requests, leading to external fragmentation in practice.
Splitting and merging involve many list and bitmap updates, adding overhead.
5. Observation and Optimization
5.1 System status
Reading /proc/buddyinfo shows the number of free blocks per order for each zone, helping to assess fragmentation levels.
Node 0, zone DMA 399 260 123 86 34 31 16 14 7 9 606
Node 0, zone Normal 1491 1005 233 23 53 28 12 25 9Low counts in high orders indicate severe fragmentation.
5.2 Kernel parameters
vm.max_order : controls the maximum block size (default 11 → 4 MiB). Raising it allows larger allocations at the cost of more metadata.
Migration type strategy : prioritising movable pages reduces fragmentation; adjusting /sys/devices/virtual/memory/zone*/pagesize can influence behaviour.
6. Application Scenarios
6.1 High‑concurrency servers (Kubernetes)
Containers frequently allocate and free memory; the buddy allocator, combined with cgroups, ensures fair distribution and prevents any single container from starving others.
#include <iostream>
#include <vector>
#include <unordered_map>
#include <algorithm>
#include <string>
#include <cmath>
#include <memory>
// Simplified buddy allocator and Kubernetes‑style manager (code omitted for brevity)6.2 Embedded IoT devices
On memory‑constrained boards (few MiB), the deterministic behaviour of the buddy system guarantees low latency and predictable power consumption.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <algorithm>
#include <new>
#include <stdexcept>
// Embedded buddy allocator implementation (code omitted for brevity)6.3 Real‑time operating systems
Deterministic allocation time makes the buddy algorithm suitable for safety‑critical control loops where millisecond‑level predictability is required.
7. Summary
The Linux kernel’s buddy memory allocator provides a fast, power‑of‑two based mechanism that balances allocation speed, fragmentation control, and architectural adaptability. Proper monitoring ( /proc/buddyinfo, vmstat, sar) and tuning of vm.max_order and migration policies can further optimise performance for workloads ranging from cloud‑native Kubernetes clusters to tiny embedded sensors.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
