Buddy Algorithm: Principles, Implementation, and Use Cases in Linux Kernel Memory Management
The Buddy Algorithm is a memory‑management technique used by the Linux kernel that splits physical memory into power‑of‑two blocks, tracks them with a binary‑tree structure, and merges free buddies to reduce external fragmentation while providing efficient allocation and deallocation.
The Buddy Algorithm is a widely used memory‑allocation and reclamation method in operating systems, especially within the Linux kernel, designed to mitigate fragmentation and improve memory utilization by dividing available physical memory into blocks whose sizes are powers of two and maintaining a binary‑tree structure to track them.
1. Introduction
In Linux, the speed of memory allocation and release directly impacts system performance. Frequent requests for varying sizes of contiguous page frames cause external fragmentation, wasting space. The Buddy system dynamically splits larger free blocks into smaller ones until the required size is obtained, and merges adjacent free blocks during reclamation.
2. Buddy Algorithm Principle
When a request for a specific size arrives, the algorithm starts at the root of the binary tree, finds the smallest suitable block, marks it as allocated, and, if the block is larger than needed, splits it into two equal halves—one allocated, the other returned to the free list. If no block of the exact size exists, the search moves up to larger orders until a suitable block is found.
Upon freeing, the algorithm checks whether the neighboring block (the "buddy") is also free; if so, it merges them into a larger block, reducing fragmentation.
3. Data Structures
#define MAX_ORDER 11
struct zone {
...
struct free_area free_area[MAX_ORDER];
...
}
struct free_area {
struct list_head free_list;
unsigned long nr_free; // number of free blocks in this order
};The free_area array in a zone holds free lists for each order (2^k pages). Each list node represents a free block, and the block’s first page descriptor stores the order in its private field.
4. System Initialization
During kernel boot, memory is first managed by bootmem . The mem_init sequence calls __free_all_bootmem() , which inserts the free bootmem bitmap entries into the Buddy system’s free lists, thereby initializing the allocator.
mem_init → __free_all_bootmem() → free_all_bootmem() → free_all_bootmem_core(NODE_DATA(0)) → free_all_bootmem_core(pgdat)
// Example of freeing bootmem pages and inserting them into the Buddy free list
free_all_bootmem
return free_all_bootmem_core(NODE_DATA(0));
bootmem_data_t *bdata = pgdat->bdata;
page = virt_to_page(phys_to_virt(bdata->node_boot_start));
idx = bdata->node_low_pfn - (bdata->node_boot_start >> PAGE_SHIFT);
map = bdata->node_bootmem_map;
...5. Allocation Process
page = __rmqueue(zone, order);
for (current_order = order; current_order < MAX_ORDER; ++current_order) {
area = zone->free_area + current_order;
if (list_empty(&area->free_list)) continue;
page = list_entry(area->free_list.next, struct page, lru);
list_del(&page->lru);
rmv_page_order(page);
area->nr_free--;
expand(zone, page, order, current_order, area);
return page;
}
return NULL;The allocator scans free lists from the requested order upward, extracts the first suitable block, removes it from the list, and, if the block is larger than needed, calls expand() to split the remainder back into appropriate order lists.
6. Freeing Process
static inline void __free_one_page(struct page *page, struct zone *zone, unsigned int order) {
unsigned long page_idx;
int order_size = 1 << order;
int migratetype = get_pageblock_migratetype(page);
page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
__mod_zone_page_state(zone, NR_FREE_PAGES, order_size);
while (order < MAX_ORDER-1) {
struct page *buddy = __page_find_buddy(page, page_idx, order);
if (!page_is_buddy(page, buddy, order)) break;
list_del(&buddy->lru);
zone->free_area[order].nr_free--;
rmv_page_order(buddy);
page = merge_pages(page, buddy, order);
page_idx = merged_index(page_idx, order);
order++;
}
set_page_order(page, order);
list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
zone->free_area[order].nr_free++;
}When a block is freed, the algorithm repeatedly checks the buddy block; if it is free and of the same order, the two are merged into a higher‑order block, continuing until no further merge is possible.
7. Advantages and Disadvantages
Advantages: allocation and merging are fast due to power‑of‑two sizing; internal fragmentation is limited; the algorithm is simple and well‑suited for kernel page‑frame management.
Disadvantages: if the requested size is not a power of two, internal fragmentation can be significant; the overhead of maintaining multiple free lists and bitmaps can be non‑trivial; a single small allocated block can prevent larger blocks from coalescing.
8. Typical Scenarios
The Buddy system is employed for physical memory management, page‑frame allocation, and buffer cache handling in Linux, effectively solving external fragmentation problems.
References to further reading and related articles are provided at the end of the original document.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.