Understanding Linux Memory Allocation: Fast Path vs. Slow Path in the Source Code

This article dissects the Linux kernel's page allocation mechanisms, explaining how alloc_pages() follows a fast‑path using low watermarks and falls back to a slow‑path that triggers kswapd, direct reclaim, and compaction, while also detailing the corresponding page‑freeing functions and their internal data structures.

Linux Kernel Journey

Sep 12, 2024

Overview

The Linux kernel allocates physical pages through a set of interfaces such as alloc_pages() and __get_free_pages(). When memory is abundant, allocation follows a fast path; when memory is scarce, the kernel switches to a slow path that may invoke reclamation and compaction.

Prerequisite Knowledge

Allocation Interfaces

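The core allocation functions are alloc_pages(gfp_mask, order), which returns a pointer to the struct page of the first page in a block of 2^order physically contiguous pages, and __get_free_pages(gfp_mask, order), which returns the kernel virtual address of such a block instead. alloc_page(gfp_mask) and __get_free_page(gfp_mask) are wrappers that request a single page (order 0). get_zeroed_page(gfp_mask) adds the __GFP_ZERO flag to obtain a zero-filled page.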
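The relationship between order, page count, and byte size can be sketched in user space. This is a minimal illustration, assuming 4 KiB pages; the sketch_ names are hypothetical helpers, not kernel APIs, although sketch_order_for_size() is similar in spirit to the kernel's get_order().

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on most x86-64 configurations. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/* Number of pages in a block of the given order: 2^order. */
static size_t sketch_pages_in_order(unsigned int order)
{
    return (size_t)1 << order;
}

/* Smallest order whose block covers at least `size` bytes. */
static unsigned int sketch_order_for_size(size_t size)
{
    unsigned int order = 0;

    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

Under these assumptions a request for 4097 bytes needs order 1 (two pages), because the allocator only hands out power-of-two blocks.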

Freeing Interfaces

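Pages are released via __free_pages() and __free_page(), which take a struct page pointer, and free_pages() and free_page(), which take a virtual address. The caller must pass the same page (or address) and order that were used for the allocation; a mismatch corrupts the allocator's state and may lead to a kernel panic.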

GFP Allocation Flags

The gfp_mask describes how the allocation may be satisfied. Flags are grouped into memory-zone modifiers (e.g., __GFP_DMA, __GFP_HIGHMEM), mobility modifiers (e.g., __GFP_RECLAIMABLE), watermark modifiers (e.g., __GFP_HIGH, __GFP_ATOMIC), page-reclaim modifiers (e.g., __GFP_IO, __GFP_DIRECT_RECLAIM), and behavior modifiers (e.g., __GFP_COMP, __GFP_ZERO). Callers normally use predefined combinations such as GFP_KERNEL (may sleep, start I/O, and call into the filesystem), GFP_ATOMIC (must not sleep, may dip into reserves), and GFP_NOIO (may reclaim but must not start I/O).
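How such combinations are built from individual bits can be sketched as follows. The bit positions and the SK_ names are hypothetical and chosen only for illustration; the real values live in include/linux/gfp.h and vary between kernel versions. The composition of the three masks follows the general shape of the kernel definitions, with __GFP_ATOMIC left out for brevity.

```c
#include <stdbool.h>

typedef unsigned int sk_gfp_t;

/* Hypothetical bit assignments, not the kernel's actual values. */
#define SK_GFP_HIGH            (1u << 0)  /* may use emergency reserves */
#define SK_GFP_IO              (1u << 1)  /* may start block I/O */
#define SK_GFP_FS              (1u << 2)  /* may call into the filesystem */
#define SK_GFP_DIRECT_RECLAIM  (1u << 3)  /* caller may reclaim synchronously */
#define SK_GFP_KSWAPD_RECLAIM  (1u << 4)  /* may wake kswapd */

#define SK_GFP_RECLAIM (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_KERNEL  (SK_GFP_RECLAIM | SK_GFP_IO | SK_GFP_FS)
#define SK_GFP_NOIO    (SK_GFP_RECLAIM)
#define SK_GFP_ATOMIC  (SK_GFP_HIGH | SK_GFP_KSWAPD_RECLAIM)

/* True if the caller is allowed to block in direct reclaim. */
static bool sk_gfp_allows_direct_reclaim(sk_gfp_t gfp)
{
    return (gfp & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* True if reclaim on behalf of this caller may start I/O. */
static bool sk_gfp_allows_io(sk_gfp_t gfp)
{
    return (gfp & SK_GFP_IO) != 0;
}
```

In this model the atomic mask lacks the direct-reclaim bit, which is why such callers can only wake kswapd and never enter direct reclaim themselves.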

Fast‑Path Allocation

alloc_pages() Call Chain

#define alloc_pages(gfp_mask, order) \
  alloc_pages_node(numa_node_id(), gfp_mask, order)

static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
      unsigned int order)
{
  if (nid == NUMA_NO_NODE)
    nid = numa_mem_id();
  return __alloc_pages_node(nid, gfp_mask, order);
}

static inline struct page *__alloc_pages_node(int nid, gfp_t gfp_mask,
      unsigned int order)
{
  return __alloc_pages(gfp_mask, order, nid);
}

static inline struct page *__alloc_pages(gfp_t gfp_mask, unsigned int order,
      int preferred_nid)
{
  return __alloc_pages_nodemask(gfp_mask, order, preferred_nid, NULL);
}

__alloc_pages_nodemask() is the core of the buddy allocator. It prepares an alloc_context, selects the preferred zone, applies fragmentation-avoidance flags, and attempts to obtain a page from the free list.

Key Functions

prepare_alloc_pages() fills the alloc_context with zone lists, node masks, and migration types.

get_page_from_freelist() walks the zonelist, checks watermarks with zone_watermark_fast(), and calls rmqueue() to pull a page from the buddy system.

zone_watermark_fast() tests whether a zone has enough free pages above the low watermark for the requested order.

If the fast path succeeds, the function returns the first struct page of the allocated block.
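The watermark test described above can be modeled in user space. This is a simplified sketch of the idea only: struct sk_zone and sk_watermark_ok() are hypothetical names, and details such as lowmem reserves, reserved migrate types, and per-flag watermark adjustments are deliberately left out.

```c
#include <stdbool.h>

#define SK_MAX_ORDER 11

/* Hypothetical, trimmed-down stand-in for struct zone. */
struct sk_zone {
    long free_pages;             /* total free pages in the zone */
    long wmark_low;              /* low watermark, in pages */
    long nr_free[SK_MAX_ORDER];  /* number of free blocks per order */
};

/* Returns true if an allocation of 2^order pages keeps the zone above `mark`. */
static bool sk_watermark_ok(const struct sk_zone *z, unsigned int order, long mark)
{
    /* Discount the pages the request itself would consume, except one. */
    long free_after = z->free_pages - ((1L << order) - 1);

    if (free_after <= mark)
        return false;

    /* An order-0 request only needs the page count to be sufficient. */
    if (order == 0)
        return true;

    /* A higher-order request also needs a free block that is large enough. */
    for (unsigned int o = order; o < SK_MAX_ORDER; o++)
        if (z->nr_free[o] > 0)
            return true;

    return false;
}
```

The second half of the check is what makes fragmentation visible: a zone can hold plenty of free pages and still fail an order-3 request if all of them sit in smaller blocks.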

Slow‑Path Allocation

__alloc_pages_slowpath()

static inline struct page *__alloc_pages_slowpath(gfp_t gfp_mask,
      unsigned int order, struct alloc_context *ac)
{
  bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
  const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
  struct page *page = NULL;
  unsigned int alloc_flags;
  // ... (omitted for brevity) ...
  /* First attempt with the adjusted alloc_flags */
  page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
  if (page)
    goto got_pg;

  /* Early direct compaction for costly or non-movable high-order requests */
  if (can_direct_reclaim && (costly_order ||
      (order > 0 && ac->migratetype != MIGRATE_MOVABLE)) &&
      !gfp_pfmemalloc_allowed(gfp_mask)) {
    page = __alloc_pages_direct_compact(gfp_mask, order,
        alloc_flags, ac, INIT_COMPACT_PRIORITY, &compact_result);
    if (page)
      goto got_pg;
  }

  /* Retry loop */
retry:
  if (alloc_flags & ALLOC_KSWAPD)
    wake_all_kswapds(order, gfp_mask, ac);
  page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
  if (page)
    goto got_pg;

  if (!can_direct_reclaim)
    goto nopage;

  page = __alloc_pages_direct_reclaim(gfp_mask, order, alloc_flags, ac,
        &did_some_progress);
  if (page)
    goto got_pg;

  page = __alloc_pages_direct_compact(gfp_mask, order, alloc_flags, ac,
        compact_priority, &compact_result);
  if (page)
    goto got_pg;

  if (gfp_mask & __GFP_NORETRY)
    goto nopage;

  if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
        did_some_progress > 0, &no_progress_loops))
    goto retry;

  if (should_compact_retry(ac, order, alloc_flags,
        compact_result, &compact_priority, &compaction_retries))
    goto retry;

  page = __alloc_pages_may_oom(gfp_mask, order, ac, &did_some_progress);
  if (page)
    goto got_pg;

nopage:
  if (gfp_mask & __GFP_NOFAIL) {
    page = __alloc_pages_cpuset_fallback(gfp_mask, order, ALLOC_HARDER, ac);
    if (page)
      goto got_pg;
    cond_resched();
    goto retry;
  }
  warn_alloc(gfp_mask, ac->nodemask, "page allocation failure: order:%u", order);
got_pg:
  return page;
}

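The slow path first wakes kswapd, then retries the free list with alloc_flags adjusted to use a lower watermark. Costly or non-movable high-order requests get an early attempt at direct compaction. If the allocation still fails and direct reclaim is allowed, the allocator calls __alloc_pages_direct_reclaim() and then __alloc_pages_direct_compact(). Unless __GFP_NORETRY ends the attempt early, should_reclaim_retry() and should_compact_retry() decide whether another pass through the retry loop is worthwhile. When all attempts are exhausted, the allocator may trigger the OOM killer (__alloc_pages_may_oom()) or, if __GFP_NOFAIL is set, keep retrying until memory becomes available.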
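The retry logic can be illustrated with a small user-space model. This is a sketch under strong assumptions, not the kernel's implementation: memory is reduced to three counters, compaction and the OOM killer are omitted, and all sk_ names and the batch size are hypothetical. The retry limit of 16 matches the kernel's MAX_RECLAIM_RETRIES.

```c
#include <stdbool.h>

#define SK_MAX_RECLAIM_RETRIES 16  /* same value as MAX_RECLAIM_RETRIES */
#define SK_RECLAIM_BATCH       32  /* hypothetical pages freed per pass */

struct sk_mem {
    long free_pages;   /* pages currently free */
    long min_wmark;    /* min watermark used by the slow path */
    long reclaimable;  /* pages that reclaim could still free */
};

/* Take 2^order pages if that leaves at least min_wmark pages free. */
static bool sk_try_alloc(struct sk_mem *m, unsigned int order)
{
    long need = 1L << order;

    if (m->free_pages - need < m->min_wmark)
        return false;
    m->free_pages -= need;
    return true;
}

/* Model of direct reclaim: frees up to one batch and reports the progress. */
static long sk_direct_reclaim(struct sk_mem *m)
{
    long freed = m->reclaimable < SK_RECLAIM_BATCH ? m->reclaimable
                                                   : SK_RECLAIM_BATCH;

    m->reclaimable -= freed;
    m->free_pages += freed;
    return freed;
}

/* Returns true on success; *retries counts the passes that looped again. */
static bool sk_slowpath(struct sk_mem *m, unsigned int order,
                        bool can_direct_reclaim, int *retries)
{
    int no_progress_loops = 0;

    *retries = 0;
    for (;;) {
        if (sk_try_alloc(m, order))
            return true;
        if (!can_direct_reclaim)
            return false;          /* atomic-style caller: give up */

        long progress = sk_direct_reclaim(m);

        if (sk_try_alloc(m, order))
            return true;

        /* Give up once reclaim has made no progress too many times in a row. */
        if (progress > 0)
            no_progress_loops = 0;
        else
            no_progress_loops++;
        if (no_progress_loops > SK_MAX_RECLAIM_RETRIES)
            return false;          /* the kernel would consider OOM here */
        (*retries)++;
    }
}
```

The counter of consecutive passes without progress is what keeps the loop bounded in this model: as long as reclaim frees something the allocator keeps trying, and only a run of fruitless passes ends the attempt.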

Page Release

Freeing Functions

#define __free_page(page) __free_pages((page), 0)
#define free_page(addr) free_pages((addr), 0)
void free_pages(unsigned long addr, unsigned int order)
{
  if (addr != 0) {
    VM_BUG_ON(!virt_addr_valid((void *)addr));
    __free_pages(virt_to_page((void *)addr), order);
  }
}
void __free_pages(struct page *page, unsigned int order)
{
  if (put_page_testzero(page))
    free_the_page(page, order);
}
static inline void free_the_page(struct page *page, unsigned int order)
{
  if (order == 0)
    free_unref_page(page);
  else
    __free_pages_ok(page, order);
}

For a single page, free_unref_page() validates the page and returns it to the per-CPU page list, which avoids taking the zone lock on the common path. For higher orders, __free_pages_ok() prepares the block, determines the migration type, and calls free_one_page(), which ultimately invokes __free_one_page() to merge the freed block with its buddy and place it back onto the appropriate free list.
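The buddy merge performed on free can be sketched with a toy zone of eight pages. This is an illustrative model, assuming a simple array in place of the kernel's free lists; struct sk_buddy and the sk_ functions are hypothetical names. The buddy of a block is found by flipping bit `order` of its page frame number, and the merged block starts at the lower of the two.

```c
#define SK_BUDDY_MAX_ORDER 4                               /* orders 0..3 */
#define SK_BUDDY_NR_PAGES  (1u << (SK_BUDDY_MAX_ORDER - 1)) /* 8 pages */

/* free_order[pfn] is the order of the free block starting at pfn, or -1. */
struct sk_buddy {
    int free_order[SK_BUDDY_NR_PAGES];
};

static void sk_buddy_init(struct sk_buddy *b)
{
    for (unsigned int i = 0; i < SK_BUDDY_NR_PAGES; i++)
        b->free_order[i] = -1;
}

/* The buddy of a block differs from it only in bit `order` of the pfn. */
static unsigned int sk_find_buddy_pfn(unsigned int pfn, unsigned int order)
{
    return pfn ^ (1u << order);
}

/* Free the block at pfn, merging with free buddies; returns the final order. */
static unsigned int sk_free_block(struct sk_buddy *b, unsigned int pfn,
                                  unsigned int order)
{
    while (order < SK_BUDDY_MAX_ORDER - 1) {
        unsigned int buddy = sk_find_buddy_pfn(pfn, order);

        /* Stop unless the buddy is free and has the same order. */
        if (b->free_order[buddy] != (int)order)
            break;

        b->free_order[buddy] = -1;  /* take the buddy off its free list */
        pfn &= buddy;               /* merged block starts at the lower pfn */
        order++;
    }
    b->free_order[pfn] = (int)order;
    return order;
}
```

Freeing pages 0 and 1 in this model yields one order-1 block at pfn 0; freeing the order-1 block at pfn 2 then merges again into an order-2 block, which is how the free lists regain large contiguous blocks.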

Conclusion

The article provides a line‑by‑line walkthrough of the Linux buddy allocator, showing how the fast path relies on low‑watermark checks and per‑CPU caches, while the slow path orchestrates reclamation, compaction, and OOM handling to satisfy allocation requests under memory pressure. It also clarifies the symmetry between allocation and freeing paths, emphasizing the importance of correct order and gfp_mask usage.
