Linux Memory Allocation Mechanisms: malloc, kmalloc, vmalloc, mmap, and Slab
This article explains Linux's flexible memory allocation system, covering user‑space malloc, kernel‑space kmalloc, vmalloc for large buffers, mmap for file and anonymous mappings, page allocation functions, the slab allocator, and memory pools, with detailed code examples and operational insights.
Linux uses a flexible and efficient memory allocation system that combines user‑space libraries and kernel mechanisms to satisfy various program needs.
1. malloc
malloc is implemented in the C library (glibc): it first serves requests from libc's own cache of freed chunks and, when that is insufficient, asks the kernel for more memory, typically via the brk system call for smaller requests, which extends or creates the heap VMA (large requests go through mmap instead).
1.1 The brk system call
Key functions:
SYSCALL_DEFINE1(brk, unsigned long, brk) {
unsigned long retval;
unsigned long newbrk, oldbrk, origbrk;
struct mm_struct *mm = current->mm;
struct vm_area_struct *next;
unsigned long min_brk;
bool populate;
bool downgraded = false;
LIST_HEAD(uf);
// ... (implementation details) ...
return brk;
}

_do_sys_brk() checks for overlap with an existing VMA, allocates a new VMA if needed, and handles mlockall() page locking.
Searches for a usable VMA at the old brk boundary; if overlapping, reuses it.
If no overlap, allocates a new VMA.
mlockall() forces immediate physical page allocation; otherwise pages are allocated on demand.
1.2 do_brk_flags
Finds a suitable linear address, selects the proper red‑black‑tree node, merges with existing VMA when possible, and inserts the new VMA into the mmap list and tree.
static int do_brk_flags(unsigned long addr, unsigned long len, unsigned long flags, struct list_head *uf) {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
struct rb_node **rb_link, *rb_parent;
pgoff_t pgoff = addr >> PAGE_SHIFT;
int error;
unsigned long mapped_addr;
// ... (implementation details) ...
return 0;
}

1.3 mm_populate and get_user_pages
mm_populate() walks through __mm_populate() → populate_vma_page_range() → __get_user_pages() to allocate physical pages when VM_LOCKED is set.
static long __get_user_pages(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked) {
long ret = 0, i = 0;
struct vm_area_struct *vma = NULL;
struct follow_page_context ctx = { NULL };
// ... (implementation details) ...
return i ? i : ret;
}

follow_page_pte() handles page‑table lookups, fault handling, and optional page locking:
static struct page *follow_page_pte(struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd, unsigned int flags,
struct dev_pagemap **pgmap) {
struct mm_struct *mm = vma->vm_mm;
struct page *page;
spinlock_t *ptl;
pte_t *ptep, pte;
// ... (implementation details) ...
return page;
}

2. kmalloc
kmalloc is the kernel‑space allocator for small, physically contiguous objects, implemented on top of the slab allocator's size‑class caches. It returns a kernel virtual address; typical configurations serve sizes from 32 B up to 128 KB, and requests larger than KMALLOC_MAX_CACHE_SIZE bypass the caches via kmalloc_large().
static inline void *kmalloc(size_t size, gfp_t flags) {
if (__builtin_constant_p(size)) {
if (size > KMALLOC_MAX_CACHE_SIZE)
return kmalloc_large(size, flags);
unsigned int index = kmalloc_index(size);
if (!index)
return ZERO_SIZE_PTR;
return kmem_cache_alloc_trace(kmalloc_caches[kmalloc_type(flags)][index], flags, size);
}
return __kmalloc(size, flags);
}

kmem_cache_alloc_trace() records allocation traces and integrates with KASAN:
void *kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size) {
void *ret;
ret = slab_alloc(cachep, flags, size, _RET_IP_);
ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(_RET_IP_, ret, size, cachep->size, flags);
return ret;
}

3. vmalloc
vmalloc is used for large buffers that must be virtually contiguous but may be physically non‑contiguous. It reserves a range in the kernel's vmalloc address space, allocates physical pages individually, and maps them into the kernel page tables, which makes it noticeably more expensive than kmalloc.
void *vmalloc(unsigned long size);
void vfree(const void *addr);

4. mmap
Provides user‑space memory mapping for files, anonymous memory, and shared regions. The implementation merges with existing VMAs when possible and handles various flags (MAP_PRIVATE, MAP_SHARED, MAP_ANONYMOUS).
unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
struct list_head *uf) {
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev, *merge;
int error;
struct rb_node **rb_link, *rb_parent;
unsigned long charged = 0;
// ... (implementation details) ...
return addr;
}

Two common questions: mapping the same address again with MAP_FIXED succeeds because the kernel unmaps the old VMA first; and video playback over a mapped file can stall because mmap only creates the VMA, while the actual disk reads happen later, on page faults.
5. Page allocation functions
alloc_page()/alloc_pages() allocate one or more contiguous pages; __get_free_pages() returns the first page’s address; corresponding free functions release the pages.
#define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0)
#define alloc_pages(gfp_mask, order) alloc_pages_node(numa_node_id(), gfp_mask, order)
static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) {
if (unlikely(order >= MAX_ORDER))
return NULL;
if (nid < 0)
nid = numa_node_id();
return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}

6. Slab cache
The slab allocator provides object‑caches for frequently allocated structures. Functions include kmem_cache_create(), kmem_cache_alloc(), kmem_cache_free(), and kmem_cache_destroy().
struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
unsigned int align, slab_flags_t flags,
void (*ctor)(void *));
void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags);
void kmem_cache_free(struct kmem_cache *cachep, void *objp);
void kmem_cache_destroy(struct kmem_cache *cachep);

7. Memory pools
mempool_create() builds a pool with a minimum number of pre‑allocated objects; mempool_alloc() and mempool_free() allocate and release objects, falling back to the underlying allocator when the pool is exhausted.
mempool_t *mempool_create(int min_nr, mempool_alloc_t *alloc_fn,
mempool_free_t *free_fn, void *pool_data);
void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask);
void mempool_free(void *element, mempool_t *pool);
void mempool_destroy(mempool_t *pool);

Practical tips: use madvise(MADV_SEQUENTIAL) to hint sequential access on mapped files, and raise the block device's read‑ahead size with blockdev --setra to improve streaming performance.
Deepin Linux
Research areas: Windows & Linux platforms, C/C++ backend development, embedded systems and Linux kernel, etc.