Understanding Linux Kernel Memory Pools and Their Implementation
This article explains the concept, benefits, and implementation details of memory pools in the Linux kernel, covering allocation functions, design principles, common strategies, and a thread‑safe C++ memory‑pool class with example code.
1. Overview
1.1 Pooling Technique
Pooling is a design pattern that pre‑allocates frequently used core resources into a pool for self‑management, improving resource utilization and guaranteeing the amount of resources a program holds. Common pooling techniques include memory pools, thread pools, and connection pools, with memory and thread pools being the most widely used.
1.2 Memory Pool
A memory pool is a technique for managing memory allocation and release. It pre‑allocates a contiguous block of memory, divides it into fixed‑size chunks (or pages), and hands out a chunk when allocation is needed, marking it as used and returning it to the pool after use, thus avoiding frequent dynamic allocation from the OS.
Using a memory pool reduces performance overhead and fragmentation caused by frequent dynamic allocation and deallocation. It is especially suitable for scenarios that repeatedly allocate and free objects of the same size, such as network servers and database systems, thereby improving performance and scalability. Memory pools are commonly used in high‑performance server programs, embedded systems, and real‑time systems.
mempool_t is the Linux kernel's memory-pool type, defined in <linux/mempool.h> . It wraps an underlying allocator and guarantees a minimum number of pre-allocated elements, so allocations can still succeed under memory pressure. The closely related slab-cache API in <linux/slab.h> ( kmem_cache_* ) manages pre-allocated objects of a single size; together these facilities reduce allocation cost, avoid fragmentation, and improve module performance and reliability.
In the Linux kernel, the following slab-cache functions are used to create and operate such object pools:
(1) kmem_cache_create()
struct kmem_cache *kmem_cache_create(const char *name, size_t size, size_t align,
unsigned long flags, void (*ctor)(void *));
Parameters:
name: name of the memory pool.
size: size of each object.
align: object alignment in bytes (often 0, letting the allocator choose a suitable value such as ARCH_KMALLOC_MINALIGN ).
flags: cache flags such as SLAB_HWCACHE_ALIGN or SLAB_PANIC . (GFP flags like GFP_KERNEL and GFP_ATOMIC are passed to kmem_cache_alloc() at allocation time, not here.)
ctor: constructor function pointer for initializing newly allocated objects.
(2) kmem_cache_alloc()
void *kmem_cache_alloc(struct kmem_cache *cache, gfp_t flags);
Parameters:
cache: pointer to the memory pool created by kmem_cache_create() .
flags: allocation flags.
(3) kmem_cache_free()
void kmem_cache_free(struct kmem_cache *cache, void *obj);
Parameters:
cache: pointer to the memory pool.
obj: pointer to the memory block to be freed.
(4) kmem_cache_init()
void kmem_cache_init(void);
Note: kmem_cache_init() is called once during kernel boot to set up the slab allocator itself; kernel modules do not call it directly.
(5) kmem_cache_destroy()
void kmem_cache_destroy(struct kmem_cache *cache);
2. Why a Memory Pool Is Needed
2.1 Memory Fragmentation
Fragmentation reduces heap utilization. Internal fragmentation occurs when an allocated block is larger than the requested payload (e.g., requesting 10 bytes but receiving a 16‑byte block because of alignment or size‑class rounding). External fragmentation happens when free memory is split into non‑contiguous blocks that, despite sufficient total free space, cannot satisfy a larger allocation request.
2.2 Allocation Efficiency
Frequent small allocations are analogous to repeatedly asking parents for pocket money; the overhead of each request reduces overall efficiency. A memory pool provides pre‑allocated blocks, eliminating the need for repeated system calls.
2.3 Common Memory‑Pool Implementations
(1) Fixed‑size buffer pool : suitable for frequent allocation of objects of the same size.
(2) dlmalloc : a widely used allocator originally written by Doug Lea.
(3) SGI STL allocator : manages free lists for size classes from 8 to 128 bytes; larger requests fall through to a first‑level, malloc‑based allocator.
(4) Loki small‑object allocator : uses a vector to manage fixed‑size blocks with automatic growth.
(5) Boost object_pool : maintains a free‑node list and expands by doubling the number of nodes.
(6) ACE_Cached_Allocator and ACE_Free_List : ACE framework provides fixed‑size block management with free‑list and unbounded set structures.
(7) TCMalloc : Google’s high‑performance allocator from gperftools.
3. Memory‑Pool Design
3.1 Reasons to Use a Memory Pool
Performance : Pre‑allocating a contiguous memory region and dividing it into equal blocks allows constant‑time allocation without invoking the OS.
Reduced Fragmentation : Fixed‑size blocks avoid both internal and external fragmentation.
Simplified Management : A pool abstracts allocation and deallocation behind simple API calls.
Controlled Resource Consumption : Especially important for embedded or real‑time environments with strict resource limits.
3.2 Working Principle
Initialization: Allocate a large contiguous region (from the OS or reserved space).
Divide the region into equal‑size blocks.
Maintain a free‑list (often a linked list) of unused blocks.
Allocation: Remove a block from the free‑list and return it.
Release: Return the block to the free‑list.
Optional expansion: Allocate additional blocks when the pool runs out.
The main advantage is fast allocation and reduced fragmentation compared with dynamic allocation.
3.3 Evolution of Memory Pools
(1) Simple allocator : Uses a linked list of free blocks; easy to implement but inefficient for large pools.
(2) Fixed‑size allocator : Maintains separate free lists for each size class, offering O(1) allocation for those sizes.
(3) Hash‑mapped FreeList pool : Maps size classes to dedicated free lists, adding a header cookie to identify the allocator responsible for each block.
(4) Understanding malloc internals : Discusses advantages (free‑list arrays, reduced fragmentation) and disadvantages (metadata overhead, thread‑safety issues).
3.4 Memory‑Pool Frameworks
(1) Buddy system : Splits memory into power‑of‑two blocks; efficient for page‑level allocation but requires contiguous buddies for merging.
(2) Slab mechanism : Pre‑allocates caches of objects of the same size; avoids fragmentation, and its size‑class design is echoed by userspace allocators such as tcmalloc and jemalloc.
(3) Coarse‑grained design : Allocates a whole page per connection or task; simple but may waste memory.
3.5 Example Implementations
kmem_cache based pool (kernel) :
#include <linux/slab.h>
struct my_struct {
int data;
};
struct kmem_cache *my_cache;
void init_my_pool(void) {
my_cache = kmem_cache_create("my_pool", sizeof(struct my_struct), 0, 0, NULL);
}
void destroy_my_pool(void) {
kmem_cache_destroy(my_cache);
}
struct my_struct *alloc_from_my_pool(void) {
return kmem_cache_alloc(my_cache, GFP_KERNEL);
}
void free_to_my_pool(struct my_struct *ptr) {
kmem_cache_free(my_cache, ptr);
}
slab‑based pool (kernel) :
#include <linux/slab.h>
struct my_struct {
int data;
};
struct kmem_cache *my_slab;
void init_my_pool(void) {
my_slab = kmem_cache_create("my_pool", sizeof(struct my_struct), 0, SLAB_HWCACHE_ALIGN, NULL);
}
void destroy_my_pool(void) {
kmem_cache_destroy(my_slab);
}
struct my_struct *alloc_from_my_pool(void) {
    /* allocate from the cache created above */
    return kmem_cache_alloc(my_slab, GFP_KERNEL);
}
void free_to_my_pool(struct my_struct *ptr) {
    kmem_cache_free(my_slab, ptr);
}
slub‑based pool (kernel) :
#include <linux/slab.h>
struct my_struct {
int data;
};
struct kmem_cache *my_slub;
void init_my_pool(void) {
my_slub = KMEM_CACHE(my_struct, SLAB_PANIC);
}
void destroy_my_pool(void) {
kmem_cache_destroy(my_slub);
}
struct my_struct *alloc_from_my_pool(void) {
return kmem_cache_alloc(my_slub, GFP_KERNEL);
}
void free_to_my_pool(struct my_struct *ptr) {
kmem_cache_free(my_slub, ptr);
}
4. Concurrent Memory Pool (C++ Implementation)
The following C++11 class MemoryPool provides a thread‑safe memory pool with fixed‑size blocks, automatic growth, zero‑on‑deallocate option, and compatibility with std::allocator .
#ifndef PPX_BASE_MEMORY_POOL_H_
#define PPX_BASE_MEMORY_POOL_H_
#include <climits>
#include <cstddef>
#include <mutex>
#include <type_traits>  // std::false_type / std::true_type
namespace ppx {
namespace base {
template <typename T, size_t BlockSize = 4096, bool ZeroOnDeallocate = true>
class MemoryPool {
public:
typedef T value_type;
typedef T* pointer;
typedef T& reference;
typedef const T* const_pointer;
typedef const T& const_reference;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
typedef std::false_type propagate_on_container_copy_assignment;
typedef std::true_type propagate_on_container_move_assignment;
typedef std::true_type propagate_on_container_swap;
template <typename U> struct rebind { typedef MemoryPool<U, BlockSize, ZeroOnDeallocate> other; };  // preserve pool parameters when rebinding
MemoryPool() noexcept;
MemoryPool(const MemoryPool& memoryPool) noexcept;
MemoryPool(MemoryPool&& memoryPool) noexcept;
template <class U> MemoryPool(const MemoryPool<U>& memoryPool) noexcept;
~MemoryPool() noexcept;
MemoryPool& operator=(const MemoryPool& memoryPool) = delete;
MemoryPool& operator=(MemoryPool&& memoryPool) noexcept;
pointer address(reference x) const noexcept;
const_pointer address(const_reference x) const noexcept;
pointer allocate(size_type n = 1, const_pointer hint = 0);
void deallocate(pointer p, size_type n = 1);
size_type max_size() const noexcept;
template <class U, class... Args> void construct(U* p, Args&&... args);
template <class U> void destroy(U* p);
template <class... Args> pointer newElement(Args&&... args);
void deleteElement(pointer p);
private:
struct Element_ { Element_* pre; Element_* next; };
typedef char* data_pointer;
typedef Element_ element_type;
typedef Element_* element_pointer;
element_pointer data_element_;
element_pointer free_element_;
std::recursive_mutex m_;
size_type padPointer(data_pointer p, size_type align) const noexcept;
void allocateBlock();
static_assert(BlockSize >= 2 * sizeof(element_type), "BlockSize too small.");
};
// Implementation omitted for brevity (see source)
}  // namespace base
}  // namespace ppx
#endif // PPX_BASE_MEMORY_POOL_H_
Example usage demonstrates allocation, deallocation, and multithreaded access using std::thread , as well as integration with user‑defined types such as Apple and primitive types.