
Unlocking DPDK Memory Management: How Hugepages Boost Performance

This article consolidates DPDK 17.11 source‑code notes to explain the library’s memory‑management subsystem, covering hugepage concepts, shared configuration mapping, NUMA‑aware allocation, and the custom allocator that enables high‑throughput packet processing on Linux.

Open Source Linux

Overview

Memory management is a core component of the Data Plane Development Kit (DPDK); it underpins the performance of all other DPDK modules and user applications. Although DPDK also runs on FreeBSD and Windows, most memory‑related features are Linux‑only.

The memory hierarchy consists of three layers created during rte_eal_init and three layers that applications build via the API. Each layer offers a set of functions for upper layers or applications.

Key Data Structures

The shared memory structure struct rte_mem_config is backed by the file /var/run/.rte_config (the default location when running as root) and is mapped into every DPDK process. Important fields include:

memseg : groups hugepages that share socket, size, and contiguous physical and virtual addresses.

malloc_heap : attaches memsegs of the same socket to a heap that implements the low‑level allocation API.

memzone : provides whole‑chunk allocations.

tailq_head : a shared queue allowing primary and secondary processes to access the same data.
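
The grouping rule for memseg can be made concrete with a small sketch. The toy_page type and both helper functions below are hypothetical names chosen for illustration; they are not DPDK definitions, and the real struct rte_memseg carries more fields. The sketch only assumes what the list above states: pages join one segment when they share a socket and a size and are contiguous in both physical and virtual address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of one mapped hugepage. */
struct toy_page {
    uint64_t phys_addr; /* physical address of the page */
    uint64_t virt_addr; /* virtual address where it is mapped */
    uint64_t size;      /* hugepage size in bytes */
    int socket_id;      /* NUMA node that owns the page */
};

/* Returns 1 if page b can extend a segment that currently ends with page a. */
static int can_merge(const struct toy_page *a, const struct toy_page *b)
{
    return a->socket_id == b->socket_id &&
           a->size == b->size &&
           a->phys_addr + a->size == b->phys_addr &&
           a->virt_addr + a->size == b->virt_addr;
}

/* Counts the segments formed by an array of pages sorted by address. */
static size_t count_segments(const struct toy_page *pages, size_t n)
{
    size_t segs = (n > 0) ? 1 : 0;

    for (size_t i = 1; i < n; i++)
        if (!can_merge(&pages[i - 1], &pages[i]))
            segs++;
    return segs;
}
```

Two adjacent 2 MB pages on socket 0 collapse into one segment under this rule, while a page on a different socket always starts a new one.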

Memory Initialization Flow

rte_eal_init
    eal_reset_internal_config(&internal_config);
    eal_parse_args(argc, argv);
    eal_hugepage_info_init();
    rte_config_init();
    rte_eal_memory_init();
    rte_eal_memzone_init();
    rte_eal_mcfg_complete();

The primary process creates and locks the shared configuration. A secondary process first attaches to it, waits until the primary has written the RTE_MAGIC flag, and only then remaps the configuration for normal use.
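
The magic-flag handshake can be sketched in a few lines. Everything named toy_ here, including the TOY_MAGIC value, is a hypothetical stand-in; the sketch uses C11 atomics for clarity and adds a spin limit, neither of which is claimed to match DPDK's own loop.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TOY_MAGIC 0x12345678u /* hypothetical value standing in for RTE_MAGIC */

/* Hypothetical stand-in for the header of the shared configuration. */
struct toy_mem_config {
    _Atomic uint32_t magic; /* 0 until the primary finishes initialization */
};

/* Primary: publish the magic value as the last step of initialization. */
static void toy_mcfg_complete(struct toy_mem_config *cfg)
{
    atomic_store_explicit(&cfg->magic, TOY_MAGIC, memory_order_release);
}

/* Secondary: non-blocking check of the flag. */
static int toy_mcfg_is_complete(struct toy_mem_config *cfg)
{
    return atomic_load_explicit(&cfg->magic, memory_order_acquire) == TOY_MAGIC;
}

/* Secondary: poll up to max_spins times; returns 1 once the flag is seen. */
static int toy_mcfg_wait_complete(struct toy_mem_config *cfg,
                                  unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++)
        if (toy_mcfg_is_complete(cfg))
            return 1;
    return 0;
}
```

Writing the flag last matters: a secondary that sees the magic value can assume every structure set up before it is already in place.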

Standard Hugepages

Modern CPUs manage memory in pages; with the default 4 KB page size, an application that touches tens of gigabytes of data incurs frequent TLB misses. DPDK therefore relies on standard hugepages (2 MB or 1 GB on x86‑64) so that each TLB entry covers far more memory, which sharply reduces miss rates and improves throughput.
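
The effect of page size on TLB coverage is plain arithmetic, sketched below. The function names are illustrative, and the 64-entry TLB used in the note that follows is an assumed figure, not a property of any particular CPU.

```c
#include <stdint.h>

/* Number of pages, and therefore distinct translations, needed to map `bytes`. */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* Memory that can be touched without a miss by a TLB with `entries` slots. */
static uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}
```

Mapping 1 GB takes 262144 translations with 4 KB pages, 512 with 2 MB pages, and one with a 1 GB page. With an assumed 64-entry TLB, the reach grows from 256 KB to 128 MB when moving from 4 KB to 2 MB pages.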

eal_hugepage_info_init – Collecting Available Hugepages

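int eal_hugepage_info_init(void) {
    struct dirent *dirent;
    int found = 0;
    DIR *dir = opendir(sys_dir_path);  // /sys/kernel/mm/hugepages
    if (dir == NULL)
        return -1;  // simplified: bail out if the sysfs directory is unavailable
    while ((dirent = readdir(dir)) != NULL) {
        if (strncmp(dirent->d_name, "hugepages-", 10) != 0)
            continue;
        // parse size, locate mount point, count pages, lock, etc.;
        // set found once a usable page size has been recorded
    }
    closedir(dir);
    // sort by size, return 0 if at least one valid hugepage exists
    return (found) ? 0 : -1;
}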
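
The "parse size" step in the loop above can be sketched as follows. The helper name parse_hugepage_dir_name is hypothetical and this is not DPDK's parsing code; it only assumes the sysfs naming pattern hugepages-<N>kB that the prefix check relies on.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Converts a sysfs entry name such as "hugepages-2048kB" to a size in bytes.
 * Returns 0 if the name does not match the expected pattern.
 */
static uint64_t parse_hugepage_dir_name(const char *name)
{
    static const char prefix[] = "hugepages-";
    const size_t prefix_len = sizeof(prefix) - 1;
    char *end = NULL;
    unsigned long long kb;

    if (strncmp(name, prefix, prefix_len) != 0)
        return 0;
    /* Require a digit so that signs and whitespace are rejected. */
    if (!isdigit((unsigned char)name[prefix_len]))
        return 0;
    kb = strtoull(name + prefix_len, &end, 10);
    if (strcmp(end, "kB") != 0)
        return 0;
    return (uint64_t)kb * 1024u;
}
```

For example, "hugepages-2048kB" yields 2097152 and "hugepages-1048576kB" yields 1073741824, while names without the prefix or the kB suffix yield 0.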

rte_config_init – Mapping the Shared Config

static void rte_config_init(void) {
    rte_config.process_type = internal_config.process_type;
    switch (rte_config.process_type) {
    case RTE_PROC_PRIMARY:
        rte_eal_config_create();
        break;
    case RTE_PROC_SECONDARY:
        rte_eal_config_attach();
        rte_eal_mcfg_wait_complete(rte_config.mem_config);
        rte_eal_config_reattach();
        break;
    default:
        rte_panic("Invalid process type\n");
    }
}

Primary vs. Secondary Memory Mapping

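The primary process maps the shared configuration and records that mapping's virtual address in mem_cfg_addr; the addresses of the hugepage mappings are recorded in the memseg entries. A secondary process opens /var/run/.rte_config, waits for the magic flag, then remaps the configuration at the exact address recorded by the primary and maps each hugepage segment at its recorded address. Both processes therefore share identical virtual addresses, so pointers stored in shared memory remain valid in either one.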

int rte_eal_hugepage_attach(void) {
    const struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
    // map each memseg at the address stored in the primary process
    // abort if ASLR prevents the mapping
    return 0;
}
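
The "map at the recorded address or give up" idea can be sketched with plain mmap. This is a minimal illustration on an anonymous mapping, assuming Linux; map_at_recorded_address is a hypothetical helper, not a DPDK function, and a real secondary process would map the hugepage files instead.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Tries to map `len` bytes at `wanted`, the address recorded by another
 * process. Returns the mapping on success, or NULL if the mapping failed
 * or the kernel placed it elsewhere. MAP_FIXED is deliberately not used,
 * so an existing mapping at `wanted` is never overwritten.
 */
static void *map_at_recorded_address(void *wanted, size_t len)
{
    void *got = mmap(wanted, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (got == MAP_FAILED)
        return NULL;
    if (got != wanted) {
        /* Wrong place: pointers stored by the primary would not be valid. */
        munmap(got, len);
        return NULL;
    }
    return got;
}
```

Without MAP_FIXED the address is only a hint, which is why the result must be compared with the requested address. Address-space randomization or an unrelated library mapping can occupy the range, and the only safe response is to fail.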

NUMA‑Aware Allocation and DMA

DPDK explicitly binds memory to NUMA nodes to avoid cross‑node latency. It locks memory pages so that the addresses handed to devices stay valid, obtains their physical addresses for DMA, and, when an IOMMU is enabled, can use IOVAs (I/O virtual addresses) instead. An IOVA may differ from the real physical address; the IOMMU translates it on behalf of the device.
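
On Linux, one way to obtain a physical address is to read the process's entry in /proc/self/pagemap. The sketch below covers only the arithmetic, not the file I/O: where the 8-byte entry for a virtual address lives, and how a frame number becomes a physical address. The function names and the BAD_PHYS_ADDR sentinel are hypothetical; the bit layout (frame number in bits 0 to 54, present flag in bit 63) follows the kernel's pagemap documentation, and page_size is the base page size.

```c
#include <stdint.h>

#define PAGEMAP_ENTRY_BYTES 8u
#define PAGEMAP_PRESENT_BIT (UINT64_C(1) << 63)
#define PAGEMAP_PFN_MASK    ((UINT64_C(1) << 55) - 1)
#define BAD_PHYS_ADDR       UINT64_MAX /* hypothetical error sentinel */

/* Byte offset within /proc/self/pagemap of the entry describing `vaddr`. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    return (vaddr / page_size) * PAGEMAP_ENTRY_BYTES;
}

/* Combines a pagemap entry with the in-page offset of `vaddr`. */
static uint64_t pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
                                      uint64_t page_size)
{
    uint64_t pfn = entry & PAGEMAP_PFN_MASK;

    /* Not resident, or the frame number is hidden from this process. */
    if (!(entry & PAGEMAP_PRESENT_BIT) || pfn == 0)
        return BAD_PHYS_ADDR;
    return pfn * page_size + (vaddr % page_size);
}
```

Recent kernels report a frame number of zero to unprivileged readers, which is one reason the sketch treats zero as a failure.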

Custom Allocator and Memory Pools

DPDK does not serve its data‑path allocations from the C library's malloc(). Instead it builds a heap on top of hugepages and provides allocation functions such as malloc_heap_alloc (internal to the library), rte_memzone_reserve, and rte_mempool_create. Objects like rte_mbuf are allocated from mempools, which are themselves built from one or more memzones.

void *malloc_heap_alloc(struct malloc_heap *heap, size_t size, unsigned flags, size_t align, size_t bound) {
    struct malloc_elem *elem;

    size = RTE_CACHE_LINE_ROUNDUP(size);
    align = RTE_CACHE_LINE_ROUNDUP(align);
    rte_spinlock_lock(&heap->lock);
    // find a suitable free element, split it if needed, update counters
    elem = find_suitable_element(heap, size, flags, align, bound);
    if (elem != NULL) {
        elem = malloc_elem_alloc(elem, size, align, bound);
        heap->alloc_count++;
    }
    rte_spinlock_unlock(&heap->lock);
    // the usable memory starts right after the element header
    return elem ? (void *)&elem[1] : NULL;
}

const struct rte_memzone *rte_memzone_reserve(const char *name, size_t len, int socket_id, unsigned flags) {
    return rte_memzone_reserve_thread_safe(name, len, socket_id, flags, RTE_CACHE_LINE_SIZE, 0);
}

struct rte_mempool *rte_pktmbuf_pool_create(const char *name, unsigned n, unsigned cache_size,
    uint16_t priv_size, uint16_t data_room_size, int socket_id) {
    struct rte_mempool *mp;
    // allocate mempool, set ops, populate with mbuf objects (elided; assigns mp)
    return mp;
}
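
Two details of the heap allocator are easy to show in isolation: rounding a request up to whole cache lines, and returning the address just past the element header (the &elem[1] expression). The sketch below uses hypothetical toy_ names and assumes a 64-byte cache line; the real header, struct malloc_elem, holds more bookkeeping than this.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_LINE 64u /* assumed cache-line size */

/* Hypothetical element header, padded to exactly one cache line. */
struct toy_elem {
    size_t size; /* total bytes in this element, header included */
    unsigned char pad[TOY_CACHE_LINE - sizeof(size_t)];
};

/* Rounds `n` up to the next multiple of the cache-line size. */
static size_t toy_cache_line_roundup(size_t n)
{
    return (n + TOY_CACHE_LINE - 1) / TOY_CACHE_LINE * TOY_CACHE_LINE;
}

/* The caller's data starts immediately after the header: &elem[1]. */
static void *toy_elem_payload(struct toy_elem *elem)
{
    return elem ? (void *)&elem[1] : NULL;
}

/* Recovers the header from a payload pointer, as a free routine would. */
static struct toy_elem *toy_elem_from_payload(void *payload)
{
    return payload ? (struct toy_elem *)payload - 1 : NULL;
}
```

Because the header occupies a whole cache line, a payload that follows a cache-line-aligned header is itself cache-line aligned, and stepping back one header from the payload finds the bookkeeping again.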

Each rte_mbuf contains metadata, a private area, and a data buffer; allocation simply pulls a pre‑initialized object from the pool, so the packet fast path avoids per‑packet heap allocation and initialization.
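
The pool idea can be illustrated with a minimal fixed-size object pool. This is a single-threaded sketch under stated assumptions: all toy_ names, the capacity, and the buffer size are hypothetical, and it uses a simple stack of free pointers. It is not how rte_mempool is implemented.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_POOL_SIZE 4u    /* hypothetical pool capacity */
#define TOY_DATA_ROOM 2048u /* hypothetical data buffer size */

/* Hypothetical packet buffer: metadata followed by the data area. */
struct toy_mbuf {
    uint16_t data_len; /* bytes of packet data in use */
    uint16_t buf_len;  /* capacity of the data area */
    unsigned char data[TOY_DATA_ROOM];
};

struct toy_pool {
    struct toy_mbuf objs[TOY_POOL_SIZE];
    struct toy_mbuf *free_list[TOY_POOL_SIZE]; /* stack of free objects */
    size_t free_count;
};

/* Initializes every object once, up front. */
static void toy_pool_init(struct toy_pool *p)
{
    for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
        p->objs[i].data_len = 0;
        p->objs[i].buf_len = TOY_DATA_ROOM;
        p->free_list[i] = &p->objs[i];
    }
    p->free_count = TOY_POOL_SIZE;
}

/* Allocation is a pop: no search and no system call. */
static struct toy_mbuf *toy_pool_get(struct toy_pool *p)
{
    return p->free_count ? p->free_list[--p->free_count] : NULL;
}

/* Freeing resets per-packet state and pushes the object back. */
static void toy_pool_put(struct toy_pool *p, struct toy_mbuf *m)
{
    m->data_len = 0;
    p->free_list[p->free_count++] = m;
}
```

Getting a buffer costs one decrement and one load, and an exhausted pool returns NULL instead of growing, which keeps memory use bounded and predictable.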

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
