Why malloc/free Struggle in High‑Concurrency: Inside ptmalloc’s Architecture and Performance

This article examines the inner workings of glibc’s ptmalloc, explains how malloc_state, malloc_chunk, fastbins and bins manage memory, and presents a multithreaded test that reveals lock contention and fragmentation problems that make the standard malloc/free unsuitable for high‑performance scenarios.

Liangxu Linux

Introduction

High‑performance programming often hinges on efficient memory allocation, yet the classic malloc and free functions may not meet the demands of multithreaded, high‑concurrency workloads.

ptmalloc Overview

In glibc, malloc and free are implemented by the ptmalloc module (source: glibc/malloc/malloc.c). ptmalloc manages a large memory pool using several key structures: malloc_state, malloc_chunk, top chunk, fastbins, and bins.

malloc_state

The arena state is represented by struct malloc_state, which contains:

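fastbinsY : an array of singly linked free‑list heads for small chunks (on 64‑bit systems, chunk sizes from 32 bytes up to a configurable limit of at most 160 bytes).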

top : the “super‑chunk” that supplies memory when other bins are empty.

bins : the doubly linked free lists, in three groups (the unsorted bin, smallbins, and largebins), holding free chunks of various sizes.

binmap : a bitmap for fast bin lookup.

next : pointer to the next arena; all arenas of a process are linked into one list.

Each process starts with a main arena (main_arena in the glibc source), whose heap grows through brk, with mmap as a fallback. When threads contend for an arena, additional non‑main arenas are created, and their memory is obtained exclusively through mmap. The number of arenas is capped in proportion to the number of CPU cores.

malloc_chunk

Memory is allocated and freed in units called malloc_chunk:

struct malloc_chunk {
    INTERNAL_SIZE_T mchunk_prev_size; // size of previous chunk
    INTERNAL_SIZE_T mchunk_size;      // size of this chunk (low 3 bits are flags)
    struct malloc_chunk *fd;          // forward link in free list
    struct malloc_chunk *bk;          // backward link in free list
    struct malloc_chunk *fd_nextsize; // forward link for largebins
    struct malloc_chunk *bk_nextsize; // backward link for largebins
};

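The chunk header ( mchunk_prev_size and mchunk_size) sits directly in front of the user area. mchunk_size is valid for the chunk’s whole lifetime and identifies its size; mchunk_prev_size is meaningful only while the previous chunk is free, and it is what makes backward coalescing possible. When a chunk is free, the first 16 bytes of its user area (on 64‑bit systems) store the fd and bk free‑list pointers.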

fastbins and bins

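Fastbins is a fixed‑size array of 10 singly linked lists, one per chunk size in 16‑byte increments, covering chunks up to 160 bytes. Bins is an array of 128 logical bins split into:

unsortedbins : a single list that caches recently freed chunks and split remainders, including chunks consolidated out of fastbins, before they are sorted into smallbins or largebins.

smallbins : free lists for 32‑1008 byte chunks (on 64‑bit systems), one exact size per list.

largebins : free lists for chunks of 1024 bytes and larger, each list covering a range of sizes.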

The top chunk supplies memory when no suitable free chunk exists: the requested size is split off from it, and the remainder becomes the new top chunk. If the top chunk is too small, ptmalloc extends it via brk or mmap.

malloc/free Flow

When malloc is called, ptmalloc first searches the appropriate free list; if none matches, it trims the top chunk. If the top chunk lacks sufficient space, the allocator grows the pool. free returns a chunk to the appropriate free list or the top chunk, and large frees may trigger munmap or brk to return memory to the kernel.

[Figure: malloc flow diagram]

High‑Concurrency Issues

Two major problems appear under heavy multithreading:

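Lock contention : each malloc_state contains a mutex, and an allocation or deallocation that touches the arena’s bins must hold it, so threads that share an arena serialize on that lock.

Memory fragmentation : repeatedly splitting the top chunk scatters small, long‑lived allocations across the heap. The free chunks around them cannot be coalesced or returned to the kernel, which causes external fragmentation and makes the process look as if it were leaking memory.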

Multithreaded Test

A C program spawns eight threads that, together with the main thread, repeatedly allocate blocks of random size (below 1024 bytes). Allocations below a leak threshold (64 bytes) are intentionally never freed to simulate leaks; all others are freed immediately. The program tracks the total leaked memory and prints it after each leak.

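#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define THREAD_NUM (8)
#define SIZE_THRESHOLD (1024)
#define LEAK_THRESHOLD (64)

/* Total number of bytes intentionally leaked, shared by all threads. */
static atomic_long leak_size;

/* Random request size in [0, SIZE_THRESHOLD); rand_r keeps its state per thread. */
static int get_size(unsigned int *seed) {
    return rand_r(seed) % SIZE_THRESHOLD;
}

static void *do_malloc(void *arg) {
    (void)arg;
    /* Seed once per thread; mixing in a stack address gives each thread its own sequence. */
    unsigned int seed = (unsigned int)time(NULL) ^ (unsigned int)(uintptr_t)&seed;
    while (1) {
        int size = get_size(&seed);
        char *p = malloc(size);
        if (p == NULL) {
            continue;
        }
        if (size >= LEAK_THRESHOLD) {
            free(p);
        } else {
            /* Small allocations are deliberately never freed. */
            long total = atomic_fetch_add(&leak_size, size) + size;
            printf("leak size:%ld KB\n", total / 1024);
        }
    }
    return NULL;
}

int main(void) {
    atomic_init(&leak_size, 0);
    pthread_t th[THREAD_NUM];
    for (int i = 0; i < THREAD_NUM; i++) {
        pthread_create(&th[i], NULL, do_malloc, NULL);
    }
    /* The main thread allocates too; the loop never returns, so stop the test with Ctrl-C. */
    do_malloc(NULL);
    for (int i = 0; i < THREAD_NUM; i++) {
        pthread_join(th[i], NULL);
    }
    return 0;
}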

The test uses strace to monitor system calls:

Lock contention: strace -tt -f -e trace=futex ./a.out shows frequent futex calls.

Fragmentation: strace -tt -f -e trace=mprotect,brk ./a.out reveals repeated brk / mprotect expansions.

[Figure: futex trace]

Test Results

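The lock‑contention test showed a steady stream of futex calls. An uncontended mutex is acquired entirely in user space, so futex system calls appear only when threads actually wait on a lock; the trace therefore confirms that the threads block one another. Some of these calls may come from the stdout lock taken by printf rather than from the arena mutex.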

Memory‑fragmentation test reported:

Initial leak: 1.8 MB, system free memory 2724 MB.

After running, total leaked memory reached 162 MB, leaving 2474 MB free.

Used memory: 2724 MB − 2474 MB = 250 MB. Of this, 162 MB is leaked, and the remaining 250 MB − 162 MB = 88 MB is attributed to fragmentation and allocator overhead.

[Figure: memory usage chart]

Conclusion

Standard malloc / free are ill‑suited for highly concurrent workloads because of frequent lock acquisition.

Long‑running workloads that mix short‑lived and long‑lived allocations suffer significant fragmentation: freed memory cannot be returned to the system, so usable memory shrinks much as it would with a leak.

For low‑contention scenarios the default allocator is acceptable, but high‑throughput services should adopt specialized memory‑pool allocators or custom ptmalloc tuning.
