Why malloc/free Struggle in High‑Concurrency: Inside ptmalloc’s Architecture

This article examines the internal design of glibc’s ptmalloc, explains how malloc_state, malloc_chunk, fastbins, bins and the top chunk work, analyzes lock contention and memory fragmentation in multithreaded scenarios, and presents a C benchmark that measures these performance issues.

Liangxu Linux

Introduction

The article investigates whether the standard malloc and free functions can meet the demands of high‑performance, multithreaded programs, focusing on the glibc ptmalloc implementation located in glibc/malloc/malloc.c.

ptmalloc Architecture

malloc_state

ptmalloc manages memory through struct malloc_state, which holds a mutex, the fastbins, a pointer to the top chunk, the bins, a bin map, and a pointer (next) that links the arenas into a list. A process may have several malloc_state instances: the main arena, created at startup, and additional thread arenas (thread_state), created on demand to reduce lock contention.

malloc_chunk

Memory is handed out in units called malloc_chunk. Each chunk's header stores the size of the previous chunk (mchunk_prev_size) and its own size (mchunk_size), followed by the user data. When a chunk is free, the first 16 bytes of the user area (on 64-bit systems) are reused as forward (fd) and backward (bk) pointers for linked-list management.

Fastbins and Bins

Fastbins (10 entries) hold small free chunks of 16-160 bytes, with one list per 16-byte size step. Bins (128 entries) are divided into the unsorted bin, smallbins (32-1008 bytes) and largebins (1024 bytes and above). The unsorted bin acts as a cache for recently freed chunks and for chunks merged from fastbins, which helps to mitigate fragmentation. All sizes given here apply to 64-bit systems.

Top Chunk

The top chunk is the remaining unallocated memory at the end of the arena. When no suitable free chunk exists, the allocator carves the request out of the top chunk, first extending the arena via brk or mmap if the top chunk is too small.

malloc/free Flow

On a malloc call, the allocator first searches the appropriate free list; if none is found, it slices the top chunk. If the top chunk lacks space, the allocator grows the arena. The free operation returns the chunk to the appropriate free list, possibly merging it back into the top chunk; large releases may trigger munmap or brk to return memory to the kernel.

High‑Concurrency Issues

Lock contention: each malloc_state contains a mutex. When many threads allocate and free frequently, they repeatedly compete for these mutexes, which degrades performance.

Memory fragmentation: continuously splitting large brk/mmap regions into small chunks can leave free gaps that can neither be reused for later requests nor returned to the kernel. This external fragmentation inflates memory usage and, in extreme cases, looks like a memory leak.

Benchmark Code

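#define _POSIX_C_SOURCE 200809L // for rand_r
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <pthread.h>
#include <stdatomic.h>

#define THREAD_NUM (8)       // worker threads created by main
#define SIZE_THREHOLD (1024) // allocation sizes stay below this value
#define LEAK_THREHOLD (64)   // blocks smaller than this are never freed

atomic_int leak_size; // total bytes leaked so far

// rand() is not guaranteed to be thread-safe, and calling srand(time(0))
// before every rand() yields the same size for a whole second, so each
// thread passes its own rand_r() state instead.
int get_size(unsigned int *seed) {
    return rand_r(seed) % SIZE_THREHOLD;
}

void *do_malloc(void *arg) {
    (void)arg;
    // Seed once per thread; the address of a stack variable differs per thread.
    unsigned int seed = (unsigned int)time(NULL) ^ (unsigned int)(uintptr_t)&seed;
    while (1) {
        int size = get_size(&seed);
        char *p = malloc(size);
        if (p == NULL) {
            perror("malloc");
            exit(EXIT_FAILURE);
        }
        if (size >= LEAK_THREHOLD) {
            free(p);
        } else {
            // Intentionally leak the block and report the running total.
            // Note: printf locks stdout, which adds lock traffic of its own.
            int total = atomic_fetch_add(&leak_size, size) + size;
            printf("leak size:%d KB\n", total / 1024);
        }
    }
    return NULL;
}

int main(void) {
    atomic_init(&leak_size, 0);
    pthread_t th[THREAD_NUM];
    for (int i = 0; i < THREAD_NUM; i++) {
        if (pthread_create(&th[i], NULL, do_malloc, NULL) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }
    }
    // The main thread joins in as one more worker. do_malloc never returns,
    // so the joins below are not reached; stop the program with Ctrl-C.
    do_malloc(NULL);
    for (int i = 0; i < THREAD_NUM; i++) {
        pthread_join(th[i], NULL);
    }
    return 0;
}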

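The benchmark starts THREAD_NUM worker threads, and the main thread runs the same loop, so every thread repeatedly allocates blocks of random size. Blocks smaller than LEAK_THREHOLD are intentionally leaked to simulate fragmentation, while larger blocks are freed immediately. The loop never exits, so the program runs until it is interrupted.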

Test Methodology

Lock‑contention test: run strace -tt -f -e trace=futex ./a.out and observe frequent futex system calls.

Fragmentation test: run strace -tt -f -e trace=mprotect,brk ./a.out to detect arena expansions.

Metrics: total used memory, leaked memory, and derived fragmentation (used – leak).

Results

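The lock-contention test showed many futex calls. An uncontended mutex is normally acquired without entering the kernel, so frequent futex calls indicate that threads repeatedly block on contended locks, which is consistent with malloc/free competing for the arena mutex.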

Fragmentation test revealed:

Initial leak: 1.8 MB, system free memory 2724 MB.

Final leak: 162 MB, system free memory 2474 MB.

Thus, user‑consumed memory ≈ 250 MB, of which 162 MB is leaked and 88 MB is fragmented.

Strace also captured frequent mprotect and brk calls, indicating repeated arena growth.

Conclusion

malloc and free are unsuitable for high‑concurrency workloads due to mutex‑induced lock contention.

Long‑running programs that retain many small allocations suffer from severe fragmentation.

For low‑contention scenarios, the standard allocator is acceptable; for high‑performance or long‑lived services, a custom memory pool or alternative allocator should be used.
