
Unveiling glibc malloc: How Linux Allocates Heap Memory and Optimizes Performance

This article explores the inner workings of Linux's glibc malloc, detailing how heap memory is requested from the kernel, how multithreaded support is implemented with per‑thread arenas, and the structures such as arenas, bins, and chunks that together determine allocation efficiency.

Open Source Linux

Preface

Heap memory raises several questions: how is it requested from the kernel, who manages it, why is allocation so efficient, and can that efficiency be improved further?

Open‑source communities provide many allocators such as dlmalloc, ptmalloc2, jemalloc, tcmalloc, libumem, etc. Each claims to be fast, scalable, and memory‑efficient, but not all suit every application. Memory‑hungry programs depend heavily on allocator performance.

This article focuses on the glibc malloc allocator, using the latest source code for illustration.

History: ptmalloc2 was built on dlmalloc, added multithread support, and was released in 2006. It was later merged into glibc, so ptmalloc2 and glibc malloc share much of the same code base.

1. System Calls for Heap Allocation

malloc internally uses the brk or mmap system calls to request heap space from the kernel.

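In memory management, "heap" refers to the region of virtual address space used for dynamic allocation, whereas the stack holds automatically managed call frames and local variables. Dynamically allocated memory can come from two places: the region grown with brk (the "Heap") or the region obtained with mmap (the "Memory Mapping Segment"). In glibc, the brk heap backs the main thread's arena and serves small requests, while requests at or above the mmap threshold (128 KB by default) are served directly with mmap.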

2. Multithread Support

Early Linux used dlmalloc, but ptmalloc2 added multithread support, so Linux switched to ptmalloc2. In dlmalloc, all threads share a single free‑list, causing contention when multiple threads call malloc. In ptmalloc2, each thread has its own arena with an independent free‑list, allowing concurrent allocations.

2.1. Example Code

/* Per thread arena example. */
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

/* Runs in thread1; pauses at each step so /proc/<pid>/maps can be inspected. */
void* threadFunc(void* arg) {
    (void) arg;  /* unused */
    printf("Before malloc in thread 1\n");
    getchar();
    char* addr = (char*) malloc(1000);
    printf("After malloc and before free in thread 1\n");
    getchar();
    free(addr);
    printf("After free in thread 1\n");
    getchar();
    return NULL;
}

int main() {
    pthread_t t1;
    void* s;
    int ret;
    char* addr;

    /* Print the PID so the process can be found under /proc. */
    printf("Welcome to per thread arena example::%d\n", (int) getpid());
    printf("Before malloc in main thread\n");
    getchar();
    addr = (char*) malloc(1000);
    printf("After malloc and before free in main thread\n");
    getchar();
    free(addr);
    printf("After free in main thread\n");
    getchar();
    ret = pthread_create(&t1, NULL, threadFunc, NULL);
    if (ret) {
        printf("Thread creation error\n");
        return -1;
    }
    ret = pthread_join(t1, &s);
    if (ret) {
        printf("Thread join error\n");
        return -1;
    }
    return 0;
}

2.2. Sample Output

2.2.1 Before malloc in the main thread

At this point there is no heap segment, because malloc has not been called yet, and no thread stack, because thread1 has not been created.

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ ./mthread 
Welcome to per thread arena example::6501
Before malloc in main thread
... 
cat /proc/6501/maps
08048000-08049000 r-xp 00000000 08:01 539625     /home/sploitfun/ptmalloc.ppt/mthread/mthread
...

2.2.2 After malloc in the main thread

The heap segment appears just above the data segment, showing that the heap was created by moving the program break (brk). Although only 1000 bytes were requested, a 132 KB arena was allocated.

sploitfun@sploitfun-VirtualBox:~/ptmalloc.ppt/mthread$ ./mthread 
... 
0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
...

2.2.3 After free in the main thread

Freed memory is returned to the allocator’s bins rather than to the kernel.

... 
0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
...

2.2.4 Before malloc in thread1

Thread1's heap does not exist yet, but its stack has been created.

... 
0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
...
b7605000-b7606000 rw-p 00000000 00:00 0          [stack:6594]
...

2.2.5 After malloc in thread1

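Thread1's heap (a "thread arena") is created via mmap. About 1 MB of address space is reserved, of which only 132 KB is given read/write permission and is therefore usable; that 132 KB region is the new mapping shown below.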

... 
0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
b7500000-b7521000 rw-p 00000000 00:00 0 
...

2.2.6 After free in thread1

Freed memory is added to the thread arena’s bin rather than returned to the OS.

... 
b7500000-b7521000 rw-p 00000000 00:00 0 
...

3. Arena

3.1 Number of Arenas

The number of arenas is capped according to the number of CPU cores, so that arenas do not multiply without bound as threads are created:

For 32 bit systems:
Number of arenas = 2 * number of cores.
For 64 bit systems:
Number of arenas = 8 * number of cores.

3.2 Multiple Arenas

When the number of threads exceeds the arena limit, no new arena is created and threads share the existing ones. A thread calling malloc tries to lock an existing arena; if one is free it is used, otherwise the thread blocks until an arena is unlocked.

3.3 Multiple Heaps

Each thread arena can maintain multiple heaps. When a heap is exhausted, a new heap is obtained via mmap. The main arena expands via sbrk until it meets the memory‑mapping segment.

Note: The main arena has no heap_info structure because it consists of a single heap that grows with sbrk. Its arena header (malloc_state) is a global variable that resides in libc's data segment.

[Figures: main arena structure; thread arena structure]

4. Chunk

Chunks in the heap can be allocated, free, top, or last‑remainder chunks.

4.1 Allocated Chunk

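An allocated chunk contains user data preceded by a small header. Its layout is described by three addresses: the chunk start (chunk), the user-accessible memory address returned by malloc (mem = chunk + 2 * sizeof(size_t), just past the prev_size and size fields), and the start of the next chunk (chunk + size).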

4.2 Free Chunk

A free chunk stores prev_size, size, fd (forward pointer), and bk (backward pointer) to link within bins.

5. Bins

Bins are free‑list data structures that store free chunks. Types include fast bin, unsorted bin, small bin, and large bin.

5.1 Fast Bin

Fast bins hold small chunks, 16 to 80 bytes on 32-bit systems (only sizes up to 64 bytes are used by default), and keep them in LIFO singly-linked lists, which makes allocation and deallocation faster than in any other bin type.

5.2 Unsorted Bin

Freed small and large chunks are first placed in the unsorted bin rather than in their own bins, giving malloc a chance to reuse them quickly before they are sorted into the appropriate small or large bin.

5.3 Small Bin

Small bins manage chunks smaller than 512 bytes (on 32-bit systems), each bin holding chunks of a single size in a FIFO doubly-linked list. They are initialized on first use.

5.4 Large Bin

Large bins handle chunks of 512 bytes or more, with 63 bins covering a wide size range and using a doubly‑linked list for flexible insertion and removal.

5.5 Top Chunk

The top chunk is the highest chunk in an arena, not belonging to any bin. It is used when no suitable free chunk exists; it can be split or expanded via sbrk (main arena) or mmap (thread arena).

5.6 Last Remainder Chunk

The last remainder chunk is the leftover part after a small allocation split; it improves locality by staying near subsequent allocations.

Reference: Liu Xiang, Tong Wei, Liu Jingning, Feng Dan, Chen Jinlong. "Dynamic Memory Allocator Research Survey". Journal of Computer Science, 2018, 41(10): 2359‑2378.
