Analysis and Comparison of ptmalloc and tcmalloc Memory Allocators on Linux

This article examines Linux memory management fundamentals, explains the brk/sbrk and mmap system calls, details the internal architectures, allocation and reclamation processes of ptmalloc and tcmalloc, analyzes key configuration parameters, and presents benchmark and production‑level results that demonstrate their impact on performance and memory usage.

Zhuanzhuan Tech

Memory management operates at three layers: the user program, the C runtime library, and the kernel. The allocator in the C runtime is crucial because it serves allocation requests efficiently by pre‑allocating larger regions from the kernel and keeping freed blocks for reuse instead of immediately returning them to the OS.

The article first introduces the brk() and mmap() system calls, declared as int brk(void *addr) and void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset). brk() adjusts the program break, that is, the top of the heap, while mmap() creates anonymous memory mappings that are used for large allocations.
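
As a rough illustration of the two mechanisms described above, the following C sketch grows the heap with sbrk() (the library wrapper that moves the program break by a relative amount) and creates an anonymous mapping with mmap(). The helper names grow_heap, map_anonymous, and unmap_region are hypothetical and exist only for this example; it assumes Linux with glibc and is not how any particular allocator calls these interfaces.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* exposes sbrk() and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: move the program break up by `bytes`.
   Returns the previous break (the start of the new region) or NULL. */
static void *grow_heap(size_t bytes) {
    void *old_break = sbrk((intptr_t)bytes);
    return old_break == (void *)-1 ? NULL : old_break;
}

/* Hypothetical helper: map `bytes` of zero-filled anonymous memory.
   Returns NULL on failure. */
static void *map_anonymous(size_t bytes) {
    assert(bytes > 0); /* mmap() rejects a zero length */
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: hand the mapping straight back to the kernel. */
static int unmap_region(void *p, size_t bytes) {
    return munmap(p, bytes);
}
```

One difference the sketch makes visible: a mapping can be released on its own with munmap(), whereas memory obtained by moving the break can only be given back by lowering the break again.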

It then provides an in‑depth analysis of ptmalloc, covering its overall architecture with main and non‑main arenas, the chunk data structure, the organization of fast, unsorted, small, and large bins, and the allocation flow from arena lock acquisition through fast‑bin, unsorted‑bin, small‑bin, large‑bin, and top‑chunk handling. The reclamation process, parameter tuning (e.g., MALLOC_ARENA_MAX), and characteristic drawbacks such as increased fragmentation, lock overhead, and per‑arena memory isolation are also discussed.
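
A few of these ptmalloc behaviors can be observed from a program through glibc's <malloc.h>. The sketch below is an assumption-laden illustration rather than the article's own code: it uses mallopt() with M_ARENA_MAX (the programmatic counterpart of the MALLOC_ARENA_MAX environment variable), malloc_usable_size() to show that a chunk can hold more than was requested, and malloc_trim() to ask the allocator to return free memory to the OS. The wrapper names cap_arenas, usable_bytes, and trim_heap are hypothetical.

```c
#include <assert.h>
#include <malloc.h> /* glibc-specific: mallopt, malloc_trim, malloc_usable_size */
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: cap the number of arenas for this process.
   mallopt() returns 1 on success and 0 on error. */
static int cap_arenas(int max_arenas) {
    assert(max_arenas > 0);
    return mallopt(M_ARENA_MAX, max_arenas);
}

/* Hypothetical wrapper: report how many bytes a chunk for `request`
   bytes can actually hold; the value is never smaller than `request`. */
static size_t usable_bytes(size_t request) {
    void *p = malloc(request);
    if (p == NULL) return 0;
    size_t usable = malloc_usable_size(p);
    free(p);
    return usable;
}

/* Hypothetical wrapper: ask glibc to release free heap memory.
   Returns 1 if some memory was returned to the OS, 0 otherwise. */
static int trim_heap(void) {
    return malloc_trim(0);
}
```

Calling cap_arenas early, before worker threads start allocating, is the safer choice in this sketch, since arenas that already exist are not removed by lowering the cap.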

The next section examines tcmalloc, outlining its three‑tier architecture (front‑end cache, middle‑end with CentralFreeList and TransferCache, and back‑end page‑heap). It explains how spans and pagemap manage pages, how the front‑end provides per‑thread or per‑CPU caches, and how the middle‑end transfers memory between caches. Parameter analysis includes TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, TCMALLOC_LARGE_ALLOC_REPORT_THRESHOLD, and others, followed by a feature analysis highlighting high performance, intelligent memory reclamation, reduced overhead, and low‑cost statistics.
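
To make the front‑end idea concrete, here is a toy model of a per‑thread cache with one free list per size class. It is not tcmalloc's code and simplifies heavily: the four size classes are invented for the example, a cache miss falls through to plain malloc() as a stand‑in for the middle‑end, and the caller must pass the size to cache_free, whereas the summary above says tcmalloc tracks pages through spans and the pagemap. All names (size_class, cache_alloc, cache_free) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_CLASSES 4
/* Invented size classes; the smallest must be able to hold a FreeNode. */
static const size_t kClassSize[NUM_CLASSES] = {16, 32, 64, 128};

typedef struct FreeNode {
    struct FreeNode *next;
} FreeNode;

/* One free list per size class. _Thread_local gives every thread its own
   copy, so the fast path below needs no lock. */
static _Thread_local FreeNode *cache[NUM_CLASSES];

/* Map a request to the smallest class that fits, or -1 if none does. */
static int size_class(size_t n) {
    for (int c = 0; c < NUM_CLASSES; c++)
        if (n <= kClassSize[c]) return c;
    return -1;
}

static void *cache_alloc(size_t n) {
    int c = size_class(n);
    if (c < 0) return malloc(n); /* too large: bypass the cache */
    FreeNode *node = cache[c];
    if (node != NULL) {          /* hit: pop the list head */
        cache[c] = node->next;
        return node;
    }
    return malloc(kClassSize[c]); /* miss: stand-in for a middle-end refill */
}

static void cache_free(void *p, size_t n) {
    assert(p != NULL);
    int c = size_class(n);
    if (c < 0) { free(p); return; }
    FreeNode *node = p; /* reuse the freed block itself as the list node */
    node->next = cache[c];
    cache[c] = node;
}
```

In this model a block freed by a thread is reused by the next same‑class request from that thread, which is the locality the per‑thread cache is meant to provide. Blocks never leave the cache here; a real allocator bounds the cache and moves surplus back to shared tiers.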

Practical experiments are described: the environment setup (installing jemalloc and tcmalloc, configuring LD_PRELOAD and MALLOC_CONF), benchmark methodology, and results. Benchmarks show that replacing the default ptmalloc with tcmalloc or jemalloc reduces post‑test memory growth from ~2.5 GB to ~0.5 GB, with tcmalloc offering slightly better latency due to thread‑local caches. Production observations confirm lower steady‑state memory usage and modest memory release when using tcmalloc.
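
The original benchmark harness is not reproduced in this summary. As a minimal sketch of how post‑test memory growth could be measured, the C helpers below read the process's resident set size from /proc/self/statm (Linux only) and run a simple allocate‑touch‑free workload. The names resident_bytes and churn and the workload shape are assumptions made for this example, not the article's methodology.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: resident set size in bytes, read from the second
   field of /proc/self/statm (counted in pages). Returns -1 on failure. */
static long resident_bytes(void) {
    long total_pages = 0, resident_pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL) return -1;
    int fields = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (fields != 2) return -1;
    return resident_pages * sysconf(_SC_PAGESIZE);
}

/* Hypothetical workload: allocate `count` blocks of `size` bytes, touch
   each one so it becomes resident, then free everything.
   Returns the number of allocations that succeeded. */
static size_t churn(size_t count, size_t size) {
    assert(size > 0);
    void **blocks = calloc(count, sizeof *blocks);
    if (blocks == NULL) return 0;
    size_t ok = 0;
    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(size);
        if (blocks[i] != NULL) {
            memset(blocks[i], 0xAB, size);
            ok++;
        }
    }
    for (size_t i = 0; i < count; i++) free(blocks[i]);
    free(blocks);
    return ok;
}
```

Comparing resident_bytes() before and after churn() gives a rough view of how much freed memory the allocator keeps. Because the code only calls malloc() and free(), the same binary can be rerun with LD_PRELOAD pointing at a different allocator's shared library to compare allocators without recompiling.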

The conclusion summarizes that selecting an appropriate allocator and tuning its parameters can significantly improve both memory consumption and latency for services such as search ranking, emphasizing the importance of allocator choice alongside JVM tuning.

References to various technical articles and design documents are listed at the end of the original source.
