In‑Depth Analysis of dlmalloc, jemalloc, Scudo, and PartitionAlloc for Virtual‑Machine Memory Management
This article examines the design goals, key implementation details, strengths and weaknesses of four widely used memory allocators—dlmalloc, jemalloc, Scudo, and PartitionAlloc—highlighting how they address fragmentation, performance, and security in virtual‑machine runtimes and offering guidance for building efficient, safe allocators.
Memory is a core resource in modern computer architectures, and a virtual machine (VM) must manage it efficiently through allocation, garbage collection, and reclamation. This article explores the design philosophy behind common memory allocators and provides guidance for implementing high‑performance VM memory management.
1. dlmalloc
Key Points
Obtains large memory segments from the OS via sbrk or mmap, linking them into a list of segments.
Divides each Segment into chunks; small chunks are managed by small bins (simple doubly‑linked lists) and larger chunks by tree bins (size‑based tries).
Allocation strategy varies for small (<0x100 bytes), large, and huge (>64 KB) requests.
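The size-based routing above can be sketched as follows. This is a minimal illustration modeled on dlmalloc's indexing scheme, not a copy of its source: the 0x100 and 64 KB thresholds come from the text, while the 8-byte small-bin spacing, the tree-bin shift, the bin count, and all function names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

enum class Path { SmallBin, TreeBin, DirectMmap };

constexpr std::size_t kSmallBinShift = 3;          // assumed: small bins spaced 8 bytes apart
constexpr std::size_t kMinLargeSize = 0x100;       // from the text: small means < 0x100 bytes
constexpr std::size_t kMmapThreshold = 64 * 1024;  // from the text: huge means > 64 KB
constexpr unsigned kTreeBinShift = 8;              // assumed: log2(kMinLargeSize)
constexpr unsigned kNumTreeBins = 32;              // assumed bin count

// Decide which of the three paths serves a chunk of the given size.
constexpr Path classify(std::size_t chunk_size) {
  return chunk_size < kMinLargeSize    ? Path::SmallBin
         : chunk_size > kMmapThreshold ? Path::DirectMmap
                                       : Path::TreeBin;
}

// Each small bin holds chunks of one exact size, so the index is a plain shift.
inline std::size_t small_bin_index(std::size_t chunk_size) {
  assert(chunk_size < kMinLargeSize);
  return chunk_size >> kSmallBinShift;
}

// Each tree bin covers a size range: two bins per power of two, selected by
// the highest set bit of the size and the bit just below it.
inline unsigned tree_bin_index(std::size_t chunk_size) {
  assert(chunk_size >= kMinLargeSize);
  const std::size_t x = chunk_size >> kTreeBinShift;
  if (x > 0xFFFF) return kNumTreeBins - 1;  // everything very large shares the last bin
  unsigned k = 0;                           // index of the highest set bit of x
  while ((x >> (k + 1)) != 0) ++k;
  return (k << 1) +
         static_cast<unsigned>((chunk_size >> (k + kTreeBinShift - 1)) & 1);
}
```

Within a tree bin, chunks of different sizes still have to be ordered, which is where the size-based trie comes in; the sketch stops at choosing the bin.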
Weaknesses
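Poor multithreading support: without locking it is not thread‑safe, and with locking enabled every allocation contends on a single global lock. Android’s Bionic has since replaced it with more modern heap implementations.
Fragmentation remains a weak point. dlmalloc coalesces adjacent free chunks via boundary tags (it is not a buddy allocator), which limits external fragmentation, but per‑chunk headers and size rounding still waste space on small allocations.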
2. jemalloc (Android 5.0+)
Designed to improve multithreaded performance and reduce fragmentation.
Key Points
Allocates a large Chunk (typically 512 KB) via mmap, split into a header and data region.
Data region is divided into Runs, each managed by a bin (bucket) for a specific size class.
Each thread has a tcache (thread‑local free list) to avoid contention; arenas are locked only when necessary.
Three allocation paths: Small Object (from Run), Large Object (single Run), Huge Object (>4 MB, direct mmap).
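The interplay between a tcache and a locked arena can be sketched as below. This is a simplified model, not jemalloc's implementation: the four size classes, the cache limit, and the names Arena, ThreadCache, and bin_for are all hypothetical, and the arena simply falls back to std::malloc instead of carving Runs out of a Chunk.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

constexpr std::size_t kNumBins = 4;                                          // hypothetical
constexpr std::array<std::size_t, kNumBins> kBinSizes = {16, 32, 64, 128};   // hypothetical
constexpr std::size_t kMaxCached = 8;  // per-bin tcache capacity (assumed)

// Smallest bin whose size class fits the request, or kNumBins if none does.
inline std::size_t bin_for(std::size_t size) {
  for (std::size_t i = 0; i < kNumBins; ++i)
    if (size <= kBinSizes[i]) return i;
  return kNumBins;
}

// Shared arena: every operation takes the arena lock.
class Arena {
 public:
  ~Arena() {
    for (auto& bin : free_)
      for (void* p : bin) std::free(p);
  }
  void* allocate(std::size_t bin) {
    std::lock_guard<std::mutex> guard(lock_);
    ++lock_acquisitions;
    if (free_[bin].empty()) return std::malloc(kBinSizes[bin]);
    void* p = free_[bin].back();
    free_[bin].pop_back();
    return p;
  }
  void deallocate(std::size_t bin, void* p) {
    std::lock_guard<std::mutex> guard(lock_);
    ++lock_acquisitions;
    free_[bin].push_back(p);
  }
  std::size_t lock_acquisitions = 0;  // instrumentation for the example only

 private:
  std::mutex lock_;
  std::array<std::vector<void*>, kNumBins> free_;
};

// Per-thread cache: hits and frees below the limit never touch the arena lock.
class ThreadCache {
 public:
  explicit ThreadCache(Arena& arena) : arena_(arena) {}
  ~ThreadCache() {  // hand cached blocks back when the thread exits
    for (std::size_t bin = 0; bin < kNumBins; ++bin)
      for (void* p : cached_[bin]) arena_.deallocate(bin, p);
  }
  void* allocate(std::size_t size) {
    const std::size_t bin = bin_for(size);
    if (bin == kNumBins) return nullptr;  // large and huge paths omitted
    auto& cache = cached_[bin];
    if (cache.empty()) return arena_.allocate(bin);  // miss: locked slow path
    void* p = cache.back();
    cache.pop_back();
    return p;
  }
  void deallocate(void* p, std::size_t size) {
    const std::size_t bin = bin_for(size);
    assert(bin < kNumBins);
    auto& cache = cached_[bin];
    if (cache.size() < kMaxCached) cache.push_back(p);  // lock-free fast path
    else arena_.deallocate(bin, p);                     // cache full: spill to arena
  }

 private:
  Arena& arena_;
  std::array<std::vector<void*>, kNumBins> cached_;
};
```

In a real allocator each thread would own one such cache (for example through a thread_local variable) and several arenas would exist, so that even cache misses are spread across different locks.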
Summary
jemalloc’s core is a bin allocator; each arena has logical bins for size classes, minimizing fragmentation.
Per‑arena locks confine lock contention to the threads that share an arena.
Thread‑specific caches serve most small allocations without taking any lock, reducing contention further.
3. Scudo (Android 11+)
Scudo (named after “escudo”, meaning shield) is the default allocator for native code on Android 11, focusing on security.
Key Points
Scudo defines Primary and Secondary allocators. Requests < 64 KB use the Primary allocator; larger requests use the Secondary allocator.
struct AndroidConfig {
  using SizeClassMap = AndroidSizeClassMap;
#if SCUDO_CAN_USE_PRIMARY64
  // 256 MB regions
  typedef SizeClassAllocator64<...> Primary;
#else
  // 256 KB regions
  typedef SizeClassAllocator32<...> Primary;
#endif
  typedef MapAllocator<...> Secondary;
  template <...>
  using TSDRegistryT = TSDRegistrySharedT<...>; // max 2 TSDs
};
The Primary allocator reserves a virtual address space, splits it into size classes, and uses random offsets to mitigate address‑space attacks. Each allocated chunk carries a header with class ID, state, origin, size, and a checksum for integrity verification.
NOINLINE void* allocate(uptr Size, Chunk::Origin Origin,
                        uptr Alignment = MinAlignment,
                        bool ZeroContents = false) {
  initThreadMaybe();
  // NeededSize: Size plus room for the chunk header and alignment (computation elided)
  // ... fast path using thread‑local cache ...
  if (PrimaryT::canAllocate(NeededSize)) {
    // allocate from Primary
  } else {
    // fallback to Secondary
  }
  // Align user pointer, set chunk header, compute checksum
  return TaggedPtr;
}
Scudo also uses thread‑specific data (TSD) structures, each holding a per‑size‑class cache, to speed up multithreaded allocation; in the Android configuration above, a small pool of TSDs is shared among threads. The Secondary allocator places guard pages around large blocks.
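The header integrity check mentioned above can be sketched as follows. This is an illustration of the idea, not Scudo's actual layout or algorithm: the field widths, the state values, the CRC‑16 polynomial, and the names seal and verify are assumptions made for the example. The checksum covers a per‑process secret cookie, the user pointer, and the header fields, so a header copied to another address or modified in place no longer verifies.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t kStateAvailable = 0;  // assumed encoding
constexpr std::uint8_t kStateAllocated = 1;  // assumed encoding

struct ChunkHeader {      // hypothetical layout, wider than a real packed header
  std::uint8_t class_id;
  std::uint8_t state;
  std::uint8_t origin;
  std::uint32_t size;
  std::uint16_t checksum;
};

// One step of a bitwise CRC-16 (polynomial 0x1021), most significant bit first.
inline std::uint16_t crc16_update(std::uint16_t crc, std::uint8_t byte) {
  crc ^= static_cast<std::uint16_t>(byte << 8);
  for (int i = 0; i < 8; ++i)
    crc = (crc & 0x8000) ? static_cast<std::uint16_t>((crc << 1) ^ 0x1021)
                         : static_cast<std::uint16_t>(crc << 1);
  return crc;
}

// Feed the low `bytes` bytes of `value`, least significant byte first.
inline std::uint16_t crc16_feed(std::uint16_t crc, std::uint64_t value, int bytes) {
  assert(bytes >= 0 && bytes <= 8);
  for (int i = 0; i < bytes; ++i)
    crc = crc16_update(crc, static_cast<std::uint8_t>(value >> (8 * i)));
  return crc;
}

// Checksum over cookie, user pointer, and every header field except the checksum.
inline std::uint16_t compute_checksum(std::uint32_t cookie, const void* user_ptr,
                                      const ChunkHeader& h) {
  std::uint16_t crc = 0xFFFF;
  crc = crc16_feed(crc, cookie, 4);
  crc = crc16_feed(crc, reinterpret_cast<std::uintptr_t>(user_ptr), 8);
  crc = crc16_feed(crc, h.class_id, 1);
  crc = crc16_feed(crc, h.state, 1);
  crc = crc16_feed(crc, h.origin, 1);
  crc = crc16_feed(crc, h.size, 4);
  return crc;
}

// Called when the header is written (allocation, state change).
inline void seal(std::uint32_t cookie, const void* user_ptr, ChunkHeader& h) {
  h.checksum = compute_checksum(cookie, user_ptr, h);
}

// Called before the header is trusted (free, realloc, size query).
inline bool verify(std::uint32_t cookie, const void* user_ptr, const ChunkHeader& h) {
  return h.checksum == compute_checksum(cookie, user_ptr, h);
}
```

A 16‑bit checksum is not cryptographically strong; its value lies in making blind header corruption and forged chunks fail loudly with high probability, at the cost of a few extra instructions per operation.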
Summary
Scudo improves security through randomization, guard pages, and checksum‑based integrity checks, at the cost of some memory overhead and reduced allocation speed compared to jemalloc.
4. PartitionAlloc
Chromium’s cross‑platform allocator, optimized for client‑side workloads and security.
Key Points
Memory is divided into Super Pages (2 MB) which are further split into Slot Spans and Partition Pages.
Each bucket manages a collection of slot spans; small allocations are served from per‑thread caches, larger ones from a shared free‑list, and huge allocations via direct mapping.
Guard pages protect the first and last partition pages; metadata is stored separately from objects.
Four main partitions in Chromium: Buffer, Node, LayoutObject, and FastMalloc, each with tailored allocation strategies.
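Because super pages have a fixed, aligned size, the allocator can find the enclosing super page and partition page of any pointer with pure address arithmetic. The sketch below illustrates this under stated assumptions: the 2 MB super page size and the reserved first and last partition pages come from the text, while the 16 KB partition page size and all function names are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kSuperPageSize = 2 * 1024 * 1024;  // from the text
constexpr std::uintptr_t kPartitionPageSize = 16 * 1024;    // assumed
constexpr std::uintptr_t kPartitionPagesPerSuperPage =
    kSuperPageSize / kPartitionPageSize;
static_assert(kSuperPageSize % kPartitionPageSize == 0,
              "a super page holds a whole number of partition pages");

// Super pages are aligned to their size, so masking recovers the base.
constexpr std::uintptr_t super_page_base(std::uintptr_t address) {
  return address & ~(kSuperPageSize - 1);
}

// Index of the partition page that contains the address.
constexpr std::uintptr_t partition_page_index(std::uintptr_t address) {
  return (address & (kSuperPageSize - 1)) / kPartitionPageSize;
}

// The first and last partition pages are reserved (guard pages, metadata)
// and never hold user objects.
constexpr bool in_reserved_partition_page(std::uintptr_t address) {
  return partition_page_index(address) == 0 ||
         partition_page_index(address) == kPartitionPagesPerSuperPage - 1;
}

// Start address of a given partition page inside a super page.
inline std::uintptr_t partition_page_start(std::uintptr_t base, std::uintptr_t index) {
  assert(base % kSuperPageSize == 0 && index < kPartitionPagesPerSuperPage);
  return base + index * kPartitionPageSize;
}
```

This kind of arithmetic is what lets metadata live apart from the objects: given only a user pointer, the allocator can locate the bookkeeping for its slot span without storing a header next to the object.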
Security Features
Linear overflows are caught by guard pages.
Metadata resides in dedicated regions away from the objects, so a linear overflow out of an object cannot overwrite allocator metadata.
Large allocations have guard pages at both ends.
Buckets isolate size classes, reducing cross‑size attacks.
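Guard pages around a large allocation can be sketched with POSIX mmap and mprotect. This is a minimal, POSIX‑only illustration and not PartitionAlloc's direct‑map code; the GuardedRegion struct and both function names are hypothetical. The whole range is first reserved as inaccessible, then only the middle is made readable and writable, so a linear overflow in either direction faults instead of reaching a neighboring mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

struct GuardedRegion {   // hypothetical bookkeeping for the example
  void* base;            // start of the whole mapping (leading guard page)
  std::size_t total;     // usable bytes plus both guard pages
  void* user;            // first usable byte
  std::size_t usable;    // request rounded up to whole pages
};

inline std::size_t page_size() {
  return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Map `size` bytes with one inaccessible page on each side.
inline bool map_guarded(std::size_t size, GuardedRegion& out) {
  assert(size > 0);
  const std::size_t page = page_size();
  const std::size_t usable = (size + page - 1) / page * page;
  const std::size_t total = usable + 2 * page;
  // Reserve everything as PROT_NONE: any access faults.
  void* base = mmap(nullptr, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED) return false;
  // Open up only the middle; the first and last page stay inaccessible.
  void* user = static_cast<char*>(base) + page;
  if (mprotect(user, usable, PROT_READ | PROT_WRITE) != 0) {
    munmap(base, total);
    return false;
  }
  out = GuardedRegion{base, total, user, usable};
  return true;
}

inline void unmap_guarded(const GuardedRegion& region) {
  munmap(region.base, region.total);
}
```

The price is address space and page‑granularity rounding, which is why this treatment is usually reserved for large allocations while small ones rely on bucket isolation instead.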
Summary
PartitionAlloc offers a small code footprint, high allocation efficiency, and strong security guarantees through address‑space isolation and guard pages, making it suitable for performance‑critical, security‑sensitive applications like Chromium.
5. Overview
Allocator performance and space efficiency depend on trade‑offs among caching, allocation strategies, and security mechanisms. The table below summarizes the main characteristics of each allocator.
| Allocator | Notes |
| --- | --- |
| dlmalloc | Non‑thread‑safe, high fragmentation, low performance, low security. |
| jemalloc | Large code size, high memory usage, good performance, moderate security. |
| Scudo | Security‑focused, moderate performance, low memory efficiency due to metadata and randomization. |
| PartitionAlloc | Small code size, high performance, strong security via isolation and guard pages, cross‑platform. |
Benchmark results show varying QPS and RSS footprints across allocators, illustrating the trade‑offs between speed and memory consumption.
ByteDance Web Infra
ByteDance Web Infra team, focused on delivering excellent technical solutions, building an open tech ecosystem, and advancing front-end technology within the company and the industry | The best way to predict the future is to create it