
Douyin’s Deep Dive: Expanding Android ART Heap, FD Limits & M:N Threading on Legacy Devices

This article details how Douyin engineers tackled Android’s limited heap, file‑descriptor, and thread constraints on older phones by expanding ART malloc and region spaces, enlarging FD/FD_SET limits, and implementing a transparent M:N user‑level threading model, achieving significant stability and performance gains.

ByteDance SE Lab

Background

As Android apps evolve into “super‑apps”, older devices face severe constraints: ART heap size is often only 256 MB even with largeHeap, Android 9 and below limit a process to 1024 file descriptors, and many OEMs cap the total number of threads+processes at 500. These limits cause high OOM rates, crashes, and poor user experience.

1. Expanding ART malloc space (Android 5‑7)

1.1 Basics

The ART heap consists of several spaces, and most allocations are served from the main space. Its type depends on the garbage collector, which varies by Android version:

Android 5‑7: CMS (concurrent mark‑sweep) + copying GC → malloc space

Android 8‑14: CC (concurrent copying) → region space

Android 15+: CMC (concurrent mark‑compact) → bump pointer space

Douyin focused on expanding the malloc space on Android 5‑7 because it is the most common on legacy devices.

1.2 Technical solution

1. Restrict copy‑GC so the VM works on a single space.

2. Release the unused backup space and allocate a larger one.

3. Trigger copy‑GC to switch the main space to the new backup.

4. Repeat steps 1‑3 for the second space.

5. Modify the heap’s capacity limit.

1.2.1 Locking a space

When native code holds a Java object pointer (e.g., via GetPrimitiveArrayCritical), moving GCs must be disabled to keep the address valid. This provides a natural point to lock the current space.

art::Heap::PerformHomogeneousSpaceCompact
art::Heap::CollectGarbageInternal

1.2.2 Finding expansion memory

ART uses 32‑bit compressed pointers, limiting the addressable heap to the low 4 GB of the process. Expansion must satisfy address range, card‑table mapping, contiguity, equal size for both spaces, and page‑size alignment.
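The address-range and alignment constraints are plain arithmetic, so a candidate range can be screened before any mapping is attempted. The sketch below is illustrative only: the function name and parameters are hypothetical, and the card‑table, contiguity, and equal‑size requirements are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper bound of the range reachable through 32-bit heap references. */
#define LOW_4G_LIMIT UINT64_C(0x100000000)

/* Hypothetical helper: checks only the address-range and alignment rules. */
bool is_valid_expansion_range(uint64_t begin, uint64_t size, uint64_t page_size) {
    /* The page size must be a non-zero power of two. */
    if (size == 0 || page_size == 0 || (page_size & (page_size - 1)) != 0) {
        return false;
    }
    /* Both the start address and the length must be page-aligned. */
    if ((begin & (page_size - 1)) != 0 || (size & (page_size - 1)) != 0) {
        return false;
    }
    /* The whole range must end at or below the 4 GB boundary. */
    if (begin >= LOW_4G_LIMIT || size > LOW_4G_LIMIT - begin) {
        return false;
    }
    return true;
}
```

A range that passes this check would still need to be verified against the card table and the current memory map before it is used.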

1.2.3 Creating a new space

Using MapAnonymous and CreateMallocSpaceFromMemMap (found via dlsym in libart.so), a larger malloc space is created at the chosen address.

1.2.4 Replacing heap references

Heap pointers are located by scanning the runtime’s memory layout (double‑loop search) or by hooking art::Heap::ClearGrowthLimit to capture the this pointer.

1.2.5 Triggering space switch

Copy‑GC is forced via PerformHomogeneousSpaceCompact(), which swaps the main and backup spaces and updates limits.

1.3 Results

On Android 5‑6 devices the heap grew to 740‑750 MB; on Android 7 to 960‑980 MB, reducing OOM rates by 60.77 %.

2. Expanding ART region space (Android 8‑9)

2.1 Basics

Android 8 introduced the concurrent copying (CC) collector with a region space, which divides the heap into equal 256 KB regions (free, allocated, from‑space, large‑object). CC copies live objects from from‑space regions to to‑space regions.

2.2 Technical solution

Find free contiguous memory after the existing region space within the low‑4 GB area.

Block GC/heap‑trim calls during expansion.

Inline‑hook Heap::StartGC() to create a safe window.

Perform expansion steps: stop‑the‑world, enlarge the regions array, grow the live‑bitmap, update MemMap, re‑add the region space, resume.

Trigger Heap::FinishGC().

Update heap capacity limits.

Unblock GC/trim.

2.2.1 Searching expansion memory

Two gaps are identified in the low‑4 GB area: a 299 MB “backward” gap (0x00010000‑0x12c00000) and a 476 MB “forward” gap (0x52c00000‑0x7088d000). Only the forward gap is used for stability, yielding a final heap size of ~740 MB (+45 %).
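The article does not say how the gaps are located. One plausible approach, assumed here rather than taken from the source, is to walk the address‑sorted entries of /proc/self/maps and record the largest unmapped hole below 4 GB. The function name is hypothetical, and the input is passed as text so the logic can be tried on sample data.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LOW_4G_LIMIT UINT64_C(0x100000000)

/* Hypothetical helper: maps_text holds lines in /proc/self/maps format
 * ("begin-end perms ..."), sorted by address. Returns the size of the
 * largest unmapped gap below 4 GB and stores its start in *gap_begin. */
uint64_t largest_gap_below_4g(const char *maps_text, uint64_t *gap_begin) {
    uint64_t prev_end = 0, best = 0;
    const char *line = maps_text;
    *gap_begin = 0;
    while (*line != '\0') {
        uint64_t begin, end;
        if (sscanf(line, "%" SCNx64 "-%" SCNx64, &begin, &end) == 2) {
            /* Clamp the gap so that it never extends past 4 GB. */
            uint64_t gap_end = begin < LOW_4G_LIMIT ? begin : LOW_4G_LIMIT;
            if (gap_end > prev_end && gap_end - prev_end > best) {
                best = gap_end - prev_end;
                *gap_begin = prev_end;
            }
            if (end > prev_end) prev_end = end;
        }
        const char *newline = strchr(line, '\n');
        if (newline == NULL) break;
        line = newline + 1;
    }
    /* Account for free space between the last mapping and 4 GB. */
    if (prev_end < LOW_4G_LIMIT && LOW_4G_LIMIT - prev_end > best) {
        best = LOW_4G_LIMIT - prev_end;
        *gap_begin = prev_end;
    }
    return best;
}
```

With a sample layout whose only mappings are 12c00000‑52c00000 and 7088d000‑ffff0000, the largest hole is the forward gap starting at 0x52c00000, 0x1dc8d000 bytes long (about 476 MB).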

2.2.2 Expanding the regions array

The regions_ array (type Region) must be resized. Each region is 0x50 bytes on Android 8‑9; the new array is allocated, initialized, and the old data copied via memcpy.

expand_regions_size = (region_space_size / 256KB) * 0x50
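As a worked instance of the formula above, the following sketch computes the byte size of the resized array. The constant and function names are hypothetical; the 256 KB region size and 0x50‑byte Region struct size are the values quoted in the text for Android 8‑9.

```c
#include <stddef.h>

#define REGION_BYTES        (256u * 1024u)  /* heap bytes covered by one region */
#define REGION_STRUCT_BYTES 0x50u           /* sizeof(Region) quoted for Android 8-9 */

/* Bytes needed for a regions_ array describing region_space_size bytes. */
size_t expanded_regions_bytes(size_t region_space_size) {
    size_t num_regions = region_space_size / REGION_BYTES;
    return num_regions * REGION_STRUCT_BYTES;
}
```

For example, a 740 MB region space has 2,960 regions, so the array needs 2,960 × 0x50 = 236,800 bytes.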

2.2.3 Expanding the live‑bitmap

A new live bitmap is created with SpaceBitmap<8byte>::Create (8‑byte object alignment, so one bit tracks 8 bytes of heap), and the original bitmap data is then copied into it, aligned to 512‑byte boundaries.
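The storage a larger bitmap needs can be estimated with simple arithmetic. This sketch assumes one bit per 8 bytes of heap and 64‑bit bitmap words, so each word covers 512 bytes of heap; the word width and the function name are assumptions, not details confirmed by the article.

```c
#include <stddef.h>
#include <stdint.h>

#define OBJECT_ALIGNMENT 8u   /* one bitmap bit per 8 bytes of heap */
#define BITS_PER_WORD    64u  /* assumption: 64-bit bitmap words */

/* Bitmap storage, in bytes, needed to cover heap_capacity bytes of heap. */
size_t live_bitmap_bytes(uint64_t heap_capacity) {
    /* Heap bytes tracked by a single bitmap word: 8 * 64 = 512. */
    const uint64_t heap_bytes_per_word = (uint64_t)OBJECT_ALIGNMENT * BITS_PER_WORD;
    /* Round up so that a partially covered word is still allocated. */
    uint64_t words = (heap_capacity + heap_bytes_per_word - 1) / heap_bytes_per_word;
    return (size_t)(words * (BITS_PER_WORD / 8));
}
```

Under these assumptions a 512 MB heap needs an 8 MB bitmap, and every additional 64 MB of heap adds 1 MB.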

2.2.4 Updating MemMap addresses

Offsets for MemMap fields are hard‑coded after runtime disassembly; they are applied with region‑size alignment.

2.2.5 Re‑adding region space to the heap

After expansion, the old space is removed and the new one added via RemoveSpace and AddSpace symbols in libart.so.

2.2.6 Restoring state

Resume the VM.

Trigger FinishGC.

Update heap capacity/growth limits.

Unblock GC/trim.

2.3 Key offset anchors

Important offsets are obtained from:

RegionSpace::FromSpaceSize(): Region struct size (0x50), num_regions_offset (0xb0), regions_offset (0xc0).

Heap constructor: MemMap offsets.

DlMallocSpace::Clear: begin/size/base_begin/base_size offsets.

2.4 Results

On Android 8‑9 devices, FD‑related crashes dropped by 8.8 %, freezes by 4.8 %, and OOM by 6.93 %. Cases where heap usage stayed above 90 % after GC fell by 73.34 %.

3. Expanding FD/FD_SET limits

3.1 Technical solution

Increase the kernel‑level FD limit via setrlimit(RLIMIT_NOFILE, …).

Override the libc fd_set size by hooking select / pselect (which ultimately call __pselect6) and providing larger buffers.
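The first step can be sketched with the standard getrlimit/setrlimit calls. This is a minimal illustration, not the article's implementation: the function name is hypothetical, and it only raises the soft limit as far as the existing hard limit allows.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_NOFILE towards `wanted`,
 * never beyond the hard limit. Returns 0 on success, -1 on error. */
int raise_fd_soft_limit(rlim_t wanted) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return -1;
    }
    /* An unprivileged process cannot exceed the hard limit. */
    rlim_t target = wanted > lim.rlim_max ? lim.rlim_max : wanted;
    if (lim.rlim_cur >= target) {
        return 0;  /* already high enough; never lower the limit */
    }
    lim.rlim_cur = target;
    return setrlimit(RLIMIT_NOFILE, &lim);
}
```

Raising the limit alone is not sufficient: code that still uses a fixed‑size fd_set would overflow once a descriptor number reaches FD_SETSIZE, which is why the second step is needed.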

3.1.1 Expanding FD_SET in user space

All FD_SET, FD_CLR, FD_ISSET macros are redirected to checked versions that accept a size argument. Inline hooks create a peer expanded fd_set on the heap and map the original stack‑allocated set to it.

fd_set *get_expanded_fd_set(fd_set *origin_fd_set)
fd_set *add_expanded_fd_set(fd_set *origin_fd_set)
void release_expanded_fd_set(int fd)
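To illustrate what an expanded set might look like, here is a minimal heap‑allocated descriptor bitmap that is not bound by FD_SETSIZE. The type and function names are hypothetical and do not reproduce the helpers listed above. The bit layout (an array of unsigned long, one bit per descriptor) mirrors the usual Linux fd_set layout.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical expanded set: capacity is chosen at run time. */
typedef struct {
    size_t nfds;          /* number of descriptors the set can hold */
    unsigned long *bits;  /* one bit per descriptor */
} big_fd_set;

big_fd_set *big_fd_set_create(size_t nfds) {
    big_fd_set *set = malloc(sizeof *set);
    if (set == NULL) return NULL;
    size_t words = (nfds + BITS_PER_LONG - 1) / BITS_PER_LONG;
    set->bits = calloc(words ? words : 1, sizeof(unsigned long));
    if (set->bits == NULL) { free(set); return NULL; }
    set->nfds = nfds;
    return set;
}

/* Bounds-checked equivalents of FD_SET, FD_CLR and FD_ISSET. */
bool big_fd_set_add(big_fd_set *set, int fd) {
    if (fd < 0 || (size_t)fd >= set->nfds) return false;
    set->bits[(size_t)fd / BITS_PER_LONG] |= 1UL << ((size_t)fd % BITS_PER_LONG);
    return true;
}

void big_fd_set_clear(big_fd_set *set, int fd) {
    if (fd < 0 || (size_t)fd >= set->nfds) return;
    set->bits[(size_t)fd / BITS_PER_LONG] &= ~(1UL << ((size_t)fd % BITS_PER_LONG));
}

bool big_fd_set_isset(const big_fd_set *set, int fd) {
    if (fd < 0 || (size_t)fd >= set->nfds) return false;
    return (set->bits[(size_t)fd / BITS_PER_LONG] >> ((size_t)fd % BITS_PER_LONG)) & 1UL;
}

void big_fd_set_destroy(big_fd_set *set) {
    if (set != NULL) { free(set->bits); free(set); }
}
```

Rejecting out‑of‑range descriptors instead of writing past the buffer is the key difference from an unchecked fixed‑size fd_set.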

3.2 Results

FD/FD_SET overflow issues on Android 9 and below were virtually eliminated, reducing crashes by 7.23 %.

4. M:N Transparent User‑Level Threading

4.1 Basics

On many devices running Android 8 and below, OEMs limit an app to 500 threads+processes. Douyin implemented a transparent M:N scheduler that multiplexes many pthreads onto a smaller number of Linux lightweight processes (LWPs).

4.2 Technical solution

Intercept clone syscall to create a transparent proxy for thread creation.

Hook pthread_exit to prevent the underlying LWP from terminating.

Use a periodic POSIX timer that sends a real‑time signal to preempt the currently running thread.

In the signal handler, capture the full thread context (general registers, pstate, tpidr_el0, floating‑point/vector registers) via ucontext_t and a custom extra_context field.

Store each thread’s context in a scheduler queue; on each timer tick, save the current context and restore the next thread’s context.

Handle non‑restartable syscalls (e.g., select, poll, nanosleep, signal‑wait syscalls) by delegating them to a dedicated daemon VCPU thread or by implementing custom restart logic.
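The save‑and‑restore cycle at the heart of these steps can be shown in a much simpler, cooperative form. The sketch below is not the article's mechanism: it swaps in the glibc ucontext API (getcontext, makecontext, swapcontext) and explicit yields in place of the signal‑driven timer preemption. Every name in it is hypothetical. Each task records its id and yields twice, and a dispatch loop resumes the tasks in round‑robin order on a single OS thread.

```c
#include <ucontext.h>

#define MAX_TASKS   4
#define STACK_BYTES (64 * 1024)
#define TRACE_CAP   64

static ucontext_t scheduler_ctx;               /* context of the dispatch loop */
static ucontext_t task_ctx[MAX_TASKS];         /* saved context of each task */
static char task_stack[MAX_TASKS][STACK_BYTES];
static int task_done[MAX_TASKS];
static int current_task;
int trace[TRACE_CAP];                          /* order in which tasks ran */
int trace_len;

/* Save the running task's context and switch back to the scheduler. */
static void yield_to_scheduler(void) {
    swapcontext(&task_ctx[current_task], &scheduler_ctx);
}

static void task_body(int id) {
    for (int step = 0; step < 2; step++) {
        if (trace_len < TRACE_CAP) trace[trace_len++] = id;
        yield_to_scheduler();  /* stands in for a timer tick */
    }
    task_done[id] = 1;         /* returning resumes uc_link (the scheduler) */
}

/* Runs n tasks round-robin; returns the number of trace entries, or -1. */
int run_round_robin(int n) {
    if (n < 1 || n > MAX_TASKS) return -1;
    trace_len = 0;
    for (int i = 0; i < n; i++) {
        task_done[i] = 0;
        if (getcontext(&task_ctx[i]) != 0) return -1;
        task_ctx[i].uc_stack.ss_sp = task_stack[i];
        task_ctx[i].uc_stack.ss_size = STACK_BYTES;
        task_ctx[i].uc_link = &scheduler_ctx;
        makecontext(&task_ctx[i], (void (*)(void))task_body, 1, i);
    }
    int remaining = n;
    while (remaining > 0) {
        for (int i = 0; i < n; i++) {
            if (task_done[i]) continue;
            current_task = i;
            /* Save the scheduler context and restore the task's context. */
            swapcontext(&scheduler_ctx, &task_ctx[i]);
            if (task_done[i]) remaining--;
        }
    }
    return trace_len;
}
```

With two tasks the recorded order is 0, 1, 0, 1. A preemptive version would perform the equivalent swap inside the signal handler using the ucontext_t it receives, and would additionally have to save the registers that structure does not cover.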

4.3 Effects

The prototype runs up to 15 Java threads and 3 native pthreads on a single LWP, effectively bypassing the 500‑thread cap. Although timer‑based preemption adds overhead compared to native threads, it dramatically improves stability for legacy devices.

Figures in the original article: ART heap memory layout; region space memory layout; locking a space; VMRuntime clearGrowthLimit hook; region space bitmap expansion; region space added to heap; transparent pthread proxy; timer‑based preemption.