
How We Halved Thread Stack Size and Freed 130 MB on 32‑bit Android Devices

This article details three memory‑saving techniques—reducing default thread stack size, releasing a 130 MB WebView reservation, and shrinking the ART heap—implemented via system‑API hooking and the Patrons library, and evaluates their performance impact on 32‑bit Android devices.

WeChat Client Technology Team

Background

As WeChat's new features add memory pressure, virtual‑memory exhaustion on 32‑bit Android devices has become a problem we need to mitigate. Conventional optimizations help at first but quickly lose effectiveness, so we explored faster, more aggressive solutions, which we refer to as “black‑tech”.

Research and Ideas

By examining /proc/self/maps we identified three main memory‑consuming regions:

Kernel reserved area – not modifiable.

System pre‑allocation area – contains the WebView reservation and other framework code.

App‑owned area – includes Dex, resources, native libraries, thread stacks, shared memory, etc.
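The survey of /proc/self/maps described above can be approximated with a small parser. This is a minimal sketch, not the team's tooling: the names map_entry, parse_maps_line and mapped_bytes_matching are hypothetical, and the code assumes the usual "start-end perms offset dev inode name" row layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed row of /proc/self/maps (hypothetical helper type). */
typedef struct {
    unsigned long start;
    unsigned long end;
    char perms[5];
    char name[256]; /* empty for unnamed anonymous mappings */
} map_entry;

/* Parses one maps row into *out. Returns 1 on success, 0 otherwise. */
static int parse_maps_line(const char *line, map_entry *out) {
    int name_pos = 0;
    assert(line != NULL && out != NULL);
    out->name[0] = '\0';
    /* offset, device and inode are skipped; %n records where the name starts */
    if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
               &out->start, &out->end, out->perms, &name_pos) < 3) {
        return 0;
    }
    if (name_pos > 0) {
        /* the name may contain spaces, so take the rest of the line */
        strncpy(out->name, line + name_pos, sizeof(out->name) - 1);
        out->name[sizeof(out->name) - 1] = '\0';
        out->name[strcspn(out->name, "\n")] = '\0';
    }
    return 1;
}

/* Sums the sizes of all mappings whose name contains `needle`. */
static unsigned long mapped_bytes_matching(FILE *maps, const char *needle) {
    char line[1024];
    unsigned long total = 0;
    map_entry e;
    while (fgets(line, sizeof(line), maps) != NULL) {
        if (parse_maps_line(line, &e) && strstr(e.name, needle) != NULL) {
            total += e.end - e.start;
        }
    }
    return total;
}
```

Calling mapped_bytes_matching on an open /proc/self/maps with a needle such as "libwebview reservation" would give the size of that reservation in bytes.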

From these we derived three techniques:

Halve the default thread stack size.

Release the WebView pre‑allocation (≈130 MB).

Shrink the virtual‑machine heap.

Implementation

Intercepting System APIs

We use global API interception as the foundation. On Android the two main native hooking methods are GOT/PLT Hook and Inline Hook.

GOT/PLT Hook – rewrite a single 32‑bit GOT entry in the calling module; low risk, but every calling module has to be patched separately.

Inline Hook – patch the first few instructions of the target function; catches calls made through the PLT/GOT as well as through dlsym, but is more complex and riskier.

We combine PLT/GOT Hook with an “export‑table” Hook (implemented via the open‑source xhook library) to achieve reliable interception.
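The idea behind a GOT/PLT hook can be shown with a toy model. This is only an illustration under stated assumptions: got_strlen stands in for one module's GOT slot, and all names here are hypothetical. A real hook (for example the one xhook performs) has to find the slot through the ELF dynamic section and relocation tables and make the page writable before overwriting it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef size_t (*strlen_fn)(const char *);

/* Stand-in for one module's GOT slot for strlen (hypothetical). */
static strlen_fn got_strlen = strlen;
/* Saved original target, so the hook can forward the call. */
static strlen_fn original_strlen = NULL;
static int hook_calls = 0;

/* Replacement function: count the call, then forward to the original. */
static size_t hooked_strlen(const char *s) {
    hook_calls++;
    return original_strlen(s);
}

/* Code inside the "module" always calls through its slot, as a PLT stub does. */
static size_t module_measure(const char *s) {
    return got_strlen(s);
}

/* Installing the hook is a single pointer swap in the slot. */
static void install_hook(void) {
    assert(original_strlen == NULL); /* install only once */
    original_strlen = got_strlen;
    got_strlen = hooked_strlen;
}
```

Only callers that go through this particular slot see the hook, which is why the GOT/PLT approach has to patch every calling module.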

Thread Stack Halving

After intercepting pthread_create, we examine the attr argument. If it is NULL, we create a pthread_attr_t whose stack size is set to half the default, about 512 KB. If the caller supplied attributes, we halve the stack size only when it equals the default; explicitly customized sizes are left alone. Threads that require a larger stack are whitelisted based on the path of the native library that creates them.

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                  void *(*start_routine)(void *), void *arg);

Releasing WebView Reservation

The reservation appears in maps as [anon:libwebview reservation]. We locate the region and release it with munmap. Because WebView would otherwise still be loaded into the freed range, we also intercept android_dlopen_ext: the hook replaces the ANDROID_DLEXT_RESERVED_ADDRESS flag with ANDROID_DLEXT_RESERVED_ADDRESS_HINT and sets extinfo.reserved_size to zero, forcing the loader to choose a new address.

jboolean DoReserveAddressSpace(jlong size) { … prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, addr, vsize, "libwebview reservation"); … }
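The flag rewrite performed inside the android_dlopen_ext hook can be sketched as below. The function name relax_reservation is hypothetical. When built off-device, the block uses stand-in definitions that mirror only the two flags and three fields needed here; they are an assumption for illustration, and on Android the real definitions come from <android/dlext.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __ANDROID__
#include <android/dlext.h>
#else
/* Off-device stand-ins (assumption: they mirror the NDK values and field order). */
enum {
    ANDROID_DLEXT_RESERVED_ADDRESS = 0x1,
    ANDROID_DLEXT_RESERVED_ADDRESS_HINT = 0x2
};
typedef struct {
    uint64_t flags;
    void *reserved_addr;
    size_t reserved_size;
} android_dlextinfo;
#endif

/* Turns a mandatory reservation into a hint with no usable size.
 * Returns 1 if the struct was changed, 0 if it was left untouched. */
static int relax_reservation(android_dlextinfo *info) {
    if (info == NULL || (info->flags & ANDROID_DLEXT_RESERVED_ADDRESS) == 0) {
        return 0;
    }
    info->flags &= ~(uint64_t)ANDROID_DLEXT_RESERVED_ADDRESS;
    info->flags |= ANDROID_DLEXT_RESERVED_ADDRESS_HINT;
    info->reserved_size = 0;
    assert((info->flags & ANDROID_DLEXT_RESERVED_ADDRESS) == 0);
    return 1;
}
```

Since android_dlopen_ext receives its extinfo as a const pointer, a hook would presumably apply this to a local copy and pass the copy on to the original function.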

VM Heap Reduction

For Android 5.0‑7.1 the ART heap uses two semi‑space regions (From Space and To Space). By identifying the main space and main space 1 regions in maps, we can release one half when virtual memory is tight. Determining which side is From Space is done by allocating a temporary object and checking its address. To prevent subsequent Compact/Moving GC from copying into the released space, we invoke GetPrimitiveArrayCritical and deliberately omit the matching ReleasePrimitiveArrayCritical, keeping the thread blocked for the duration of the operation.
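The address check used to tell the two spaces apart reduces to a range test. The sketch below assumes the two ranges were already read from maps and that the address of the temporary object is available as an integer; how that address is obtained depends on ART internals and is not shown. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, end) taken from a maps row. */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} region;

static int region_contains(const region *r, uintptr_t addr) {
    return addr >= r->start && addr < r->end;
}

/* Returns the space that does NOT hold the freshly allocated probe object,
 * i.e. the candidate for release, or NULL when the result is ambiguous. */
static const region *idle_space(const region *main_space,
                                const region *main_space_1,
                                uintptr_t probe_addr) {
    int in_first, in_second;
    assert(main_space != NULL && main_space_1 != NULL);
    in_first = region_contains(main_space, probe_addr);
    in_second = region_contains(main_space_1, probe_addr);
    if (in_first == in_second) {
        return NULL; /* probe in neither or both: do not release anything */
    }
    return in_first ? main_space_1 : main_space;
}
```

Returning NULL in the ambiguous case errs on the side of releasing nothing, since unmapping the space that is in use would crash the process.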

Patrons Library for RegionSpace

On Android 8.0+ the ART heap switches to RegionSpace. We use Alibaba’s open‑source Patrons library to call RegionSpace::ClampGrowthLimit. The library obtains the Runtime instance from libart.so, walks to the Heap and then to the RegionSpace, and finally invokes ClampGrowthLimit (or emulates it on older releases) to shrink the region by at least one page.

void RegionSpace::ClampGrowthLimit(size_t new_capacity) { … CHECK_LE(new_capacity, NonGrowthLimitCapacity()); … }

Performance Overhead

All measurements were taken on a Google Pixel 4 (Android R, 32‑bit process) with fixed CPU frequency.

Thread‑stack halving adds ~65 ms for initialization and ~332 µs per thread with default stack size.

WebView reservation release costs 5 ms (maps found) or 7 ms (fallback via Java reflection) plus ~2 µs for loading a blank WebView.

VM‑heap reduction requires ~1 ms to locate the target region.

Patrons library initialization adds ~8 ms.

Overall runtime performance remained unchanged in our tests.

One More Thing

Anonymous memory regions can be named via prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, addr, vsize, "name"). After intercepting mmap, we call the original function and then name the region using the caller’s library path, which greatly aids map analysis.
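A minimal sketch of the naming step follows. The wrapper name mmap_named is hypothetical, and the fallback constant definitions are there only for headers that do not provide them. The prctl call is treated as best effort because kernels without anonymous VMA naming support reject it; the name is assumed to be a long-lived string such as a literal.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/* Maps `length` bytes of anonymous memory and tries to label the region.
 * Returns the mapping, or NULL if mmap itself failed. */
static void *mmap_named(size_t length, const char *name) {
    void *addr;
    assert(length > 0 && name != NULL);
    addr = mmap(NULL, length, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        return NULL;
    }
    /* Best effort: a failure here only means the region stays unnamed. */
    (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
                (unsigned long)addr, (unsigned long)length,
                (unsigned long)name);
    return addr;
}
```

In an mmap hook, the name argument would be derived from the caller's library path, so a successfully named region shows up in maps as [anon:name].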

Figure: anonymous memory naming example

All the “black‑tech” implementations (except Patrons) have been merged into the Matrix suite (https://github.com/Tencent/matrix) to help developers address memory‑pressure issues on 32‑bit Android devices.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
