How Android’s Linker Loads and Links Native .so Files (and What It Means for Packing)
This article explains Android's linker workflow for loading and linking native .so libraries, detailing the do_dlopen sequence, ELF parsing, memory mapping, soinfo allocation, relocation handling, and constructor calls, and concludes with a brief overview of common SO packing techniques.
1. Introduction
As Android security grows in importance, application hardening, and native .so protection in particular, plays the same role that executable hardening plays on PCs. Native protection focuses on the SO files in the native layer, using techniques such as packing, anti-debugging, obfuscation, and VM-based tricks to make reverse engineering harder. Understanding the linker and its loading/linking mechanism is essential for both security researchers and developers.
This article analyses the linker's loading and linking process for SO files and briefly introduces key packing technologies.
The discussion is limited to the handling of dlopen("libxx.so") on Android 5.0 AOSP source for the ARM platform; source snippets have been trimmed for readability.
P.S.: Readers should have a basic understanding of ELF file structure.
2. SO Loading and Linking
2.1 Overall Process
1. do_dlopen
After dlopen is called, the flow passes through dlopen_ext and reaches the main function, do_dlopen.
do_dlopen calls two important functions: find_library (which continues the loading/linking) and the soinfo member CallConstructors (which invokes the SO's initialization functions).
2. find_library_internal
find_library directly invokes find_library_internal.
find_library_internal first checks whether the target SO is already loaded via find_loaded_library_by_name. If not, it calls load_library to continue the loading process.
3. load_library
load_library implements the whole SO loading and linking flow in three steps:
Loading: create an ElfReader object and call its Load method to map the SO into memory.
Allocating soinfo: invoke soinfo_alloc to allocate a new soinfo structure and fill it with the loading results.
Linking: call soinfo_link_image to complete the linking.
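The control flow described above can be modelled in a few lines. This is a simplified Python sketch, not the AOSP source: the `loaded` dictionary stands in for the linker's list of soinfo structures, and the bodies of the three steps are placeholders.

```python
# Simplified model of find_library -> load_library; not the AOSP implementation.
loaded = {}  # name -> soinfo-like dict (assumption: stands in for the soinfo list)


def load_library(name):
    """Placeholder for the three steps: load, allocate soinfo, link."""
    image = {"name": name, "mapped": True}                 # 1. ElfReader::Load
    si = {"name": name, "image": image, "linked": False}   # 2. soinfo_alloc
    si["linked"] = True                                     # 3. soinfo_link_image
    return si


def find_library(name):
    """Return the existing soinfo if the SO is already loaded, else load it."""
    si = loaded.get(name)          # plays the role of find_loaded_library_by_name
    if si is None:
        si = load_library(name)
        loaded[name] = si
    return si
```

Calling `find_library("libxx.so")` twice returns the same object, which mirrors why a second dlopen of an already loaded SO does not map it again.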
The remainder of this section examines the ElfReader class and soinfo_link_image in detail.
2.2 Loading
In load_library, the ElfReader is instantiated with the SO name and file descriptor, ElfReader elf_reader(name, fd), and then elf_reader.Load() is called.
The Load method reads the ELF header, validates it, reads the program header, calculates the required memory size, allocates space with mmap, and maps each PT_LOAD segment into memory.
2.2.1 Read & Verify ELF Header
ReadElfHeader reads the ELF header into header_ (type Elf32_Ehdr) and validates magic bytes, class (32/64-bit), endianness, file type, version, and target platform.
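The same checks can be reproduced outside the linker. The following is a minimal Python sketch, assuming a little-endian 32-bit ARM shared object; the function name `verify_elf32_arm_header` is hypothetical, while the field layout and constants follow the ELF specification.

```python
import struct

# Little-endian Elf32_Ehdr: e_ident[16], e_type, e_machine, e_version, e_entry,
# e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize,
# e_shnum, e_shstrndx (52 bytes in total).
ELF32_EHDR = struct.Struct("<16sHHIIIIIHHHHHH")
ELFCLASS32, ELFDATA2LSB, ET_DYN, EV_CURRENT, EM_ARM = 1, 1, 3, 1, 40


def verify_elf32_arm_header(data):
    """Return (e_phoff, e_phnum) if the header passes the checks, else raise ValueError."""
    if len(data) < ELF32_EHDR.size:
        raise ValueError("file too small for an ELF header")
    (ident, e_type, e_machine, e_version, _entry, e_phoff, _shoff, _flags,
     _ehsize, _phentsize, e_phnum, _shentsize, _shnum, _shstrndx) = \
        ELF32_EHDR.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("bad ELF magic")
    if ident[4] != ELFCLASS32:
        raise ValueError("not a 32-bit ELF")
    if ident[5] != ELFDATA2LSB:
        raise ValueError("not little-endian")
    if e_type != ET_DYN:
        raise ValueError("not a shared object")
    if e_version != EV_CURRENT:
        raise ValueError("unexpected ELF version")
    if e_machine != EM_ARM:
        raise ValueError("not an ARM binary")
    return e_phoff, e_phnum
```

The returned offset and count are what the next step needs in order to locate the program header table.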
2.2.2 Read Program Header
The program header table is temporarily mapped for parsing and released after the SO is fully loaded.
2.2.3 Reserve Space & Compute Load Size
phdr_table_get_load_size iterates over PT_LOAD segments to find the minimum virtual address and the maximum end address, aligns them to page boundaries, and computes load_size. The loader then reserves this size with mmap.
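The size computation is plain page arithmetic. Below is a Python sketch of it, assuming 4 KiB pages; the tuple layout of `phdrs` and the helper names are illustrative choices, not the linker's own types.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
PT_LOAD = 1


def page_start(addr):
    """Round addr down to a page boundary."""
    return addr & ~(PAGE_SIZE - 1)


def page_end(addr):
    """Round addr up to a page boundary."""
    return page_start(addr + PAGE_SIZE - 1)


def get_load_size(phdrs):
    """phdrs: iterable of (p_type, p_vaddr, p_memsz).

    Returns (min_vaddr, load_size), both page-aligned, or (0, 0) when there
    is no PT_LOAD segment.
    """
    spans = [(vaddr, vaddr + memsz) for p_type, vaddr, memsz in phdrs
             if p_type == PT_LOAD]
    if not spans:
        return 0, 0
    min_vaddr = page_start(min(start for start, _ in spans))
    max_vaddr = page_end(max(end for _, end in spans))
    return min_vaddr, max_vaddr - min_vaddr
```

For two PT_LOAD segments covering 0x0-0x1234 and 0x3e00-0x4200, the reserved range spans five pages (0x5000 bytes), including the unused gap between the segments.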
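About load_bias: load_bias is the offset between a segment's p_vaddr in the file and the address where that segment actually resides in memory, that is, load_bias = load_start - min_vaddr, where min_vaddr is the page-aligned lowest PT_LOAD address. If an SO specifies a nonzero base address, load_bias and load_start differ. For ordinary SOs, min_vaddr = 0 and load_bias = load_start, which is treated as the base address.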
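A short numeric sketch makes the relationship concrete. The addresses below are made-up examples and the function names are hypothetical.

```python
def compute_load_bias(load_start, min_vaddr):
    """load_bias is the difference between where the image was mapped and
    the lowest (page-aligned) virtual address requested by the file."""
    return load_start - min_vaddr


def runtime_address(load_bias, p_vaddr):
    """Translate a virtual address from the ELF file into a process address."""
    return load_bias + p_vaddr
```

With `load_start = 0xb6f00000` and `min_vaddr = 0`, load_bias equals load_start. If the file instead asked for `min_vaddr = 0x10000`, load_bias would be `0xb6ef0000`, and adding a segment's p_vaddr of `0x10000` gives back `0xb6f00000`.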
2.2.4 Load Segments
For each PT_LOAD segment, the loader:
Computes seg_start and seg_end from p_vaddr, p_memsz, and load_bias, then aligns them to page boundaries.
Computes the file‑page start and length.
Calls mmap with the calculated addresses and lengths.
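The three steps above reduce to the following arithmetic, shown here as a Python sketch that only computes the mmap arguments and does not map anything. It assumes 4 KiB pages; the dictionary keys are illustrative names.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_start(addr):
    return addr & ~(PAGE_SIZE - 1)


def page_end(addr):
    return page_start(addr + PAGE_SIZE - 1)


def segment_mapping(load_bias, p_vaddr, p_offset, p_filesz, p_memsz):
    """Compute where one PT_LOAD segment lands and which file range backs it."""
    seg_start = load_bias + p_vaddr
    seg_end = seg_start + p_memsz
    file_start = p_offset
    file_end = file_start + p_filesz
    file_page_start = page_start(file_start)
    return {
        "seg_page_start": page_start(seg_start),    # address passed to mmap
        "seg_page_end": page_end(seg_end),          # end of the segment's pages
        "file_page_start": file_page_start,         # file offset passed to mmap
        "file_length": file_end - file_page_start,  # length passed to mmap
    }
```

Because mmap works on whole pages, both the target address and the file offset are rounded down, so the mapped range may begin slightly before the segment's first byte.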
2.3 Allocating soinfo
After loading, load_library calls soinfo_alloc to allocate a soinfo structure for the SO. The soinfo holds loading, symbol, relocation, and initialization information used later by the linker and at runtime.
Key fields used during loading/linking include phdr, phnum, base, size, symbol tables, relocation tables, and init/fini arrays.
2.4 Linking
The linking is performed by soinfo_link_image and consists of four steps:
Locate the dynamic section via phdr_table_get_dynamic_section.
Parse the dynamic section (array of Elf32_Dyn entries) to obtain symbol, relocation, and init/fini information.
Load any needed dependent SOs by calling find_library.
Perform relocation, the most complex part, by fixing imported symbol references.
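Step 2 in the list above amounts to walking an array of (d_tag, d_val) pairs until DT_NULL. Here is a minimal Python sketch for the 32-bit little-endian case; the tag values come from the ELF specification, while the function name and return shape are illustrative.

```python
import struct

ELF32_DYN = struct.Struct("<iI")  # Elf32_Dyn: signed d_tag, unsigned d_un

DT_NULL, DT_NEEDED = 0, 1
DT_NAMES = {
    2: "DT_PLTRELSZ", 4: "DT_HASH", 5: "DT_STRTAB", 6: "DT_SYMTAB",
    12: "DT_INIT", 13: "DT_FINI", 17: "DT_REL", 18: "DT_RELSZ",
    23: "DT_JMPREL", 25: "DT_INIT_ARRAY", 26: "DT_FINI_ARRAY",
    27: "DT_INIT_ARRAYSZ", 28: "DT_FINI_ARRAYSZ",
}


def parse_dynamic(data):
    """Parse raw .dynamic bytes (length must be a multiple of 8).

    Returns (needed, info): needed holds the string-table offsets of the
    DT_NEEDED library names, info maps known tag names to their values.
    """
    needed, info = [], {}
    for d_tag, d_val in ELF32_DYN.iter_unpack(data):
        if d_tag == DT_NULL:          # DT_NULL terminates the array
            break
        if d_tag == DT_NEEDED:
            needed.append(d_val)
        elif d_tag in DT_NAMES:
            info[DT_NAMES[d_tag]] = d_val
    return needed, info
```

Each DT_NEEDED value is an offset into the string table named by DT_STRTAB, which is how the linker obtains the dependent SO names passed to find_library in step 3.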
2.4.1 Relocation
Android ARM processes two relocation tables: plt_rel (for PLT entries) and rel. Both are handled by soinfo_relocate, which iterates over each relocation entry, determines the type, symbol index, and target address, looks up imported symbols if needed, and patches the target address accordingly.
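The loop can be sketched for the four most common ARM relocation types. This Python model is an illustration under stated assumptions rather than the linker's code: `image` is a bytearray whose index 0 corresponds to the load address (so min_vaddr is assumed to be 0), `resolve` is a hypothetical callback that returns the absolute address of an imported symbol, and relocation entries are given as (r_offset, r_info) pairs.

```python
import struct

# Relocation type numbers from "ELF for the ARM Architecture".
R_ARM_ABS32, R_ARM_GLOB_DAT, R_ARM_JUMP_SLOT, R_ARM_RELATIVE = 2, 21, 22, 23
MASK32 = 0xFFFFFFFF


def apply_relocations(image, base, rels, resolve):
    """Patch image in place for each (r_offset, r_info) entry in rels."""
    for r_offset, r_info in rels:
        r_type = r_info & 0xFF        # ELF32_R_TYPE
        r_sym = r_info >> 8           # ELF32_R_SYM
        (addend,) = struct.unpack_from("<I", image, r_offset)
        if r_type in (R_ARM_JUMP_SLOT, R_ARM_GLOB_DAT):
            value = resolve(r_sym)            # slot receives the symbol address
        elif r_type == R_ARM_ABS32:
            value = addend + resolve(r_sym)   # symbol address plus stored addend
        elif r_type == R_ARM_RELATIVE:
            value = addend + base             # no symbol, just rebase
        else:
            raise ValueError("unhandled relocation type %d" % r_type)
        struct.pack_into("<I", image, r_offset, value & MASK32)
```

REL-style entries carry no explicit addend, so the addend is read from the word being patched, which is why the sketch loads the old value before writing the new one.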
2.5 CallConstructors
When compiling an SO, the -init linker option or the __attribute__((constructor)) attribute can designate initialization functions. These functions are invoked after the SO is loaded and linked, before the dlopen call returns.
After do_dlopen obtains the soinfo of the newly loaded SO, it finally calls CallConstructors, which recursively invokes constructors of dependent SOs and then runs the SO’s own init functions and init_array entries.
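The call order can be modelled as a small recursive function. In this Python sketch a soinfo is represented by a plain dictionary with hypothetical keys (`needed`, `init`, `init_array`, `constructors_called`); it only illustrates ordering and does not reflect the real structure layout.

```python
def call_constructors(si):
    """Run dependencies' constructors first, then this SO's init and init_array."""
    if si.get("constructors_called"):
        return
    # Mark before recursing so that circular dependencies terminate.
    si["constructors_called"] = True
    for dep in si.get("needed", []):
        call_constructors(dep)
    init = si.get("init")
    if init is not None:
        init()                         # DT_INIT function
    for fn in si.get("init_array", []):
        fn()                           # DT_INIT_ARRAY entries, in order
```

Since this all happens before dlopen returns, code placed in init or init_array runs earlier than anything the caller can invoke through dlsym, including JNI_OnLoad.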
3. Packing Techniques
In the malware and DRM fields, “packing” (or “shelling”) is used to compress and encrypt code, often combined with virtualization, obfuscation, and anti‑debugging to hinder static and dynamic analysis.
For Android native libraries, packing targets the SO file. The typical architecture consists of three components:
SO : the protected target library.
Loader : a small SO that is loaded first, restores the encrypted/compressed SO in memory, loads it, and performs linking so that the protected SO can be used.
Packing tool : creates the encrypted/compressed payload and merges it with the loader to produce a packed SO.
3.1 Loader Execution Timing
The loader must run before the protected SO is used. This can be achieved via the SO’s init/init_array functions or via JNI_OnLoad.
3.2 Loader Performs Loading and Linking
After restoring the SO in memory, the loader repeats the linker's loading and linking steps, with the main difference being that it reads from memory instead of a file descriptor.
3.2.1 Loading
The loader follows the same two steps as the linker, reserving the address range and then populating each PT_LOAD segment, except that segment contents are copied from the restored in-memory image rather than mapped from a file.
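A memory-based variant of segment loading might look like the sketch below. It is a Python model under stated assumptions: a zero-filled bytearray stands in for the anonymous mapping a real loader would reserve, 4 KiB pages are assumed, and the function name and tuple layout are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
PT_LOAD = 1


def page_start(addr):
    return addr & ~(PAGE_SIZE - 1)


def page_end(addr):
    return page_start(addr + PAGE_SIZE - 1)


def load_segments_from_memory(elf_bytes, phdrs):
    """phdrs: list of (p_type, p_offset, p_vaddr, p_filesz, p_memsz).

    Returns (min_vaddr, image), where image[i] holds the byte that belongs
    at virtual address min_vaddr + i.
    """
    loads = [p for p in phdrs if p[0] == PT_LOAD]
    if not loads:
        raise ValueError("no PT_LOAD segments")
    min_vaddr = page_start(min(p[2] for p in loads))
    max_vaddr = page_end(max(p[2] + p[4] for p in loads))
    # Zero-filled, so the part of each segment beyond p_filesz (.bss) is
    # already zero without extra work.
    image = bytearray(max_vaddr - min_vaddr)
    for _type, offset, vaddr, filesz, _memsz in loads:
        if offset + filesz > len(elf_bytes):
            raise ValueError("segment extends past end of buffer")
        dst = vaddr - min_vaddr
        image[dst:dst + filesz] = elf_bytes[offset:offset + filesz]
    return min_vaddr, image
```

A real loader would additionally set page protections per segment from p_flags; that step is omitted here.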
3.2.2 Allocating soinfo
The loader can reuse the linker's soinfo structure definition to hold intermediate information in an instance of its own, then copy the relevant fields into the soinfo that the linker maintains once loading and linking are complete.
3.2.3 Linking
Linking is identical to the linker's process; after linking, the loader must also invoke the SO’s init functions.
3.3 soinfo Repair
Because the system’s linker maintains a soinfo for the loader, the loader must patch this structure with the real SO’s information (base, size, load_bias, symbol tables, bucket/chains, ARM exception tables, etc.) so that subsequent dlsym lookups work correctly.
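To see why the bucket and chain arrays matter, consider how a SysV-style ELF hash lookup proceeds. The Python sketch below is a simplified model: the symbol table is reduced to a list of names, and `lookup` is a hypothetical helper rather than the linker's function. If bucket, chain, or the symbol table still described the loader instead of the restored SO, such a lookup would search the wrong symbols.

```python
def elf_hash(name):
    """SysV ELF hash of a symbol name given as bytes."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        h ^= g
        h ^= g >> 24
    return h


def lookup(name, sym_names, bucket, chain):
    """Return the symbol index for name, or None if it is not present.

    sym_names[i] is the name of symbol i; index 0 is the reserved
    undefined symbol and also terminates every chain.
    """
    i = bucket[elf_hash(name) % len(bucket)]
    while i != 0:
        if sym_names[i] == name:
            return i
        i = chain[i]
    return None
```

In a real soinfo the returned index selects an Elf32_Sym entry, whose st_value is added to the load bias to produce the address that dlsym returns.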
References
Linkers and Loaders
ELF for the ARM Architecture