
Unlock iOS App Size Savings: Deep Dive into Mach‑O File Structure & Resource Optimization

This article explains how Baidu's iOS app analyzes Mach‑O binaries and applies systematic resource‑optimization techniques—including big‑resource detection, unused config removal, and duplicate asset elimination—to shrink package size by over a dozen megabytes.

Baidu App Technology

Mach‑O File Overview

Mach‑O (Mach Object) is the executable, library and core‑dump format used on macOS and iOS. A file consists of three logical parts: Header, Load Commands and Data. The tail of the data region (the __LINKEDIT segment) holds linker information such as the symbol and string tables.

Inspection Tools

MachOView – a GUI viewer (download: http://sourceforge.net/projects/machoview/, source: https://github.com/gdbinit/MachOView)

otool – a command‑line utility shipped with the Xcode command‑line tools. Common commands:

otool -f – view FAT headers

otool -a – view archive header

otool -h – view Mach‑O header

otool -l – list load commands

otool -L – list dependent dynamic libraries

otool -t -v – view (disassemble) the text section

otool -d – view data section

otool -o – view Objective‑C segment

otool -I – view the indirect symbol table

otool -v -s __TEXT __cstring – extract all static strings

otool -v -s __TEXT __objc_methname – extract Objective‑C method names

File type can be verified with the macOS file command and supported architectures listed with lipo -info:

~ % file demo
demo: Mach-O 64-bit executable arm64
~ % lipo -info demo
Non-fat file: demo is architecture: arm64

Header Structure

The 64‑bit header is defined as:

struct mach_header_64 {
    uint32_t magic;
    cpu_type_t cputype;
    cpu_subtype_t cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

Key fields:

magic – identifies the file format and byte order (0xfeedfacf, MH_MAGIC_64, for any 64‑bit Mach‑O, including arm64).

cputype – CPU architecture (ARM64, x86_64, …).

filetype – executable, library, core dump, etc.

ncmds – number of load commands.

sizeofcmds – total size of load commands.

flags – dyld loading flags such as MH_NOUNDEFS, MH_PIE.

View header values with otool -hv demo:

demo:
Mach header
      magic  cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
MH_MAGIC_64    ARM64        ALL  0x00     EXECUTE    22       3040   NOUNDEFS DYLDLINK TWOLEVEL PIE
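The same fields can be read without otool. The following is a minimal sketch, assuming a thin (non‑fat), little‑endian 64‑bit image; the function name parse_header_64 and the small lookup tables are illustrative and cover only a few of the constants defined in mach-o/loader.h.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF             # thin, little-endian, 64-bit Mach-O
HEADER_64 = struct.Struct("<8I")     # the eight uint32-sized fields of mach_header_64

# Partial lookup tables (illustrative subset of mach-o/loader.h and mach/machine.h).
CPU_TYPES = {0x0100000C: "ARM64", 0x01000007: "X86_64"}
FILETYPES = {0x1: "OBJECT", 0x2: "EXECUTE", 0x6: "DYLIB", 0x8: "BUNDLE"}
FLAGS = {0x1: "NOUNDEFS", 0x4: "DYLDLINK", 0x80: "TWOLEVEL", 0x200000: "PIE"}


def parse_header_64(data):
    """Decode the first 32 bytes of a thin 64-bit Mach-O image."""
    (magic, cputype, _cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = HEADER_64.unpack_from(data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError(f"not a thin 64-bit Mach-O (magic {magic:#x})")
    return {
        "cputype": CPU_TYPES.get(cputype, hex(cputype)),
        "filetype": FILETYPES.get(filetype, hex(filetype)),
        "ncmds": ncmds,
        "sizeofcmds": sizeofcmds,
        # Keep only the flag bits this sketch knows how to name.
        "flags": [name for bit, name in FLAGS.items() if flags & bit],
    }
```

To try it on a real binary, read the first 32 bytes of the file in "rb" mode and pass them to parse_header_64. A fat binary starts with a different magic and would be rejected by this sketch.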

Load Commands

Each load command has the generic form:

struct load_command {
    uint32_t cmd;      /* command type */
    uint32_t cmdsize;  /* size of command in bytes */
};

Common command constants:

LC_SEGMENT / LC_SEGMENT_64 – map a file segment into memory.

LC_DYLD_INFO_ONLY – dynamic‑linking information.

LC_SYMTAB – symbol table.

LC_DYSYMTAB – dynamic symbol table.

LC_LOAD_DYLIB – load a dynamic library.

LC_UUID – unique identifier for crash symbolication.

LC_VERSION_MIN_IPHONEOS – minimum iOS version.

LC_MAIN – entry point of the main thread.

LC_ENCRYPTION_INFO_64 – encryption information.
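Because every load command starts with the same cmd/cmdsize pair, the whole list can be walked by repeatedly jumping cmdsize bytes forward. The sketch below assumes a thin little‑endian 64‑bit image whose header occupies the first 32 bytes; iter_load_commands and the LC_NAMES table are illustrative names, and the table lists only the constants mentioned above.

```python
import struct

LC_REQ_DYLD = 0x80000000  # high bit set on commands dyld must understand

# Command values from mach-o/loader.h (subset).
LC_NAMES = {
    0x2: "LC_SYMTAB",
    0xB: "LC_DYSYMTAB",
    0xC: "LC_LOAD_DYLIB",
    0x19: "LC_SEGMENT_64",
    0x1B: "LC_UUID",
    0x25: "LC_VERSION_MIN_IPHONEOS",
    0x2C: "LC_ENCRYPTION_INFO_64",
    0x22 | LC_REQ_DYLD: "LC_DYLD_INFO_ONLY",
    0x28 | LC_REQ_DYLD: "LC_MAIN",
}


def iter_load_commands(data):
    """Yield (name, offset, cmdsize) for each load command of a thin 64-bit image."""
    # ncmds and sizeofcmds sit at byte offsets 16 and 20 of mach_header_64.
    ncmds, sizeofcmds = struct.unpack_from("<II", data, 16)
    offset = 32                       # sizeof(struct mach_header_64)
    end = offset + sizeofcmds
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8 or offset + cmdsize > end:
            raise ValueError(f"malformed load command at offset {offset}")
        yield LC_NAMES.get(cmd, hex(cmd)), offset, cmdsize
        offset += cmdsize             # the next command starts right after this one
```

Unknown command values are reported as hex rather than rejected, since real binaries usually contain commands beyond this short table.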

Segments and Sections

The four primary segments are __PAGEZERO, __TEXT, __DATA and __LINKEDIT. Each segment contains one or more sections.

__PAGEZERO – a guard page that catches NULL pointer dereferences; occupies no file space.

__TEXT – code segment (read‑only + executable). Important sections include __text (machine code), __cstring (C strings), __objc_methname (Objective‑C method names), etc.

__DATA – data segment (read‑write). Holds initialized data, BSS, Objective‑C class lists, symbol pointers, etc.

__LINKEDIT – contains linking information such as symbol tables and string tables.
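The segment/section relationship is visible in the layout of an LC_SEGMENT_64 command: a fixed 72‑byte segment_command_64 is followed by nsects section_64 records of 80 bytes each. The sketch below decodes one such command; decode_segment_64 is an illustrative name, and the caller is assumed to already know the command's byte offset in a little‑endian image.

```python
import struct

SEGMENT_64 = struct.Struct("<II16sQQQQiiII")      # struct segment_command_64, 72 bytes
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")   # struct section_64, 80 bytes


def _name(raw):
    """Strip the NUL padding from a fixed 16-byte name field."""
    return raw.split(b"\0", 1)[0].decode("ascii")


def decode_segment_64(data, offset):
    """Decode the LC_SEGMENT_64 command that starts at `offset` in `data`."""
    (_cmd, _cmdsize, segname, _vmaddr, vmsize, _fileoff, filesize,
     _maxprot, _initprot, nsects, _flags) = SEGMENT_64.unpack_from(data, offset)
    sections = []
    sect_offset = offset + SEGMENT_64.size        # sections follow the segment header
    for _ in range(nsects):
        fields = SECTION_64.unpack_from(data, sect_offset)
        sections.append((_name(fields[0]), fields[3]))   # (sectname, size in bytes)
        sect_offset += SECTION_64.size
    return {
        "name": _name(segname),
        "vmsize": vmsize,       # bytes reserved in the address space
        "filesize": filesize,   # bytes actually stored in the file
        "sections": sections,
    }
```

For __PAGEZERO this reports a large vmsize with a filesize of 0, which is why that segment adds nothing to the package size.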

Resource Optimization in Baidu App

Overview

Baidu App is a large‑scale iOS application that combines Hybrid, mini‑program, React Native, KMM and other frameworks. Resources larger than 40 KB account for roughly 26 MB of the bundle, providing a clear target for size reduction. Optimization is divided into three categories:

Big‑resource handling.

Removal of unused configuration files.

Duplicate‑resource detection.

Big‑Resource Detection

Recursively scan the unpacked .ipa (the .app bundle inside Payload/) and list files whose size exceeds a configurable threshold (default 40 KB). Example Python script:

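import os

def findBigResources(path, threshold):
    """Print every file under path whose size exceeds threshold (in KB)."""
    for entry in os.listdir(path):
        child = os.path.join(path, entry)
        if os.path.isfile(child):
            ext = os.path.splitext(child)[-1]
            # Skip dynamic libraries and compiled asset catalogs.
            if ext not in {".dylib", ".car"}:
                size_kb = os.path.getsize(child) / 1024
                if size_kb > threshold:
                    print(f"{child} length is {size_kb:.2f} KB")
        elif os.path.isdir(child):
            # Recurse into subdirectories; broken symlinks are ignored.
            findBigResources(child, threshold)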

Mitigation strategies:

Asynchronous download for resources not required at first launch or with low usage frequency.

Compress frequently used large resources and decompress them at runtime.
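Before choosing the compression route for a given file, it helps to estimate how much it would actually save, since formats that are already compressed (PNG, WebP, most video) gain very little. The sketch below is an offline estimate only: the source does not say which algorithm Baidu App uses, so zlib is an assumption here, and compression_saving is an illustrative name. The on‑device decompression step would be written in the app's own language.

```python
import zlib


def compression_saving(payload, level=9):
    """Return (compressed_size, saved_fraction) for a bytes payload."""
    compressed = zlib.compress(payload, level)
    # Guard against empty input to avoid dividing by zero.
    saved = 1 - len(compressed) / len(payload) if payload else 0.0
    return len(compressed), saved


def roundtrip_ok(payload, level=9):
    """Check that decompressing the compressed payload restores it exactly."""
    return zlib.decompress(zlib.compress(payload, level)) == payload
```

A repetitive JSON or plist payload typically shows a large saved fraction, while a payload that is already compressed stays close to zero and is a better candidate for asynchronous download.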

Unused Configuration Files

Collect configuration files (e.g., .plist, .json, .txt, .xib) while excluding image, JavaScript, CSS and binary assets. Sample script:

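import os

def findProfileResources(path):
    """Print every candidate configuration file under path."""
    for entry in os.listdir(path):
        child = os.path.join(path, entry)
        if os.path.isfile(child):
            ext = os.path.splitext(child)[-1]
            # Exclude binaries, asset catalogs, images, JavaScript and CSS.
            if ext not in {".dylib", ".car", ".png", ".webp", ".gif", ".js", ".css"}:
                print(f"{child} suffix {ext}")
        elif os.path.isdir(child):
            # Recurse into subdirectories; broken symlinks are ignored.
            findProfileResources(child)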

Static strings embedded in the Mach‑O binary (section __TEXT __cstring) are extracted with otool and compared against the collected list to identify unused files:

import subprocess
lines = subprocess.check_output(["/usr/bin/otool", "-v", "-s", "__TEXT", "__cstring", binary_path], text=True, errors="replace").splitlines(keepends=True)

After manual verification, the identified unused configuration files can be removed.
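The comparison step itself can be sketched as follows. This is not the article's exact implementation: it assumes each data line of the otool output has the form "<hex address>  <string>", and it treats a file as referenced if its name without the extension occurs anywhere in any static string. The names parse_cstrings and find_unreferenced are illustrative.

```python
import os
import string


def _is_address(token):
    """True for an 8- or 16-digit hex token, as otool prints section addresses."""
    return len(token) in (8, 16) and all(c in string.hexdigits for c in token)


def parse_cstrings(otool_lines):
    """Collect the string literals from `otool -v -s __TEXT __cstring` output."""
    strings = set()
    for line in otool_lines:
        parts = line.rstrip("\n").split(None, 1)
        # Header lines ("demo:", "Contents of ... section") do not start
        # with an address and are skipped.
        if len(parts) == 2 and _is_address(parts[0]):
            strings.add(parts[1])
    return strings


def find_unreferenced(config_paths, strings):
    """Return the config files whose base name appears in no static string."""
    unused = []
    for path in config_paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        # Substring matching errs on the side of keeping a file.
        if not any(stem in s for s in strings):
            unused.append(path)
    return unused
```

File names that are assembled at runtime (for example through a format string) never appear whole in __cstring, so the result is a candidate list rather than a verdict, which is why the manual check remains necessary.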

Duplicate‑Resource Detection

All resource files are hashed with MD5; identical hashes indicate duplicates. Example implementation:

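import hashlib
import os

def get_file_library(path, file_dict):
    """Group every file under path by the MD5 digest of its contents."""
    for entry in os.listdir(path):
        child = os.path.join(path, entry)
        if os.path.isfile(child):
            md5 = img_to_md5(child)
            # Store the full path so duplicates in different folders can be located.
            file_dict.setdefault(md5, []).append(child)
        elif os.path.isdir(child):
            get_file_library(child, file_dict)

def img_to_md5(path):
    """Return the hex MD5 digest of a file (any type, not only images)."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()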

Duplicate files are either consolidated or removed, further shrinking the package.
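To decide which duplicates are worth consolidating first, the hash groups can be ranked by the bytes they waste. The sketch below is one possible way to do that, not the article's tooling: duplicate_report and file_md5 are illustrative names, and the file is hashed in chunks so that large resources are not loaded into memory at once.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_report(root):
    """Return [(wasted_bytes, [paths])] for each duplicate group, largest first."""
    groups = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            groups.setdefault(file_md5(path), []).append(path)
    report = []
    for paths in groups.values():
        if len(paths) > 1:
            # Every copy beyond the first one is reclaimable space.
            wasted = os.path.getsize(paths[0]) * (len(paths) - 1)
            report.append((wasted, sorted(paths)))
    return sorted(report, reverse=True)
```

Calling duplicate_report on the unpacked .app directory yields the groups in descending order of reclaimable size, so the largest wins surface first.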

Conclusion

Resource optimization delivered the largest size reduction for Baidu App, saving approximately 12 MB over two quarters. The workflow combines systematic Mach‑O analysis with the big‑resource, unused‑config and duplicate‑resource pipelines. It eliminates existing waste and establishes a repeatable detection process that catches regressions as new resources are added. The overview of the Mach‑O format, load commands, segments and sections also serves as a concise reference for iOS developers doing binary analysis or size optimization.
