Unlock iOS App Size Savings: Deep Dive into Mach‑O File Structure & Resource Optimization
This article explains how Baidu's iOS app analyzes Mach‑O binaries and applies systematic resource‑optimization techniques—including big‑resource detection, unused config removal, and duplicate asset elimination—to shrink package size by over a dozen megabytes.
Mach‑O File Overview
Mach‑O (Mach Object) is the executable, library and core‑dump format used on macOS and iOS. It consists of three logical parts: Header, Load Commands and Data. The end of the file holds link‑editor information (the __LINKEDIT segment), which stores the string and symbol tables.
Inspection Tools
MachOView – a GUI viewer (download: http://sourceforge.net/projects/machoview/, source: https://github.com/gdbinit/MachOView)
otool – the built‑in command‑line utility. Common commands:
otool -f – view FAT headers
otool -a – view archive header
otool -h – view Mach‑O header
otool -l – list load commands
otool -L – list dependent dynamic libraries
otool -t -v – view text section
otool -d – view data section
otool -o – view Objective‑C segment
otool -I – view indirect symbol table
otool -v -s __TEXT __cstring – extract all static strings
otool -v -s __TEXT __objc_methname – extract Objective‑C method names
File type can be verified with the macOS file command and supported architectures listed with lipo -info:
~ % file demo
/demo: Mach-O 64-bit executable arm64
~ % lipo -info demo
Non-fat file: demo is architecture: arm64
Header Structure
The 64‑bit header is defined as:
struct mach_header_64 {
    uint32_t magic;
    cpu_type_t cputype;
    cpu_subtype_t cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};
Key fields:
magic – identifies the file format and byte order (0xfeedfacf, MH_MAGIC_64, marks any 64‑bit Mach‑O, including arm64).
cputype – CPU architecture (ARM64, x86_64, …).
filetype – executable, library, core dump, etc.
ncmds – number of load commands.
sizeofcmds – total size of load commands.
flags – dyld loading flags such as MH_NOUNDEFS, MH_PIE.
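The fields above can also be read without otool. The following is a minimal sketch, assuming a thin (non-fat) little-endian 64-bit image; the function name `parse_mach_header_64` and the synthetic sample header are illustrative and not part of the original workflow.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # magic of a 64-bit Mach-O, from <mach-o/loader.h>
HEADER_FORMAT = "<IiiIIIII"  # little-endian layout of struct mach_header_64
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 32 bytes

FIELD_NAMES = ("magic", "cputype", "cpusubtype", "filetype",
               "ncmds", "sizeofcmds", "flags", "reserved")


def parse_mach_header_64(data):
    """Decode the first 32 bytes of a thin 64-bit Mach-O image into a dict."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer is shorter than a mach_header_64")
    fields = dict(zip(FIELD_NAMES, struct.unpack_from(HEADER_FORMAT, data)))
    if fields["magic"] != MH_MAGIC_64:
        raise ValueError(f"not a thin 64-bit Mach-O: magic={fields['magic']:#x}")
    return fields


# Synthetic header for illustration: CPU_TYPE_ARM64 (0x0100000C), MH_EXECUTE (2),
# 22 load commands occupying 3040 bytes, flags NOUNDEFS|DYLDLINK|TWOLEVEL|PIE.
sample = struct.pack(HEADER_FORMAT, MH_MAGIC_64, 0x0100000C, 0, 2, 22, 3040, 0x00200085, 0)
header = parse_mach_header_64(sample)
print(header["ncmds"], header["sizeofcmds"])
```

For a real binary, the same function could be fed the first 32 bytes read from the file in binary mode. A fat binary starts with a different magic and would be rejected by this sketch.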
View header values with otool -hv demo:
demo:
Mach header
magic cputype cpusubtype caps filetype ncmds sizeofcmds flags
MH_MAGIC_64 ARM64 ALL 0x00 EXECUTE 22 3040 NOUNDEFS DYLDLINK TWOLEVEL PIE
Load Commands
Each load command has the generic form:
struct load_command {
    uint32_t cmd;     /* command type */
    uint32_t cmdsize; /* size of command in bytes */
};
Common command constants:
LC_SEGMENT / LC_SEGMENT_64 – map a file segment into memory.
LC_DYLD_INFO_ONLY – dynamic‑linking information.
LC_SYMTAB – symbol table.
LC_DYSYMTAB – dynamic symbol table.
LC_LOAD_DYLIB – load a dynamic library.
LC_UUID – unique identifier for crash symbolication.
LC_VERSION_MIN_IPHONEOS – minimum iOS version.
LC_MAIN – entry point of the main thread.
LC_ENCRYPTION_INFO_64 – encryption information.
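Because every load command starts with the same cmd/cmdsize pair, the whole list can be walked by repeatedly reading eight bytes and skipping ahead by cmdsize. The sketch below assumes a thin little-endian 64-bit image whose load commands begin right after the 32-byte header; `iter_load_commands` and the synthetic buffer are hypothetical names used only for illustration.

```python
import struct

MACH_HEADER_64_SIZE = 32  # load commands start right after mach_header_64
LC_SYMTAB = 0x2           # constants from <mach-o/loader.h>
LC_UUID = 0x1B


def iter_load_commands(data, ncmds):
    """Yield (cmd, cmdsize, offset) for each load command after the header."""
    offset = MACH_HEADER_64_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError("corrupt load command: cmdsize smaller than its own header")
        yield cmd, cmdsize, offset
        offset += cmdsize  # cmdsize covers the whole command, header included


# Synthetic image: a zeroed header followed by an LC_UUID and an LC_SYMTAB command.
uuid_cmd = struct.pack("<II16s", LC_UUID, 24, bytes(range(16)))
symtab_cmd = struct.pack("<IIIIII", LC_SYMTAB, 24, 0, 0, 0, 0)
image = bytes(MACH_HEADER_64_SIZE) + uuid_cmd + symtab_cmd

commands = list(iter_load_commands(image, ncmds=2))
for cmd, cmdsize, offset in commands:
    print(f"cmd={cmd:#x} size={cmdsize} offset={offset}")
```

In practice ncmds would come from the parsed header rather than being passed by hand.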
Segments and Sections
The four primary segments are __PAGEZERO, __TEXT, __DATA and __LINKEDIT. Each segment contains one or more sections.
__PAGEZERO – a guard page that catches NULL pointer dereferences; occupies no file space.
__TEXT – code segment (read‑only + executable). Important sections include __text (machine code), __cstring (C strings), __objc_methname (Objective‑C method names), etc.
__DATA – data segment (read‑write). Holds initialized data, BSS, Objective‑C class lists, symbol pointers, etc.
__LINKEDIT – contains linking information such as symbol tables and string tables.
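To see how much file space each segment actually occupies, the LC_SEGMENT_64 command can be decoded directly. This is a minimal sketch that assumes the standard 72-byte `segment_command_64` layout from `<mach-o/loader.h>`; the helper names and the two synthetic segments are illustrative only.

```python
import struct

LC_SEGMENT_64 = 0x19
# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags
SEGMENT_FORMAT = "<II16sQQQQiiII"
SEGMENT_SIZE = struct.calcsize(SEGMENT_FORMAT)  # 72 bytes


def parse_segment_64(data, offset=0):
    """Decode one segment_command_64 and return the size-related fields."""
    (cmd, _cmdsize, segname, _vmaddr, vmsize, _fileoff, filesize,
     _maxprot, _initprot, nsects, _flags) = struct.unpack_from(SEGMENT_FORMAT, data, offset)
    if cmd != LC_SEGMENT_64:
        raise ValueError(f"not an LC_SEGMENT_64 command: cmd={cmd:#x}")
    return {
        "name": segname.rstrip(b"\0").decode("ascii"),
        "vmsize": vmsize,
        "filesize": filesize,
        "nsects": nsects,
    }


def make_segment(name, vmsize, filesize):
    """Build a synthetic segment command (illustration only, no sections)."""
    return struct.pack(SEGMENT_FORMAT, LC_SEGMENT_64, SEGMENT_SIZE,
                       name.encode("ascii"), 0, vmsize, 0, filesize, 0, 0, 0, 0)


blob = make_segment("__PAGEZERO", 0x100000000, 0) + make_segment("__TEXT", 0x4000, 0x4000)
segments = [parse_segment_64(blob, off) for off in (0, SEGMENT_SIZE)]
for seg in segments:
    print(seg["name"], "vmsize", hex(seg["vmsize"]), "filesize", seg["filesize"])
```

The synthetic __PAGEZERO entry reserves virtual address space but has a filesize of zero, which matches the description above: it does not contribute to package size.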
Resource Optimization in Baidu App
Overview
Baidu App is a large‑scale iOS application that combines Hybrid, mini‑program, React Native, KMM and other frameworks. Resources larger than 40 KB account for roughly 26 MB of the bundle, providing a clear target for size reduction. Optimization is divided into three categories:
Big‑resource handling.
Removal of unused configuration files.
Duplicate‑resource detection.
Big‑Resource Detection
Recursively scan the unpacked .ipa bundle and list files whose size exceeds a configurable threshold (default 40 KB). Example Python script:
import os

def findBigResources(path, threshold):
    """Print every file under path whose size exceeds threshold (in KB)."""
    for entry in os.listdir(path):
        child = os.path.join(path, entry)
        if os.path.isfile(child):
            ext = os.path.splitext(child)[-1]
            # Skip dynamic libraries and compiled asset catalogs.
            if ext not in {".dylib", ".car"}:
                size_kb = os.path.getsize(child) / 1024
                if size_kb > threshold:
                    print(f"{child} length is {size_kb:.2f} KB")
        elif os.path.isdir(child):
            # Recurse into subdirectories only; ignore broken symlinks and the like.
            findBigResources(child, threshold)
Mitigation strategies:
Asynchronous download for resources not required at first launch or with low usage frequency.
Compress frequently used large resources and decompress them at runtime.
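Before choosing the compression route for a given resource, it helps to estimate how much it would actually save. The sketch below is a build-time estimate only, assuming zlib (DEFLATE) as a stand-in for whatever codec the app uses; on-device decompression would rely on platform APIs rather than Python, and `estimate_compression` is a hypothetical helper name.

```python
import zlib


def estimate_compression(payload, level=9):
    """Return (original_size, compressed_size, saved_fraction) for one resource."""
    compressed = zlib.compress(payload, level)
    original_size = len(payload)
    saved = 1 - len(compressed) / original_size if original_size else 0.0
    return original_size, len(compressed), saved


# Illustrative payload: repetitive JSON compresses well, while formats that are
# already compressed (PNG, WebP, most video) typically gain very little.
sample = b'{"title": "demo", "enabled": true}\n' * 2000
original_size, compressed_size, saved = estimate_compression(sample)
print(f"{original_size} B -> {compressed_size} B ({saved:.0%} saved)")
```

Resources with a low saved fraction are better candidates for asynchronous download than for compression.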
Unused Configuration Files
Collect configuration files (e.g., .plist, .json, .txt, .xib) while excluding image, JavaScript, CSS and binary assets. Sample script:
import os

def findProfileResources(path):
    """Print every candidate configuration file under path."""
    for entry in os.listdir(path):
        child = os.path.join(path, entry)
        if os.path.isfile(child):
            ext = os.path.splitext(child)[-1]
            # Skip binaries, asset catalogs, images and web assets.
            if ext not in {".dylib", ".car", ".png", ".webp", ".gif", ".js", ".css"}:
                print(f"{child} suffix {ext}")
        elif os.path.isdir(child):
            findProfileResources(child)
Static strings embedded in the Mach‑O binary (section __TEXT __cstring) are extracted with otool and compared against the collected list to identify unused files:
lines = os.popen(f"/usr/bin/otool -v -s __TEXT __cstring {binary_path}").readlines()
After manual verification, the identified unused configuration files can be removed.
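The comparison step is not shown in the article. One possible version is sketched below, assuming a simple substring match of each file's base name (without extension) against the extracted strings; `find_unreferenced` and the sample lines are illustrative, and the real pipeline may match differently.

```python
import os


def find_unreferenced(config_paths, cstring_lines):
    """Return the config files whose base name never appears in the strings."""
    haystack = "\n".join(cstring_lines)
    unreferenced = []
    for path in config_paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        # Code often passes the name and the extension separately, so a match
        # on the stem alone already counts as a reference.
        if stem not in haystack:
            unreferenced.append(path)
    return unreferenced


# Illustrative input: two collected files and a few lines shaped like otool output.
config_paths = ["Demo.app/settings.plist", "Demo.app/legacy_theme.json"]
cstring_lines = [
    "Contents of (__TEXT,__cstring) section",
    "0000000100003f60  settings",
    "0000000100003f69  plist",
]
candidates = find_unreferenced(config_paths, cstring_lines)
print(candidates)
```

Substring matching errs on the side of keeping files: a short name that happens to occur inside an unrelated string is treated as referenced. File names assembled at runtime will not be found at all, which is why the manual verification step remains necessary.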
Duplicate‑Resource Detection
All resource files are hashed with MD5; identical hashes indicate duplicates. Example implementation:
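import hashlib
import os

def get_file_library(path, file_dict):
    """Group every file under path by the MD5 of its contents."""
    for entry in os.listdir(path):
        child = os.path.join(path, entry)
        if os.path.isfile(child):
            md5 = img_to_md5(child)
            # Store the full path so duplicates in different folders can be located.
            file_dict.setdefault(md5, []).append(child)
        elif os.path.isdir(child):
            get_file_library(child, file_dict)

def img_to_md5(path):
    """Return the hex MD5 digest of the file at path."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()
Duplicate files are either consolidated or removed, further shrinking the package.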
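To turn the hash table into an actionable report, only groups with more than one file matter, along with the bytes that consolidation would recover. The following is a minimal sketch under those assumptions; it uses os.walk and chunked reads instead of the recursive helper above, and the names `find_duplicates` and `wasted_bytes` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large resources are not loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return {md5: [paths]} for every group of byte-identical files under root."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_md5(path)].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items() if len(paths) > 1}


def wasted_bytes(duplicates):
    """Bytes recoverable if each group were reduced to a single copy."""
    return sum(os.path.getsize(paths[0]) * (len(paths) - 1)
               for paths in duplicates.values())
```

A typical call would be `dups = find_duplicates("Payload/Demo.app")` followed by `wasted_bytes(dups)`, where the bundle path is a placeholder. MD5 is used here only as a content fingerprint, matching the article's approach, not for any security purpose.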
Conclusion
Resource optimization delivered the largest size reduction for Baidu App, saving approximately 12 MB after two quarters. The workflow—systematic Mach‑O analysis combined with big‑resource, unused‑config and duplicate‑resource pipelines—eliminates existing waste and establishes a repeatable detection process for future incremental changes. The article also provides a concise technical reference for Mach‑O file format, load commands, segments and sections, which is useful for any iOS developer performing binary analysis or size‑optimization tasks.
