How Baidu Cut iOS App Size by Removing Unused Methods with LLVM Libtooling

Baidu's flagship iOS app still occupied about 350 MB after earlier optimizations. To shrink it further, the team removed dead code: it replaced unreliable Mach‑O analysis with a source‑level AST approach built on LLVM libtooling and the Swift compiler, and implemented a multi‑layer static‑analysis pipeline that extracts, transforms, stores, and filters method usage data.

Baidu Geek Talk

Background

After an initial round of resource cleanup and Xcode build optimizations, Baidu's flagship iOS app still occupied about 350 MB on an iPhone 11. To further shrink the binary, the team needed a reliable way to identify and remove dead Objective‑C/Swift methods.

Limitations of Existing Mach‑O/LinkMap Analyses

Low accuracy and heavy manual filtering of system symbols.

Inability to detect load / initialize calls, attribute‑based invocations, or string‑based reflection (e.g., target‑action, observers).

Failure on complex inheritance chains where a subclass overrides a superclass method.

Because these approaches operate on the compiled binary, they cannot see the full set of declarations needed for precise dead‑code detection.

Chosen Solution: Source‑Level AST Analysis

The team evaluated three options:

Write a full language parser – rejected due to the effort required to support Objective‑C, C, C++, and Swift.

Use clang’s command‑line AST dump – rejected because it produces per‑file ASTs without cross‑file relationships (inheritance, categories, etc.).

Build a custom compilation suite with libtooling for clang and the Swift compiler – adopted.

This approach captures the complete abstract syntax tree (AST) during compilation, enabling precise identification of declared but never invoked methods.

Compilation Flow Overview

Xcode invokes two front‑ends:

clang for Objective‑C, C, and C++.

swift‑frontend for Swift.

Both front‑ends feed a common LLVM backend. The custom tool mirrors this flow, extracting the AST from each front‑end.

Tool Architecture

Basic Layer: assembles compiler arguments and matches language syntax.

Transformer Layer: converts matched data into a unified format.

Common Data Layer: stores categorized information (classes, methods, properties, etc.) as JSON.

Business Layer: applies domain‑specific analysis (dead‑method detection, interface audits, etc.).

Key steps include parsing parameters, creating a ClangTool (see LLVM source Tooling.h:309), defining an ASTFrontendAction, binding ASTMatcher rules, filtering matches, and executing business logic.
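The way the four layers hand data to each other can be pictured with a small sketch. It is written in Python purely for illustration (the original tool is C++ built on libtooling), and every name in it (RawMatch, transform, store, find_unused) is hypothetical. The basic layer is replaced by hand-written match records rather than real ASTMatcher output.

```python
import json
from dataclasses import dataclass

# Hypothetical stand-in for what the basic layer would emit when a matcher
# fires; the real tool obtains this from clang or swift-frontend.
@dataclass(frozen=True)
class RawMatch:
    role: str        # "decl" for a method declaration, "call" for a call site
    language: str    # "objc" or "swift"
    class_name: str
    selector: str

def transform(match):
    """Transformer layer: map a raw match onto one unified record shape."""
    return {
        "role": match.role,
        "identifier": f"{match.language}@{match.class_name}@{match.selector}",
        "name": match.selector,
    }

def store(records):
    """Common data layer: keep the categorized records as JSON text."""
    grouped = {"decl": [], "call": []}
    for record in records:
        grouped[record["role"]].append(record)
    return json.dumps(grouped, indent=2, sort_keys=True)

def find_unused(stored_json):
    """Business layer: one possible analysis, declared but never called."""
    data = json.loads(stored_json)
    called = {record["identifier"] for record in data["call"]}
    return sorted(
        record["identifier"]
        for record in data["decl"]
        if record["identifier"] not in called
    )

matches = [
    RawMatch("decl", "objc", "FeedCell", "reload"),
    RawMatch("decl", "objc", "FeedCell", "legacyLayout"),
    RawMatch("call", "objc", "FeedCell", "reload"),
]
unused = find_unused(store([transform(m) for m in matches]))
```

Keeping the stored JSON as the only interface between extraction and analysis is what would let other business-layer checks reuse the same data without re-running the compiler front-ends.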

Building the LLVM‑Based Toolchain

The LLVM source is cloned from https://github.com/llvm/llvm-project. A Release build can be produced with Ninja or Xcode:

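git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build && cd build
# Enable the clang subproject; libtooling is part of clang and is not built by default.
cmake -G "Ninja" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" ../llvm
cmake --build .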

Both libclang and libtooling were considered; libtooling was selected because it provides full AST access and can run independently of Xcode.

Data Model

Method information is stored as JSON objects, for example:

{
  "identifier": "objc@ClassName@methodName",
  "isInstance": true,
  "kind": 16,
  "location": {"filename": "File.m", "line": 147, "col": 36},
  "name": "methodName",
  "parameters": "(int a, NSString *b)",
  "returnType": "void",
  "sourceCode": "..."
}
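A record of this shape could be produced by a small typed wrapper, sketched here in Python for illustration only. The field names mirror the JSON above; the language and class_name attributes are hypothetical helpers used just to compose the identifier, and the meaning of the kind value is not specified in the article, so it is passed through unchanged.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Location:
    filename: str
    line: int
    col: int

@dataclass
class MethodRecord:
    language: str      # hypothetical helper, e.g. "objc"
    class_name: str    # hypothetical helper, e.g. "ClassName"
    name: str
    isInstance: bool
    kind: int          # opaque here; the article does not define the values
    location: Location
    parameters: str
    returnType: str
    sourceCode: str

    @property
    def identifier(self):
        # Follows the "objc@ClassName@methodName" pattern from the example.
        return f"{self.language}@{self.class_name}@{self.name}"

    def to_json_dict(self):
        # Emit exactly the keys shown in the example record.
        return {
            "identifier": self.identifier,
            "isInstance": self.isInstance,
            "kind": self.kind,
            "location": asdict(self.location),
            "name": self.name,
            "parameters": self.parameters,
            "returnType": self.returnType,
            "sourceCode": self.sourceCode,
        }

record = MethodRecord(
    language="objc",
    class_name="ClassName",
    name="methodName",
    isInstance=True,
    kind=16,
    location=Location("File.m", 147, 36),
    parameters="(int a, NSString *b)",
    returnType="void",
    sourceCode="...",
)
encoded = json.dumps(record.to_json_dict())
```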

Challenges and Mitigations

Property getters/setters: treat a property as used if any of its accessors is called, so the uncalled accessor is not reported as dead.

Method implementations in header files: scan both .m and .h files, because a header can also contain method implementations.

Inheritance chains: resolve calls to superclass methods and map them to the original declaration.

System method filtering: use LLVM APIs to detect and discard methods belonging to system frameworks.

Protocol methods: currently marked as used; future work will improve detection.
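Three of these mitigations (property accessors, system methods, protocol methods) amount to exclusion rules applied before a method is reported. The following Python sketch shows one way to express them; the Method type and its flags are assumptions made for illustration, and in the real tool the flags would come from the AST rather than being set by hand.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Method:
    identifier: str
    property_key: Optional[str] = None   # e.g. "FeedCell.title" for accessors
    from_system_header: bool = False
    implements_protocol: bool = False

def candidate_dead_methods(declared, called_ids):
    """Return identifiers that survive every exclusion rule, in declared order."""
    # A property counts as used as soon as one of its accessors is called.
    used_properties = {
        m.property_key
        for m in declared
        if m.property_key is not None and m.identifier in called_ids
    }
    dead = []
    for m in declared:
        if m.identifier in called_ids:
            continue                      # directly invoked
        if m.from_system_header:
            continue                      # system framework method
        if m.implements_protocol:
            continue                      # protocol methods are assumed used
        if m.property_key in used_properties:
            continue                      # sibling accessor was called
        dead.append(m.identifier)
    return dead

declared = [
    Method("objc@FeedCell@title", property_key="FeedCell.title"),
    Method("objc@FeedCell@setTitle:", property_key="FeedCell.title"),
    Method("objc@FeedCell@subtitle", property_key="FeedCell.subtitle"),
    Method("objc@FeedCell@setSubtitle:", property_key="FeedCell.subtitle"),
    Method("objc@NSObject@description", from_system_header=True),
    Method("objc@FeedCell@tableView:numberOfRowsInSection:", implements_protocol=True),
    Method("objc@FeedCell@legacyLayout"),
]
dead = candidate_dead_methods(declared, called_ids={"objc@FeedCell@title"})
```

In this sample, setTitle: is kept because the title getter is called, while both subtitle accessors and legacyLayout remain candidates for removal.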

Implementation Details

During a normal Xcode build the following executables are used:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swift-frontend

These can be invoked directly to reproduce the compilation steps. The custom tool creates a ClangTool instance, registers an ASTFrontendAction, and attaches ASTMatcher rules that capture declarations (functions, methods, properties) and their call sites. After matching, the tool serializes the collected data to JSON and runs business‑logic filters (e.g., dead‑method detection).
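Routing each source file to one of the two executables listed above might look like the following Python sketch. The extension table and the function names are hypothetical. The sketch only selects the front-end and deliberately does not invent any command-line flags, which in practice would be copied from the Xcode build log.

```python
from pathlib import PurePosixPath

# Toolchain directory taken from the two executable paths listed above.
TOOLCHAIN_BIN = PurePosixPath(
    "/Applications/Xcode.app/Contents/Developer/Toolchains/"
    "XcodeDefault.xctoolchain/usr/bin"
)

# Hypothetical extension table; a real project would read the language
# from the build settings instead of guessing from the file name.
FRONTEND_BY_SUFFIX = {
    ".m": "clang",
    ".mm": "clang",
    ".c": "clang",
    ".cc": "clang",
    ".cpp": "clang",
    ".swift": "swift-frontend",
}

def frontend_for(source):
    """Return the front-end executable that would compile this source file."""
    suffix = PurePosixPath(source).suffix.lower()
    if suffix not in FRONTEND_BY_SUFFIX:
        raise ValueError(f"no front-end known for {source!r}")
    return TOOLCHAIN_BIN / FRONTEND_BY_SUFFIX[suffix]

def group_by_frontend(sources):
    """Group source files by the executable that handles them."""
    groups = {}
    for source in sources:
        groups.setdefault(str(frontend_for(source)), []).append(source)
    return groups

groups = group_by_frontend(["Feed/FeedCell.m", "Feed/FeedModel.swift", "Core/Hash.cpp"])
```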

Choosing libtooling over libclang

libclang: stable C API, but cannot access the full AST (e.g., missing inheritance information).

libtooling: C++ API that provides complete AST access, works as an independent command‑line tool, but requires rebuilding when the underlying clang version changes.

The project uses libtooling to obtain all necessary AST details.

Data Storage Design

The extracted information is stored in JSON. A minimal schema is shown above; additional fields can be added for protocols, categories, or Swift symbols as needed.

Edge‑Case Handling

Property access: if either the getter or the setter is invoked, the property counts as used, and the uncalled accessor is excluded from the dead‑method report.

Header‑only implementations: both implementation and declaration files are scanned.

Inheritance: method calls are back‑tracked through the class hierarchy; identifiers are normalized to the declaring superclass.

System APIs: the Clang/LLVM APIs can tell whether a declaration comes from a system header (for example, SourceManager's isInSystemHeader), so symbols from Apple frameworks are filtered out.

Protocol methods: currently assumed used; future work will enumerate protocol requirements and compare against actual implementations.
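The inheritance rule can be sketched as a walk up the class hierarchy that rewrites an identifier to the topmost class declaring the same selector. This Python version is an illustration under assumptions: the two lookup tables and the normalize function are hypothetical, and the real tool would derive the hierarchy from the AST.

```python
# Hypothetical hierarchy: VideoCell -> FeedCell -> BaseCell.
SUPERCLASS_OF = {"VideoCell": "FeedCell", "FeedCell": "BaseCell"}

# Hypothetical table of selectors declared (or overridden) in each class.
METHODS_BY_CLASS = {
    "BaseCell": {"prepare"},
    "FeedCell": {"prepare", "reload"},
    "VideoCell": {"reload"},
}

def normalize(identifier, superclass_of, methods_by_class):
    """Rewrite language@Class@selector to the topmost declaring class."""
    language, class_name, selector = identifier.split("@", 2)
    owner = None
    current = class_name
    seen = set()                          # guards against a cyclic table
    while current is not None and current not in seen:
        seen.add(current)
        if selector in methods_by_class.get(current, set()):
            owner = current               # keep climbing; a higher class may declare it too
        current = superclass_of.get(current)
    if owner is None:
        return identifier                 # nothing in the chain declares it
    return f"{language}@{owner}@{selector}"
```

Normalizing both declarations and call sites this way means an override is treated as used whenever any class in its chain receives the call, which errs on the side of keeping code.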

Result

The static‑analysis pipeline successfully identified thousands of dead methods, allowing Baidu to reduce the iOS binary by tens of megabytes without affecting functionality. The same infrastructure also supports interface‑change audits, component integrity checks, and privacy‑compliance call‑chain analysis.

