How to Build a Static Android APK Scanner for Sensitive API Detection
This article explains how to create a static analysis tool that scans Android APKs for privacy‑sensitive API calls, covering both compile‑time bytecode instrumentation and runtime stack tracing, with step‑by‑step instructions, code examples, configuration details, and optimization techniques.
Tool Overview
The author developed a static analysis tool for Android APKs that detects calls to privacy‑sensitive APIs (e.g., OAID, Android ID, location) both at build time and during runtime, aiming to improve compliance checks for Google Play and other app stores.
Runtime Detection
To capture when a target method is invoked, a Gradle transform plugin injects a logging statement that prints the call stack using android.util.Log. The injection is performed with Javassist, which modifies the bytecode of the identified method.
method.insertBefore("android.util.Log.e(\"Privacy method call stack\", android.util.Log.getStackTraceString(new Throwable()));");
Product Scanning
The scanning process works on the final APK package. The workflow is:
Unzip the APK and extract the .dex files.
Convert the .dex files to a .jar using dex2jar.
Decompile the .jar to Java source files.
Line‑by‑line scan the Java files for predefined sensitive API keywords.
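The first step of this workflow can be sketched with the JDK's zip support alone. This is an illustrative sketch rather than the author's code: the class name DexExtractor and the method extractDexFiles are hypothetical, and it assumes the .dex files sit at the top level of the APK under the usual names classes.dex, classes2.dex, and so on.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper: copies top-level classes*.dex entries out of an APK. */
final class DexExtractor {

    static List<Path> extractDexFiles(Path apk, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> extracted = new ArrayList<>();
        // An APK is a zip archive, so ZipFile can read it directly.
        try (ZipFile zip = new ZipFile(apk.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                // Assumption: only classes.dex, classes2.dex, ... at the archive root.
                // The pattern has no path separators, so nested entries are skipped.
                if (entry.isDirectory() || !name.matches("classes\\d*\\.dex")) {
                    continue;
                }
                Path target = outDir.resolve(name);
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(target);
            }
        }
        return extracted;
    }
}
```

The returned paths can then be handed to whichever converter the pipeline uses next.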
Two JSON configuration files drive the scan:
gp.json – rules for Google Play compliance.
privacy.json – rules for general privacy compliance.
Each file contains keys (the keywords to search) and filterPackages (package names to ignore, such as the app’s own code, to reduce noise).
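How the two lists interact can be shown with a small in-memory model. The article does not give the exact JSON layout or matching semantics, so this sketch makes assumptions: ScanRules is a hypothetical class, keywords are treated as plain substrings, and filterPackages entries are treated as dotted package prefixes. Loading the JSON itself is left out.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical in-memory form of one rule file (gp.json or privacy.json). */
final class ScanRules {
    private final List<String> keys;            // keywords to search for
    private final List<String> filterPackages;  // packages whose classes are skipped

    ScanRules(List<String> keys, List<String> filterPackages) {
        this.keys = List.copyOf(keys);
        this.filterPackages = List.copyOf(filterPackages);
    }

    /** True when the class is in, or below, one of the ignored packages. */
    boolean isFiltered(String className) {
        for (String pkg : filterPackages) {
            // The trailing dot keeps com.example.app from matching com.example.apple.
            if (className.equals(pkg) || className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    /** First configured keyword contained in the text, unless the class is filtered. */
    Optional<String> match(String className, String text) {
        if (isFiltered(className)) {
            return Optional.empty();
        }
        return keys.stream().filter(text::contains).findFirst();
    }
}
```

Checking the package filter before the keywords means filtered classes cost one prefix comparison per entry and never reach the keyword loop.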
Optimizations
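Runtime detection originally could not monitor system APIs, because framework classes are not packaged into the APK and so cannot be instrumented directly. The solution is to instrument the call sites instead: the plugin processes every class and JAR in the build and uses Javassist's CtMethod.instrument capability to find calls to system APIs.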
Product scanning was slow (3–5 minutes) due to the jar → java conversion; switching to direct .dex → smali decompilation with baksmali.jar cuts the time to about 30 seconds.
Decompilation failures caused missing results; handling both Java and Kotlin bytecode mitigates this.
Previous line‑by‑line scans also captured irrelevant tokens such as field names and imports; the new scanner filters out non‑method tokens.
Smali‑Based Scanning
Smali is the assembly‑like representation of Dalvik bytecode. Each .smali file corresponds to a single class (or inner class) and begins with directives that declare the class name, superclass, and source file.
Smali uses single-letter type descriptors for void and the Java primitive types:
V → void
Z → boolean
B → byte
S → short
C → char
I → int
J → long
F → float
D → double
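The mapping above can be turned into a small decoder, which helps when reporting results in Java notation. This is a sketch with a hypothetical class name, SmaliTypes. Beyond the single-letter descriptors listed here, it also handles the two other descriptor forms used in Dalvik bytecode: a leading [ for each array dimension, and Lpackage/Name; for object types.

```java
/** Hypothetical helper: turns a Dalvik type descriptor into a Java type name. */
final class SmaliTypes {

    static String toJavaType(String descriptor) {
        // Each leading '[' adds one array dimension.
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;
        }
        String base = descriptor.substring(dims);
        String name;
        if (base.length() > 2 && base.startsWith("L") && base.endsWith(";")) {
            // Object type: Ljava/lang/String; becomes java.lang.String
            name = base.substring(1, base.length() - 1).replace('/', '.');
        } else {
            switch (base) {
                case "V": name = "void"; break;
                case "Z": name = "boolean"; break;
                case "B": name = "byte"; break;
                case "S": name = "short"; break;
                case "C": name = "char"; break;
                case "I": name = "int"; break;
                case "J": name = "long"; break;
                case "F": name = "float"; break;
                case "D": name = "double"; break;
                default:
                    throw new IllegalArgumentException("Unknown descriptor: " + descriptor);
            }
        }
        return name + "[]".repeat(dims);
    }
}
```

For example, toJavaType("[[J") yields long[][] and toJavaType("Ljava/lang/String;") yields java.lang.String.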
Common Smali directives include:
.class – class name
.super – superclass
.source – source file name
.field – field declaration
.method – method start
.end method – method end
.line – line number
.annotation / .end annotation – annotations
Method‑call instructions are identified by the invoke- prefix (e.g., invoke-virtual, invoke-static, invoke-direct, invoke-super, invoke-interface).
The scanning algorithm for Smali files:
Read each Smali file line by line; the first three lines provide class metadata.
When a .method line is encountered, record the method name and treat every following line as part of that method until .end method.
Within a method, capture .line directives to know source line numbers.
Identify lines beginning with invoke-; if the invoked method matches any configured keyword, record the occurrence with class name, method name, and line number.
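The four steps above can be sketched as a single pass over the lines of one file. This is not the author's implementation: SmaliScanner and Hit are hypothetical names, keywords are assumed to be substrings of the Smali method reference (for example TelephonyManager;->getDeviceId), and the class name is taken from the .class directive wherever it appears rather than from a fixed line position.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner for the lines of one .smali file. */
final class SmaliScanner {

    /** One matched call site. */
    static final class Hit {
        final String className;   // from the .class directive, e.g. Lcom/vendor/Tracker;
        final String methodName;  // enclosing method with its signature
        final int line;           // last .line value seen in the method, or -1 if none
        final String target;      // the invoked method reference

        Hit(String className, String methodName, int line, String target) {
            this.className = className;
            this.methodName = methodName;
            this.line = line;
            this.target = target;
        }
    }

    static List<Hit> scan(List<String> smaliLines, List<String> keys) {
        List<Hit> hits = new ArrayList<>();
        String className = null;
        String methodName = null;  // non-null only between .method and .end method
        int line = -1;
        for (String raw : smaliLines) {
            String s = raw.trim();
            if (s.startsWith(".class ")) {
                className = lastToken(s);
            } else if (s.startsWith(".method ")) {
                methodName = lastToken(s);
                line = -1;
            } else if (s.equals(".end method")) {
                methodName = null;
            } else if (methodName != null && s.startsWith(".line ")) {
                line = Integer.parseInt(s.substring(6).trim());
            } else if (methodName != null && s.startsWith("invoke-")) {
                // The method reference follows the register list: {v0, v1}, Lpkg/Cls;->name(...)R
                int close = s.indexOf('}');
                if (close < 0) {
                    continue;
                }
                String target = s.substring(close + 1).replaceFirst("^,\\s*", "");
                for (String key : keys) {
                    if (target.contains(key)) {
                        hits.add(new Hit(className, methodName, line, target));
                        break;
                    }
                }
            }
        }
        return hits;
    }

    /** Modifiers come first in .class and .method lines, so the name is the last token. */
    private static String lastToken(String s) {
        return s.substring(s.lastIndexOf(' ') + 1);
    }
}
```

Because only invoke- lines inside a method body are compared against the keywords, field declarations and other non-call tokens never produce a hit.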
In practice, the open‑source baksmali.jar tool converts .dex to Smali, after which the above rules are applied. This approach avoids the heavy dex → jar → java pipeline and reduces total scan time to roughly half a minute.
Conclusion
The tool provides a comprehensive view of privacy‑sensitive API usage in the final APK, works without modifying the build process, and outputs results as JSON and HTML for easy comparison and manual triage. Limitations include the inability to pinpoint the exact module or AAR that introduced a call and difficulty in reconstructing full call chains, which the author plans to address by combining compile‑time and runtime checks.
