How to Build a Static Android APK Scanner for Sensitive API Detection

This article explains how to build a static analysis tool that scans Android APKs for privacy‑sensitive API calls. It covers two approaches. The first is runtime detection, which uses compile‑time bytecode instrumentation to log call stacks. The second is static scanning of the final APK. Along the way it gives step‑by‑step instructions, code examples, configuration details, and optimization techniques.

NetEase Cloud Music Tech Team

Tool Overview

The author developed a static analysis tool for Android APKs that detects calls to privacy‑sensitive APIs (e.g., OAID, Android ID, location) both at build time and during runtime, aiming to improve compliance checks for Google Play and other app stores.

Runtime Detection

To capture when a target method is invoked, a Gradle transform plugin injects a logging statement that prints the call stack using android.util.Log. The injection is performed with Javassist, which modifies the bytecode of the identified method.

method.insertBefore("android.util.Log.e(\"Privacy method call stack\", android.util.Log.getStackTraceString(new Throwable()));");
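The injected statement creates a Throwable at the top of the sensitive method and logs its stack trace. The trace is useful because its second frame identifies the caller. Android's Log class is not available on a plain JVM, so the minimal sketch below substitutes StringWriter/PrintWriter for android.util.Log.getStackTraceString. The method names readPrivacyId and businessCode are hypothetical stand‑ins for an instrumented API and the code that calls it.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class CallStackDemo {
    // Plain-JVM stand-in for android.util.Log.getStackTraceString(Throwable).
    public static String stackTraceString(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw, true));
        return sw.toString();
    }

    // Hypothetical privacy-sensitive method. Its first statements mirror what
    // insertBefore() would inject at the top of the real method body.
    public static String readPrivacyId() {
        Throwable marker = new Throwable();
        System.err.println("Privacy method call stack\n" + stackTraceString(marker));
        // Frame 0 is this method; frame 1 is whoever called it.
        return marker.getStackTrace()[1].getMethodName();
    }

    // Hypothetical business code that triggers the sensitive call.
    public static String businessCode() {
        return readPrivacyId();
    }
}
```

Calling CallStackDemo.businessCode() prints the full trace to stderr and returns "businessCode", the name of the caller frame. On a device the same information appears in logcat under the injected tag.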

Product Scanning

The scanning process works on the final APK package. The workflow is:

Unzip the APK and extract the .dex files.

Convert the .dex files to a .jar using dex2jar.

Decompile the .jar to Java source files.

Line‑by‑line scan the Java files for predefined sensitive API keywords.
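The first step only needs the standard zip API. The sketch below assumes the usual multidex naming (classes.dex, classes2.dex, ...) at the APK root and ignores any other .dex files stored as assets. The class and method names are illustrative, not taken from the original tool.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class DexExtractor {
    // Matches classes.dex, classes2.dex, ... at the APK root only (no '/' allowed).
    private static final Pattern DEX_NAME = Pattern.compile("classes\\d*\\.dex");

    /** Copies every root-level classesN.dex out of the APK into outDir. */
    public static List<Path> extractDex(Path apk, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        List<Path> extracted = new ArrayList<>();
        try (ZipFile zip = new ZipFile(apk.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory() || !DEX_NAME.matcher(entry.getName()).matches()) {
                    continue; // skip resources, manifest, nested .dex assets, etc.
                }
                Path target = outDir.resolve(entry.getName());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(target);
            }
        }
        return extracted;
    }
}
```

The pattern only accepts root‑level names, so entry names containing paths can never escape outDir.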

Two JSON configuration files drive the scan:

gp.json – rules for Google Play compliance.

privacy.json – rules for general privacy compliance.

Each file contains keys (the keywords to search) and filterPackages (package names to ignore, such as the app’s own code, to reduce noise).
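The following is a minimal sketch of how one rule file could drive the line‑by‑line scan. It assumes that gp.json or privacy.json has already been parsed into a ScanRules value, so JSON parsing is omitted. It also assumes that keys are plain substrings and that filterPackages are package prefixes compared against the scanned class. Both record types are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RuleMatcher {
    /** In-memory form of one rule file (assumed shape: keys + filterPackages). */
    public record ScanRules(List<String> keys, List<String> filterPackages) {}

    /** One hit: which keyword matched, in which class, on which 1-based line. */
    public record Hit(String key, String className, int lineNumber) {}

    /** Scans one decompiled class; classes under a filtered package are skipped entirely. */
    public static List<Hit> scan(ScanRules rules, String className, List<String> lines) {
        List<Hit> hits = new ArrayList<>();
        for (String pkg : rules.filterPackages()) {
            if (className.startsWith(pkg + ".")) {
                return hits; // ignored package: report nothing to cut noise
            }
        }
        for (int i = 0; i < lines.size(); i++) {
            for (String key : rules.keys()) {
                if (lines.get(i).contains(key)) {
                    hits.add(new Hit(key, className, i + 1));
                }
            }
        }
        return hits;
    }
}
```

Plain substring matching is deliberately naive. It also fires on imports and field names, which is the kind of noise the later optimizations address.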

Optimizations

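Runtime detection originally could not monitor system APIs. Framework classes such as those in android.jar are not packaged into the APK, so they cannot be instrumented themselves. The solution is to instrument every class and JAR that is packaged and look for call sites of system APIs inside them, using Javassist's CtMethod.instrument() with an ExprEditor.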

Product scanning was slow (3–5 minutes) because of the jar → java decompilation step. Switching to disassembling the .dex files directly into Smali with baksmali.jar cuts the time to about 30 seconds.

Decompilation failures caused missing results; handling both Java and Kotlin bytecode mitigates this.

Previous line‑by‑line scans also captured irrelevant tokens such as field names and imports; the new scanner filters out non‑method tokens.
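The first optimization above instruments call sites rather than the system APIs themselves. In practice this means deciding, for each call site, whether the callee is on a watch list. The sketch below keeps that decision in plain Java so it runs without Javassist. The Javassist wiring is shown only as a comment, and it assumes MethodCall.getClassName() reports nested classes with '$'. The watch list, the tag "PrivacyCall", and the class name are hypothetical examples.

```java
import java.util.Map;
import java.util.Set;

class SystemApiMatcher {
    // Hypothetical watch list: owner class -> sensitive method names.
    private static final Map<String, Set<String>> WATCHED = Map.of(
        "android.telephony.TelephonyManager", Set.of("getDeviceId", "getImei", "getSubscriberId"),
        "android.provider.Settings$Secure", Set.of("getString"),
        "android.location.LocationManager", Set.of("getLastKnownLocation", "requestLocationUpdates"));

    /** Decides whether a call site (owner class + method name) should be instrumented. */
    public static boolean isSensitive(String ownerClass, String methodName) {
        Set<String> methods = WATCHED.get(ownerClass);
        return methods != null && methods.contains(methodName);
    }

    /** Javassist source for MethodCall.replace(): log the stack, then make the original call. */
    public static String replacementSource() {
        return "{ android.util.Log.e(\"PrivacyCall\", android.util.Log.getStackTraceString(new Throwable()));"
             + " $_ = $proceed($$); }";
    }

    // Wiring with Javassist (not compiled here; needs javassist on the classpath):
    // ctMethod.instrument(new ExprEditor() {
    //     @Override public void edit(MethodCall m) throws CannotCompileException {
    //         if (SystemApiMatcher.isSensitive(m.getClassName(), m.getMethodName())) {
    //             m.replace(SystemApiMatcher.replacementSource());
    //         }
    //     }
    // });
}
```

The replacement runs inside the app's own method, so the top frame of the logged trace is the packaged code that calls the system API.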

Smali‑Based Scanning

Smali is the assembly‑like representation of Dalvik bytecode. Each .smali file corresponds to a single class (or inner class) and begins with directives that declare the class name, superclass, and source file.

Basic type keywords in Smali map to Java primitive types:

V → void

Z → boolean

B → byte

S → short

C → char

I → int

J → long

F → float

D → double
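Besides these primitive letters, Smali writes reference types as L plus the slash‑separated class name plus a terminating semicolon (for example Ljava/lang/String;). It writes arrays with one leading [ per dimension (for example [I for int[]). A scanner that reports readable signatures needs to translate these descriptors. The following is a minimal sketch; the class name is illustrative.

```java
class SmaliTypes {
    /** Converts one Smali/Dalvik type descriptor (e.g. "I", "[Ljava/lang/String;") to Java syntax. */
    public static String toJavaType(String descriptor) {
        int dims = 0;
        while (descriptor.charAt(dims) == '[') {
            dims++; // each leading '[' adds one array dimension
        }
        String base = descriptor.substring(dims);
        String javaName = switch (base.charAt(0)) {
            case 'V' -> "void";
            case 'Z' -> "boolean";
            case 'B' -> "byte";
            case 'S' -> "short";
            case 'C' -> "char";
            case 'I' -> "int";
            case 'J' -> "long";
            case 'F' -> "float";
            case 'D' -> "double";
            case 'L' -> {
                if (!base.endsWith(";")) {
                    throw new IllegalArgumentException("Unterminated class descriptor: " + descriptor);
                }
                // Lpackage/Name; -> package.Name (inner classes keep their '$').
                yield base.substring(1, base.length() - 1).replace('/', '.');
            }
            default -> throw new IllegalArgumentException("Unknown descriptor: " + descriptor);
        };
        return javaName + "[]".repeat(dims);
    }
}
```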

Common Smali directives include:

.class – class name

.super – superclass

.source – source file name

.field – field declaration

.method – method start

.end method – method end

.line – line number

.annotation / .end annotation – annotations

Method‑call instructions are identified by the invoke- prefix (e.g., invoke-virtual, invoke-static, invoke-direct, invoke-super, invoke-interface).
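An invoke line names its callee as owner‑descriptor, then ->, then the method name, then the parameter/return descriptor, for example `invoke-virtual {v0}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;`. The sketch below splits such a line with a regular expression and also accepts the /range variants. It is an illustrative parser, not the original tool's. Forms with extra trailing operands, such as invoke-polymorphic, are simply not matched.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvokeParser {
    /** Parsed callee of one invoke-* instruction. */
    public record Invocation(String opcode, String ownerClass, String methodName, String descriptor) {}

    // opcode, {registers}, owner;->name(params)return
    private static final Pattern INVOKE = Pattern.compile(
        "^\\s*(invoke-[a-z-]+(?:/range)?)\\s+\\{[^}]*\\},\\s*(\\S+?)->([^(\\s]+)(\\(\\S*)\\s*$");

    public static Optional<Invocation> parse(String line) {
        Matcher m = INVOKE.matcher(line);
        if (!m.matches()) {
            return Optional.empty(); // not a method call: field access, .line, const, etc.
        }
        return Optional.of(new Invocation(m.group(1), m.group(2), m.group(3), m.group(4)));
    }
}
```

Field reads such as sget-object and iget are rejected by design. This is what keeps field names out of the results.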

The scanning algorithm for Smali files:

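Read each Smali file line by line. The header lines (.class, .super, and .source) provide the class metadata. The .source line may be missing if the build stripped source‑file information.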

When a .method line is encountered, treat every following line up to the matching .end method as the body of that method.

Within a method, capture .line directives to know source line numbers.

Identify lines beginning with invoke-; if the invoked method matches any configured keyword, record the occurrence with class name, method name, and line number.
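The four steps above can be sketched as a small state machine over the lines of one .smali file. The sketch assumes that keywords are written in Smali form as owner;->name, for example `Landroid/telephony/TelephonyManager;->getDeviceId`. It appends "(" before matching so that a keyword does not also hit longer method names that share its prefix. The Finding record and class name are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class SmaliScanner {
    /** One finding: keyword, calling class and method, and the nearest preceding .line value. */
    public record Finding(String keyword, String className, String methodName, int sourceLine) {}

    public static List<Finding> scan(List<String> lines, List<String> keywords) {
        List<Finding> findings = new ArrayList<>();
        String className = "?";
        String methodName = null; // null while outside a method body
        int sourceLine = -1;      // -1 until a .line directive is seen in the current method
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(".class ")) {
                // ".class public final Lcom/foo/Bar;" -> last token is the class descriptor
                className = line.substring(line.lastIndexOf(' ') + 1);
            } else if (line.startsWith(".method ")) {
                // ".method public static foo(I)V" -> last token is name + descriptor
                methodName = line.substring(line.lastIndexOf(' ') + 1);
                sourceLine = -1;
            } else if (line.equals(".end method")) {
                methodName = null;
            } else if (methodName != null && line.startsWith(".line ")) {
                sourceLine = Integer.parseInt(line.substring(".line ".length()).trim());
            } else if (methodName != null && line.startsWith("invoke-")) {
                for (String keyword : keywords) {
                    if (line.contains(keyword + "(")) {
                        findings.add(new Finding(keyword, className, methodName, sourceLine));
                    }
                }
            }
        }
        return findings;
    }
}
```

Each finding carries the calling class, the calling method with its descriptor, and the source line. This is enough to emit the JSON and HTML reports mentioned in the conclusion.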

In practice, the open‑source baksmali.jar tool converts .dex to Smali, after which the above rules are applied. This approach avoids the heavy dex → jar → java pipeline and reduces total scan time to roughly half a minute.
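The following sketch invokes baksmali from Java for each extracted dex file. It assumes the baksmali 2.x command line, `java -jar baksmali.jar d <dex> -o <dir>`. Other versions may use different syntax. The jar path and output directory are supplied by the caller.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

class BaksmaliRunner {
    /** Builds: java -jar <baksmali.jar> d <classesN.dex> -o <outDir> (baksmali 2.x syntax assumed). */
    public static List<String> command(Path baksmaliJar, Path dexFile, Path outDir) {
        return List.of("java", "-jar", baksmaliJar.toString(), "d", dexFile.toString(),
                       "-o", outDir.toString());
    }

    /** Runs baksmali for one dex file and fails loudly on a non-zero exit code. */
    public static void disassemble(Path baksmaliJar, Path dexFile, Path outDir)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(baksmaliJar, dexFile, outDir))
            .inheritIO() // show baksmali's own output and errors
            .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("baksmali exited with " + exit + " for " + dexFile);
        }
    }
}
```

Giving each classesN.dex its own output directory keeps the runs independent, so they can be executed in parallel before the Smali scan.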

Conclusion

The tool provides a comprehensive view of privacy‑sensitive API usage in the final APK, works without modifying the build process, and outputs results as JSON and HTML for easy comparison and manual triage. Limitations include the inability to pinpoint the exact module or AAR that introduced a call and difficulty in reconstructing full call chains, which the author plans to address by combining compile‑time and runtime checks.

Figures (not reproduced): tool workflow diagram; configuration example; JSON config files; runtime detection flow; sample scan result; another scan result; optimized runtime flow; Smali scanning flow.
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.