
APK Resource Analysis and Optimization Using Python

This article explains how to use Python to analyze Android APK packages, extract basic statistics, identify optimizable resources such as oversized images, duplicate files, and unused assets, and provide data‑driven guidance for reducing APK size and improving distribution efficiency.

JD Retail Technology

Background – Rapid feature growth in the JD.com main app has made its APK size increase dramatically. The larger APK raises promotion costs, makes users less willing to download, and has pushed the package past Google Play's 100 MB limit. The article describes a Python‑based approach to analyzing APKs, gathering baseline data, and pinpointing optimization opportunities.

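APK File Structure – An APK is a zip archive, so running aapt l file.apk (short for aapt list) lists its contents. Typical top-level entries include directories such as res/, assets/, lib/, and src/, alongside root files such as classes.dex, resources.arsc, and AndroidManifest.xml. Java resources (non-class files bundled in JARs) are packaged into the APK as well.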
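Because an APK is an ordinary zip archive, Python's zipfile module can produce a similar listing without AAPT. The sketch below uses a hypothetical helper, `summarize_apk_entries`, which groups entries by top-level directory and totals their raw and compressed sizes. Root files such as classes.dex each form their own group.

```python
import zipfile
from collections import defaultdict


def summarize_apk_entries(apk_path):
    """Group APK entries by top-level directory and total their sizes.

    Returns {group: (entry_count, uncompressed_bytes, compressed_bytes)}.
    Root-level files (classes.dex, resources.arsc, ...) form their own group.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    with zipfile.ZipFile(apk_path) as apk:
        for info in apk.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # "res/drawable/a.png" -> "res/", "classes.dex" -> "classes.dex"
            head, sep, _ = info.filename.partition("/")
            group = head + sep
            totals[group][0] += 1
            totals[group][1] += info.file_size
            totals[group][2] += info.compress_size
    return {group: tuple(values) for group, values in totals.items()}
```

To see which top-level area dominates the package, sort the returned dict by its third field (compressed bytes).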

Primary Analysis Tasks

Download the APK and its mapping files.

Use AAPT to retrieve package information.

Obtain the file system size (apk_file_size) and the compressed size (apk_download_size).

Restore obfuscated resource IDs.

Detect duplicate resources via MD5.

Read DEX header to get class and method counts.

Identify non‑alpha PNG images larger than 10 KB.

Extract .so files with ZipFile and analyze them.

Detect unused resources under res/ and assets/.
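For the .so step, a useful first number is how much each ABI contributes, because shipping several ABIs (for example armeabi-v7a and arm64-v8a) multiplies native-library size. The following is a minimal sketch with a hypothetical `native_libs_by_abi` helper. It assumes the standard lib/<abi>/<name>.so layout.

```python
import zipfile
from collections import defaultdict


def native_libs_by_abi(apk_path):
    """Return {abi: [(library_name, compressed_bytes), ...]} for lib/<abi>/*.so entries."""
    result = defaultdict(list)
    with zipfile.ZipFile(apk_path) as apk:
        for info in apk.infolist():
            parts = info.filename.split("/")
            # Keep only entries shaped exactly like lib/<abi>/<name>.so
            if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
                result[parts[1]].append((parts[2], info.compress_size))
    # Largest libraries first within each ABI
    return {abi: sorted(libs, key=lambda item: item[1], reverse=True)
            for abi, libs in result.items()}
```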

All steps are implemented in Python. The APK is downloaded with resumable download logic and then extracted; that code is omitted for brevity.
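The article omits its resumable download code. As an illustration only, the hypothetical `download_resumable` below shows the usual HTTP technique. It requests the remaining bytes with a Range header and appends them only when the server answers 206 Partial Content. Any other successful answer is treated as the full body, and the file is rewritten from the start.

```python
import os
import shutil
import urllib.error
import urllib.request


def download_resumable(url, dest_path, timeout=30):
    """Download url to dest_path, resuming a partial file when the server supports ranges."""
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    headers = {"Range": "bytes=%d-" % offset} if offset else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        response = urllib.request.urlopen(request, timeout=timeout)
    except urllib.error.HTTPError as err:
        if err.code == 416:  # Range Not Satisfiable: the local file is most likely complete
            return offset
        raise
    with response:
        # 206 means the Range header was honoured, so append; otherwise start over
        resumed = offset > 0 and getattr(response, "status", None) == 206
        with open(dest_path, "ab" if resumed else "wb") as out:
            shutil.copyfileobj(response, out)
    return os.path.getsize(dest_path)
```

A production version would also check the final size or checksum against a value published alongside the APK.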

3.1 Retrieving APK Information with AAPT

```python
import re
import subprocess


def get_apk_base_info(self):
    """Read basic APK metadata from the output of `aapt dump badging`."""
    # Pass arguments as a list instead of shell=True so paths with spaces are handled safely
    result = subprocess.run([self.aapt_path, "dump", "badging", self.apkPath],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output = result.stdout.decode("utf-8", errors="replace")

    def search(pattern, what):
        match = re.search(pattern, output)
        if not match:
            raise Exception("can't get " + what)
        return match

    # Raw-string patterns avoid invalid-escape warnings for \S and \d
    package_name, version_code, version_name = search(
        r"package: name='(\S+)' versionCode='(\d+)' versionName='([^']*)'",
        "package, versionCode, versionName").groups()
    launch_activity = search(r"launchable-activity: name='(\S+)'", "launch_activity").group(1)
    min_sdk_version = search(r"sdkVersion:'(\S+)'", "min_sdk_version").group(1)
    target_sdk_version = search(r"targetSdkVersion:'(\S+)'", "target_sdk_version").group(1)
    # [^']+ accepts any label (spaces, CJK characters) up to the closing quote
    application_label = search(r"application-label:'([^']+)'", "application_label").group(1)
    return (package_name, version_name, version_code, launch_activity,
            min_sdk_version, target_sdk_version, application_label)
```
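Separating the parsing from the subprocess call makes the regular expressions easy to check against captured output. The sketch below is illustrative. `parse_badging` is a hypothetical helper, and the sample text in the usage only imitates the shape of `aapt dump badging` lines rather than coming from a real APK.

```python
import re

# Field name -> regex over `aapt dump badging` output; group 1 is the value
BADGING_PATTERNS = {
    "package_name": r"^package: name='([^']+)'",
    "version_code": r"^package: .*?versionCode='(\d+)'",
    "version_name": r"^package: .*?versionName='([^']*)'",
    "min_sdk_version": r"^sdkVersion:'([^']+)'",
    "target_sdk_version": r"^targetSdkVersion:'([^']+)'",
    "application_label": r"^application-label:'([^']*)'",
    "launch_activity": r"^launchable-activity: name='([^']+)'",
}


def parse_badging(output):
    """Extract basic fields from badging text; fields that are missing map to None."""
    info = {}
    for field, pattern in BADGING_PATTERNS.items():
        match = re.search(pattern, output, re.MULTILINE)
        info[field] = match.group(1) if match else None
    return info
```

Returning None instead of raising lets a batch job record incomplete APKs rather than abort on the first one.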

3.2 apk_file_size & apk_download_size

```python
import os
import zipfile


def get_apk_size(self):
    # apk_file_size: size of the APK file on disk, in bytes
    # (divide by 1024 * 1024 to report MB)
    return os.path.getsize(self.apkPath)


def get_apk_download_size(apk_file_name):
    # apk_download_size: sum of the compressed sizes of all entries in the APK (zip) archive
    with zipfile.ZipFile(apk_file_name, 'r') as zip_file:
        return sum(zip_info.compress_size for zip_info in zip_file.infolist())
```
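These two numbers differ by the zip overhead: local file headers, the central directory, and, for v2+ signed APKs, the APK Signing Block. To see where the compressed bytes actually go, the hypothetical helpers below rank entries by compress_size and compute that overhead.

```python
import os
import zipfile


def largest_entries(apk_path, top_n=10):
    """Return the top_n entries as (name, compressed_bytes, uncompressed_bytes), largest first."""
    with zipfile.ZipFile(apk_path) as apk:
        entries = [(info.filename, info.compress_size, info.file_size)
                   for info in apk.infolist() if not info.is_dir()]
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries[:top_n]


def zip_overhead(apk_path):
    """File size minus summed compressed entry data (headers, central directory, signing block)."""
    with zipfile.ZipFile(apk_path) as apk:
        payload = sum(info.compress_size for info in apk.infolist())
    return os.path.getsize(apk_path) - payload
```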

3.3 Reading APK Files with ZipFile

```python
import hashlib
import os
import zipfile


def __get_files_from_apk(apk_file_name, apk_name_without_suffix, mapping_name_without_suffix):
    # reproguard and FileInfo are project-internal helpers that the article does not show
    # Read the resource obfuscation mapping
    proguard_map = reproguard.read_proguard_apk(mapping_name_without_suffix)
    apk_file_list, aura_bundles, dex_files, react_modules = [], [], [], []
    with zipfile.ZipFile(apk_file_name, 'r') as zip_file:
        for zip_info in zip_file.infolist():
            file_name = zip_info.filename
            # Restore the original (de-obfuscated) path for entries inside directories
            if proguard_map and "/" in file_name:
                entry_name = str(reproguard.replace_path_id(file_name, proguard_map))
            else:
                entry_name = file_name
            # Hash the entry's content, not its name, so identical files share a digest
            md5_str = hashlib.md5(zip_file.read(file_name)).hexdigest()
            # Assumed meaning of file_type: the lower-cased extension, e.g. ".png", ".so"
            file_type = os.path.splitext(file_name)[1].lower()
            file_info = FileInfo(path=file_name, entry_name=entry_name, md5_str=md5_str,
                                 compress_size=zip_info.compress_size, file_type=file_type,
                                 zip_file=zip_info)
            # Further processing for .so, React Native, dex, images, etc.
            # fills apk_file_list, aura_bundles, dex_files and react_modules
    return apk_file_list, aura_bundles, dex_files, react_modules
```
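Duplicate detection needs a hash of each entry's content. Reading an entry in fixed-size chunks through ZipFile.open keeps memory use flat even for large .so or DEX files. The following is a small sketch with a hypothetical `entry_md5` helper.

```python
import hashlib
import zipfile


def entry_md5(zip_file, name, chunk_size=1 << 16):
    """MD5 hex digest of one archive entry's uncompressed content, read in chunks."""
    digest = hashlib.md5()
    with zip_file.open(name) as entry:
        for chunk in iter(lambda: entry.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```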

3.4 Parse DEX Header

```python
import mmap
import struct

# Names of the 14 consecutive uint32 fields starting at offset 0x38 of the DEX header
DEX_HEADER_FIELDS = (
    'string_ids_size', 'string_ids_off', 'type_ids_size', 'type_ids_off',
    'proto_ids_size', 'proto_ids_off', 'field_ids_size', 'field_ids_off',
    'method_ids_size', 'method_ids_off', 'class_defs_size', 'class_defs_off',
    'data_size', 'data_off',
)


def ReadDexHeader_(self, file_dir):
    # Map the DEX file read-only; the mapping stays valid after the file object is closed
    with open(file_dir, 'rb') as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    self.mmap = m
    # The header fields are little-endian uint32 values laid out back to back (0x38 to 0x6F)
    values = struct.unpack_from('<14I', m, 0x38)
    self.header = dict(zip(DEX_HEADER_FIELDS, values))
```
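The same offsets can be checked on an in-memory buffer. The hypothetical `dex_counts` below validates the "dex\n" magic, then reads the class-definition count (0x60) and the method-reference count (0x58). method_ids_size is the figure that counts against the 65,536 method-reference limit of a single DEX file.

```python
import struct

DEX_MAGIC_PREFIX = b"dex\n"
DEX_HEADER_SIZE = 0x70


def dex_counts(data):
    """Return (class_defs_size, method_ids_size, version) from raw DEX bytes."""
    if len(data) < DEX_HEADER_SIZE or not data.startswith(DEX_MAGIC_PREFIX):
        raise ValueError("not a DEX file")
    version = data[4:7].decode("ascii")  # e.g. "035"
    (method_ids_size,) = struct.unpack_from("<I", data, 0x58)
    (class_defs_size,) = struct.unpack_from("<I", data, 0x60)
    return class_defs_size, method_ids_size, version
```

For a multidex APK, call it on zip_file.read(name) for every classes*.dex entry and sum the counts.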

3.5 Identify Non‑Alpha PNG Images

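```python
import io
from PIL import Image

# Runs inside the per-entry loop of __get_files_from_apk; zip_file, zip_info, file_name,
# file_info, image_type and filename_without_suffix come from that loop
image_size, non_alpha = None, False  # defaults, so undecodable files are still recorded
try:
    img = Image.open(io.BytesIO(zip_file.read(file_name)))
    image_size = img.size  # (width, height)
    # Alpha modes, or a tRNS-based "transparency" entry, mean the image has transparency
    has_alpha = img.mode in ("RGBA", "LA", "PA") or "transparency" in img.info
    # Flag non-alpha PNGs (excluding .9.png nine-patches) whose compressed size is >= 10 KB
    if (not has_alpha and image_type == ".png"
            and not filename_without_suffix.endswith(".9")
            and zip_info.compress_size >= 10 * 1024):
        non_alpha = True
except OSError:
    pass  # not a decodable image (Pillow raises an OSError subclass)
file_info.image_size = image_size
file_info.non_alpha = non_alpha
apk_file_list.append(file_info)
continue
```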
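Pillow's mode check is the simplest route. Without Pillow, the same decision can be made from PNG chunk headers alone. Per the PNG specification, colour types 4 (greyscale + alpha) and 6 (RGBA) always carry alpha, while colour types 0, 2 and 3 gain transparency through a tRNS chunk placed before the image data. The hypothetical `png_has_alpha` below reads only chunk headers and does not verify CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data):
    """Return True if PNG bytes declare an alpha channel or contain a tRNS chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        if chunk_type == b"IHDR":
            # IHDR data: width(4) height(4) bit_depth(1) colour_type(1) ...
            if data[pos + 8 + 9] in (4, 6):  # greyscale+alpha or RGBA
                return True
        elif chunk_type == b"tRNS":
            return True
        elif chunk_type in (b"IDAT", b"IEND"):
            break  # tRNS must precede the first IDAT chunk
        pos += 12 + length  # length(4) + type(4) + data + crc(4)
    return False
```

Combined with the .9.png exclusion and the 10 KB threshold above, this flags the same candidates without decoding any pixels.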

3.6 Duplicate Resources – Duplicate files are detected by comparing MD5 hashes of their contents. Entries with identical hashes are redundant copies that can be deduplicated.
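The following sketch shows that grouping step using only the standard library. The hypothetical `find_duplicates` hashes each entry's content and groups names by digest. It also estimates the compressed bytes saved by keeping one copy per group.

```python
import hashlib
import zipfile
from collections import defaultdict


def find_duplicates(apk_path):
    """Return [(md5, sorted_names, wasted_compressed_bytes)] for content shared by 2+ entries."""
    groups = defaultdict(list)
    with zipfile.ZipFile(apk_path) as apk:
        for info in apk.infolist():
            if info.is_dir():
                continue
            digest = hashlib.md5(apk.read(info.filename)).hexdigest()
            groups[digest].append(info)
    duplicates = []
    for digest, infos in groups.items():
        if len(infos) > 1:
            # Conservative estimate: assume the largest compressed copy is the one kept
            sizes = sorted(info.compress_size for info in infos)
            wasted = sum(sizes[:-1])
            duplicates.append((digest, sorted(info.filename for info in infos), wasted))
    duplicates.sort(key=lambda item: item[2], reverse=True)  # biggest savings first
    return duplicates
```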

3.7 Unused Resources

Unused resources are files in res/ and assets/ that nothing references: not the DEX code (through R fields, resource IDs, or asset path strings), not XML layouts or other XML resources, and not the AndroidManifest. The analysis proceeds in two parts:

3.7.1 Unused res/ Resources – Parse R.txt to obtain all resource IDs, analyze resources.arsc for actual references, scan XML files for value and non‑value references, and examine DEX/SMALI code for direct resource usage. The sets are merged, and any IDs not present in the merged reference set are considered unused.
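The set arithmetic can be sketched in a heavily simplified form. This sketch assumes the common R.txt line format "int <type> <name> <id>" and plain-text smali and XML inputs. `parse_r_txt` and `find_unused_resources` are hypothetical. The sketch counts only literal 0x7fXXXXXX IDs and R-class field reads in smali, plus @type/name references in XML, as usage. It ignores resources.arsc cross-references, ?attr references, styleables, and reflective lookups such as getIdentifier, all of which a real analysis must cover.

```python
import re

ID_LITERAL = re.compile(r"0x7f[0-9a-fA-F]{6}")   # resource IDs inlined into smali
R_FIELD = re.compile(r"/R\$(\w+);->(\w+):I")     # e.g. Lcom/x/R$drawable;->icon:I
XML_REF = re.compile(r"@\+?(\w+)/(\w+)")         # e.g. @drawable/icon, @+id/title


def parse_r_txt(text):
    """Map resource IDs -> (type, name) for `int <type> <name> 0x...` lines of R.txt."""
    resources = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "int" and parts[3].startswith("0x"):
            resources[parts[3].lower()] = (parts[1], parts[2])
    return resources


def find_unused_resources(r_txt, smali_texts, xml_texts):
    """Return sorted (type, name) pairs declared in R.txt but never referenced."""
    resources = parse_r_txt(r_txt)
    used = set()
    for smali in smali_texts:
        for literal in ID_LITERAL.findall(smali):
            if literal.lower() in resources:
                used.add(resources[literal.lower()])
        used.update(R_FIELD.findall(smali))
    for xml in xml_texts:
        used.update(XML_REF.findall(xml))
    return sorted(set(resources.values()) - used)
```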

3.7.2 Unused assets/ Resources – List all files under assets/, then search SMALI code for string literals that reference those assets; files not referenced are marked as unused.
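The assets check can be sketched the same way, taking smali text as input. The hypothetical `find_unused_assets` collects const-string literals, stripping a file:///android_asset/ prefix when one is present. It treats an asset as used when a literal matches any of the following:

- the asset's path relative to assets/
- the asset's bare file name
- one of the asset's parent directories, since code often lists a folder and opens files inside it

Paths assembled at runtime by string concatenation will still be missed.

```python
import posixpath
import re

CONST_STRING = re.compile(r'const-string(?:/jumbo)?\s+\w+,\s*"((?:[^"\\]|\\.)*)"')


def find_unused_assets(asset_paths, smali_texts):
    """Return asset paths (relative to assets/) that no const-string literal refers to."""
    literals = set()
    for smali in smali_texts:
        for literal in CONST_STRING.findall(smali):
            # "file:///android_asset/web/index.html" -> "web/index.html"
            literals.add(literal.split("android_asset/", 1)[-1].strip("/"))
    unused = []
    for path in asset_paths:
        # Candidate spellings: full path, bare file name, and every parent directory
        candidates = {path, posixpath.basename(path)}
        parent = posixpath.dirname(path)
        while parent:
            candidates.add(parent)
            parent = posixpath.dirname(parent)
        if not candidates & literals:
            unused.append(path)
    return sorted(unused)
```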

Key code snippets for these steps are provided in the original article (e.g., read_resource_txt_file, read_smali_files, decode_resources, find_asset_file).

Conclusion – By leveraging Python to automate APK resource analysis, developers obtain precise metrics for image size, duplicate detection, DEX method counts, and unused assets, enabling effective APK slimming, reduced distribution costs, and improved user conversion rates.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

