How to Detect and Analyze Android Thread Deadlocks with Automated Monitoring

This article describes Android thread freezes caused by deadlocks, presents a client‑server monitoring architecture that captures thread and lock information from system traces, details the automated analysis used to detect deadlocks and classify non‑deadlock causes, and shares the resulting reduction in freeze rates along with future plans.

Tencent TDS Service

Problem Background

After each release of the mobile app, developers receive numerous user‑reported issues such as "unread messages not disappearing", "images not showing", and "spinning indicator never stops". Many of these stem from thread deadlocks that leave features unusable until the user kills and restarts the process. With over 250 modules and 4 million lines of code, coding standards and static analysis tools alone proved insufficient to prevent deadlocks.

Solution Details

Overall Scheme Overview

The monitoring system consists of a client side and a backend side.

Client side includes a WatchThread that observes a target thread. If a message taken from the target thread’s Looper queue does not finish within a configurable timeout (default 3 minutes), the client records the thread’s held and waiting locks and reports this data to the backend.
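
The WatchThread's timeout check can be sketched as follows. On Android, one way to observe message boundaries is `Looper.setMessageLogging(Printer)`, whose log lines begin with `>>>>> Dispatching to` and `<<<<< Finished to`; whether the original client uses this hook is an assumption, and the class and method names below are hypothetical. The clock is passed in explicitly so the logic can run off-device.

```java
// Hypothetical sketch of the WatchThread timeout check, in plain Java so it
// runs off-device. On Android, onLooperLog(...) would be fed from a Printer
// registered with Looper.setMessageLogging(...) on the target thread.
class MessageWatchdog {
    private final long timeoutMs;
    // Start time of the message currently being dispatched, or -1 when idle.
    // volatile because the target thread writes it and the WatchThread reads it.
    private volatile long dispatchStartMs = -1;

    MessageWatchdog(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called on the target thread for every Looper log line.
    void onLooperLog(String line, long nowMs) {
        if (line.startsWith(">>>>> Dispatching")) {
            dispatchStartMs = nowMs;
        } else if (line.startsWith("<<<<< Finished")) {
            dispatchStartMs = -1;
        }
    }

    // Polled periodically by the WatchThread; true means the current message
    // has run longer than the timeout and freeze data should be collected.
    boolean isFrozen(long nowMs) {
        long start = dispatchStartMs;
        return start >= 0 && nowMs - start > timeoutMs;
    }
}
```

With the article's default timeout, the watchdog would be created as `new MessageWatchdog(3 * 60 * 1000L)` and polled with `isFrozen(System.currentTimeMillis())`; the polling interval is not given in the source.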

Backend side runs an automated analysis tool that processes the reported data and identifies the cause of each freeze so that tickets can be created for further handling.

Client Reporting

Freeze Information

The key information to report is the frozen thread's details and the locks it holds or waits for. In Java, only blocking locks can cause a freeze; these include synchronized, LockSupport, and Object locks (Object.wait/notify).

For each lock type the required data are:

synchronized lock – holder thread and waiting thread

LockSupport lock – holder thread and waiting thread

Object lock – waiting thread only

Reporting Scheme 1: Capture Java Stack – Not Feasible

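Attempting to extract lock information from a Java stack trace failed: each frame of a Java stack trace (for example, from Thread.getStackTrace()) records only the class, method, file, and line, with no details about the locks a thread holds or waits for, so this approach was unusable.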

Reporting Scheme 2: Capture System traces.txt – Feasible

When an ANR occurs, Android sends a SIGQUIT signal to the process, which triggers the generation of /data/anr/traces.txt. This file contains thread stacks and lock information. By forcing this dump itself when a freeze is detected, the client can obtain the same data and report it.

Reporting Difficulty: Traces Lack LockSupport Holder Info

Analysis showed that while synchronized and Object lock information appears in the trace, LockSupport lock holder threads are missing.

Solution: Actively Record LockSupport Thread Info

By adding instrumentation in the database‑related code, the system records when a thread acquires or releases a LockSupport lock, storing the thread ID and name. This extra information is appended to the end of the trace file before reporting.
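
The instrumentation can be pictured as a small registry that the lock-handling code updates on every acquire and release. The sketch below wraps `java.util.concurrent.locks.ReentrantLock`, which parks waiters through `LockSupport`; the class names, the dump format, the use of a wrapper instead of editing the database code directly, and the choice of Java `Thread.getId()` as the recorded ID are all assumptions (matching against the trace may instead need the thread name or the Linux tid).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical registry of LockSupport-based lock holders. The trace dump does
// not name the owner of a parked-on lock, so the code that acquires and
// releases the lock reports ownership here explicitly.
class LockHolderTracker {
    // lock id -> "<thread id> <thread name>" of the current holder
    private static final Map<String, String> HOLDERS = new ConcurrentHashMap<>();

    static void onAcquired(String lockId) {
        Thread t = Thread.currentThread();
        HOLDERS.put(lockId, t.getId() + " " + t.getName());
    }

    static void onReleased(String lockId) {
        HOLDERS.remove(lockId);
    }

    // Text block to append to the end of traces.txt before reporting.
    static String dump() {
        StringBuilder sb = new StringBuilder("----- LockSupport holders -----\n");
        for (Map.Entry<String, String> e : HOLDERS.entrySet()) {
            sb.append(e.getKey()).append(" held by ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}

// Example instrumentation around a ReentrantLock.
class TrackedLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final String id;

    TrackedLock(String id) {
        this.id = id;
    }

    void lock() {
        lock.lock();
        LockHolderTracker.onAcquired(id); // record only after acquisition succeeds
    }

    void unlock() {
        // Clear the record only on the outermost unlock of a reentrant hold.
        if (lock.getHoldCount() == 1) {
            LockHolderTracker.onReleased(id);
        }
        lock.unlock();
    }
}
```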

Server‑Side Identification

Identification Scheme: Key‑Info Reporting + Automated Analysis

The backend receives three essential pieces of data: the full traces.txt, the manually recorded LockSupport info, and the identifier of the frozen thread. The analysis pipeline extracts lock ownership and waiting relationships, reconstructs lock graphs, and determines whether a deadlock exists.
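
Extracting ownership and waiting relationships from traces.txt amounts to parsing each thread block. A minimal parser is sketched below, assuming ART-style thread headers such as `"main" prio=5 tid=1 Blocked` and monitor lines such as `- locked <0x...>`, `- waiting on <0x...>`, and `- waiting to lock <0x...> (...) held by thread N`. The exact formats vary across Android versions, and the class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lock facts for one thread, as extracted from an ART-style trace dump.
class ThreadLockInfo {
    final String name;
    final int tid;
    final List<String> held = new ArrayList<>();    // from "- locked <...>"
    final List<String> waiting = new ArrayList<>(); // "waiting to lock" / "waiting on"
    Integer waitingHeldByTid;                       // from "held by thread N", if present

    ThreadLockInfo(String name, int tid) {
        this.name = name;
        this.tid = tid;
    }
}

class TraceParser {
    // e.g.  "main" prio=5 tid=1 Blocked
    private static final Pattern HEADER = Pattern.compile("^\"(.+)\" .*\\btid=(\\d+)\\b");
    // e.g.  - locked <0x0a1b2c3d> (a java.lang.Object)
    private static final Pattern LOCKED = Pattern.compile("^\\s*- locked <(0x[0-9a-f]+)>");
    // e.g.  - waiting to lock <0x...> (a java.lang.Object) held by thread 12
    //       - waiting on <0x...> (a java.lang.Object)
    private static final Pattern WAITING = Pattern.compile(
            "^\\s*- waiting (?:to lock|on) <(0x[0-9a-f]+)>.*?(?:held by thread (\\d+))?$");

    // Returns per-thread lock info keyed by tid, in dump order.
    static Map<Integer, ThreadLockInfo> parse(String trace) {
        Map<Integer, ThreadLockInfo> threads = new LinkedHashMap<>();
        ThreadLockInfo current = null;
        for (String line : trace.split("\n")) {
            Matcher m;
            if ((m = HEADER.matcher(line)).find()) {
                current = new ThreadLockInfo(m.group(1), Integer.parseInt(m.group(2)));
                threads.put(current.tid, current);
            } else if (current == null) {
                continue; // skip lines before the first thread header
            } else if ((m = LOCKED.matcher(line)).find()) {
                current.held.add(m.group(1));
            } else if ((m = WAITING.matcher(line)).find()) {
                current.waiting.add(m.group(1));
                if (m.group(2) != null) {
                    current.waitingHeldByTid = Integer.parseInt(m.group(2));
                }
            }
        }
        return threads;
    }
}
```

The "held by thread N" suffix gives synchronized waits a direct edge to their holder; LockSupport waits lack it, which is why the instrumented holder info is reported alongside the trace.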

The algorithm walks the lock graph and detects cycles (deadlocks); if no cycle is found, it classifies the freeze into categories such as network, file I/O, HashMap, IPC, GC, and database.
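
The cycle check can be framed as a walk over a wait-for graph: each blocked thread has an edge to the thread holding the lock it waits for, and a deadlock is a cycle. The sketch below assumes each thread waits on at most one lock (so every node has at most one outgoing edge) and that the edges have already been built from the parsed trace and the recorded LockSupport holders; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical deadlock check on a wait-for graph. Each entry maps a thread to
// the thread holding the lock it is waiting for; a blocked thread waits on a
// single lock, so every thread has at most one outgoing edge.
class DeadlockDetector {

    // Follows wait-for edges from the frozen thread. Returns the threads that
    // form a cycle (a deadlock), or an empty list if the chain ends at a thread
    // that is not waiting on any lock (a non-deadlock freeze).
    static List<String> findCycle(Map<String, String> waitsFor, String frozenThread) {
        List<String> path = new ArrayList<>();
        String current = frozenThread;
        while (current != null) {
            int seenAt = path.indexOf(current);
            if (seenAt >= 0) {
                // Revisiting a thread on the current path closes a cycle.
                return new ArrayList<>(path.subList(seenAt, path.size()));
            }
            path.add(current);
            current = waitsFor.get(current);
        }
        return Collections.emptyList();
    }
}
```

For a case like MSF‑Receiver and QQ_DB waiting on each other, the map holds one edge in each direction and the walk returns both threads, even when it starts from a third thread blocked behind them. When no cycle exists, the thread at the end of the chain is a natural candidate for the keyword classification of non-deadlock causes.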

Deadlock Example

Two threads, MSF‑Receiver and QQ_DB, each hold one lock while waiting for the lock held by the other, forming a cycle in the lock graph that is identified as a deadlock.

Identification Difficulty 1: Different Addresses for the Same LockSupport Lock

Although the same logical LockSupport lock is used, different threads show different object addresses in the dump, preventing straightforward matching.

Solution: Extract Common Feature and Treat as Same Lock

By recognizing a common stack string such as "SQLiteConnectionPool.waitForConnection", the analysis injects a synthetic lock with a unified identifier, allowing the two addresses to be considered the same lock.
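
This normalization can be sketched as a rewrite applied before the lock graph is built: any waiting thread whose stack contains a known signature gets one shared synthetic lock ID instead of its reported address. The signature string comes from the article; the synthetic ID format and the class and method names are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical normalization of LockSupport waits whose object addresses
// differ between threads even though they contend for the same resource.
class LockUnifier {
    // stack signature -> synthetic lock id shared by every matching thread
    private static final Map<String, String> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("SQLiteConnectionPool.waitForConnection",
                "synthetic:sqlite-connection-pool");
    }

    // Returns the synthetic lock id if the stack matches a known signature,
    // otherwise the lock address reported in the trace.
    static String lockIdFor(String stackText, String reportedAddress) {
        for (Map.Entry<String, String> e : SIGNATURES.entrySet()) {
            if (stackText.contains(e.getKey())) {
                return e.getValue();
            }
        }
        return reportedAddress;
    }
}
```

For the holder side to meet the waiters on the same graph node, the instrumentation that records the LockSupport holder would need to register the connection pool under the same synthetic ID.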

Identification Difficulty 2: Non‑Deadlock Issues

Non‑deadlock freezes are categorized by matching stack‑trace keywords to problem types such as network, file I/O, HashMap, IPC, GC, database, ProcessManager, and PB. The keyword‑to‑category map is continuously refined.
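
A keyword table scanned against the frozen thread's stack is one straightforward way to implement this classification. The keywords below are illustrative guesses at frames that might indicate each category, not the article's actual map, and only a few of the categories are shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical keyword-to-category table for non-deadlock freezes.
class FreezeClassifier {
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("android.database.sqlite.", "database");
        RULES.put("java.net.", "network");
        RULES.put("java.io.File", "file I/O");
        RULES.put("java.util.HashMap", "HashMap");
        RULES.put("android.os.BinderProxy.transact", "IPC");
    }

    static String classify(String stackText) {
        // Scan frames top-down so the most recent matching frame decides the
        // category; within one frame, earlier rules take precedence.
        for (String frame : stackText.split("\n")) {
            for (Map.Entry<String, String> e : RULES.entrySet()) {
                if (frame.contains(e.getKey())) {
                    return e.getValue();
                }
            }
        }
        return "unknown"; // candidate for a new keyword in the map
    }
}
```

Collecting the stacks that fall through to "unknown" gives a concrete worklist for refining the map over time.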

Freeze Monitoring and Automation Results

Automated analysis on a sample day (Nov 7) produced an overview chart showing the distribution of freeze causes. Deadlocks accounted for 35.6 % and have been fully resolved; other issues such as file I/O, HashMap, and network have also been addressed, while some categories (IPC, ProcessManager, PB, GC, etc.) remain pending.

Overall thread‑freeze rates have decreased across versions; for example, the MSF thread's freeze rate fell from 0.3 % to 0.1 %.

Future Plans

Remaining work includes automating ticket creation after analysis and extending LockSupport instrumentation to cover all usages beyond the database, thereby improving deadlock detection coverage.
