
How to Build a Complete ANR Monitoring Solution on Android

This article explains the Android ANR workflow, analyzes the system's appNotResponding logic, and presents a robust monitoring strategy that captures SIGQUIT signals, validates true ANR events, and hooks trace writes to reliably detect and diagnose ANRs in mobile apps.

WeChat Client Technology Team

1. ANR Process

ANR handling is performed in the system_server process; the key method is ProcessRecord.appNotResponding in frameworks/base/services/core/java/com/android/server/am/ProcessRecord.java. The method first checks several extreme cases (shutdown, crashing, killed, duplicate ANR) and returns early, then marks the process as not responding and dumps stack traces of selected processes.

void appNotResponding(String activityShortComponentName, ApplicationInfo aInfo,
    String parentShortComponentName, WindowProcessController parentProcess,
    boolean aboveSystem, String annotation, boolean onlyDumpSelf) {
    //......
    final boolean isSilentAnr;
    synchronized (mService) {
        if (mService.mAtmInternal.isShuttingDown()) {
            Slog.i(TAG, "During shutdown skipping ANR: " + this + " " + annotation);
            return;
        } else if (isNotResponding()) {
            Slog.i(TAG, "Skipping duplicate ANR: " + this + " " + annotation);
            return;
        } else if (isCrashing()) {
            Slog.i(TAG, "Crashing app skipping ANR: " + this + " " + annotation);
            return;
        } else if (killedByAm) {
            Slog.i(TAG, "App already killed by AM skipping ANR: " + this + " " + annotation);
            return;
        } else if (killed) {
            Slog.i(TAG, "Skipping died app ANR: " + this + " " + annotation);
            return;
        }
        setNotResponding(true);
        EventLog.writeEvent(EventLogTags.AM_ANR, userId, pid, processName, info.flags,
                annotation);
        // Dump thread traces as quickly as we can, starting with "interesting" processes.
        firstPids.add(pid);
        isSilentAnr = isSilentAnr();
        //......
    }
}

The method distinguishes foreground and background (silent) ANRs: foreground ANRs show a dialog, while silent ANRs kill the process. The decision is based on whether the user can perceive the process.

2. Monitoring SIGQUIT Signal

When the system wants a process to dump its stacks, it sends that process SIGQUIT (signal 3). Capturing this signal gives the app a hook for ANR detection. Two approaches are explored:

2.1 sigwait

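This approach creates a dedicated thread that blocks in sigwait until SIGQUIT arrives.

#include <pthread.h>
#include <signal.h>

// Thread body: waits for SIGQUIT in a loop.
static void* mySigQuitCatcher(void* args) {
    sigset_t sigSet;
    sigemptyset(&sigSet);
    sigaddset(&sigSet, SIGQUIT);
    while (true) {
        int sig = 0;
        // sigwait returns 0 on success and stores the signal number in sig.
        if (sigwait(&sigSet, &sig) == 0 && sig == SIGQUIT) {
            // Got SIGQUIT
        }
    }
    return nullptr;
}

// Wrapper name is illustrative; call it once during initialization.
void startSigQuitCatcher() {
    pthread_t thread;
    if (pthread_create(&thread, nullptr, mySigQuitCatcher, nullptr) == 0) {
        pthread_detach(thread);
    }
}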

Because the runtime's own SignalCatcher thread also waits for SIGQUIT with sigwait, the signal may be delivered to either thread unpredictably when both exist, which makes this approach unreliable.

2.2 Signal Handler

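This approach registers a handler with sigaction. The Android runtime blocks SIGQUIT in application threads by default so that the SignalCatcher thread can collect it with sigwait, so the handler does not receive the signal until SIGQUIT is unblocked with pthread_sigmask or sigprocmask.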

void signalHandler(int sig, siginfo_t* info, void* uc) {
    if (sig == SIGQUIT) {
        // Got an ANR
    }
}

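struct sigaction sa = {};            // zero-initialize; the struct is otherwise indeterminate
sigemptyset(&sa.sa_mask);            // block no additional signals while the handler runs
sa.sa_sigaction = signalHandler;
sa.sa_flags = SA_ONSTACK | SA_SIGINFO | SA_RESTART;
sigaction(SIGQUIT, &sa, nullptr);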

sigset_t sigSet;
sigemptyset(&sigSet);
sigaddset(&sigSet, SIGQUIT);
pthread_sigmask(SIG_UNBLOCK, &sigSet, nullptr);

After handling the signal, the handler must forward SIGQUIT back to the original SignalCatcher thread, otherwise the system dump will time out.

int tid = getSignalCatcherThreadId(); // find SignalCatcher thread tid
tgkill(getpid(), tid, SIGQUIT);
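
The article does not show how getSignalCatcherThreadId locates the thread. One possible approach, sketched below, is to scan /proc/self/task and compare each thread's name. The helper name findThreadIdByName is hypothetical, and the sketch assumes the runtime names its thread "Signal Catcher"; a production version would likely add further checks before trusting the match.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <dirent.h>
#include <string>

// Hypothetical helper: returns the tid of the first thread in this process
// whose kernel-visible name equals `name`, or -1 if no such thread exists.
static int findThreadIdByName(const char* name) {
    DIR* dir = opendir("/proc/self/task");
    if (dir == nullptr) return -1;
    int result = -1;
    struct dirent* entry;
    while (result == -1 && (entry = readdir(dir)) != nullptr) {
        if (entry->d_name[0] == '.') continue;  // skip "." and ".."
        std::string path = std::string("/proc/self/task/") + entry->d_name + "/comm";
        FILE* f = fopen(path.c_str(), "r");
        if (f == nullptr) continue;  // the thread may have exited meanwhile
        char comm[64] = {0};
        if (fgets(comm, sizeof(comm), f) != nullptr) {
            comm[strcspn(comm, "\n")] = '\0';  // comm ends with a newline
            if (strcmp(comm, name) == 0) {
                result = atoi(entry->d_name);  // directory name is the tid
            }
        }
        fclose(f);
    }
    closedir(dir);
    return result;
}
```

Under that naming assumption, findThreadIdByName("Signal Catcher") would supply the tid passed to tgkill above, and a result of -1 means the signal should not be forwarded.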

3. Complete ANR Monitoring Scheme

Simply receiving SIGQUIT is insufficient. False positives occur when another process's ANR causes the system to dump this process as well, or when SIGQUIT is sent manually. To confirm a real ANR, the solution checks whether the current process carries the NOT_RESPONDING flag, using ActivityManager.getProcessesInErrorState(), for up to 20 seconds after the signal.

private static boolean checkErrorState() {
    try {
        Application application = sApplication == null ? Matrix.with().getApplication() : sApplication;
        ActivityManager am = (ActivityManager) application.getSystemService(Context.ACTIVITY_SERVICE);
        List<ActivityManager.ProcessErrorStateInfo> procs = am.getProcessesInErrorState();
        if (procs == null) return false;
        for (ActivityManager.ProcessErrorStateInfo proc : procs) {
            if (proc.pid != android.os.Process.myPid()) continue;
            if (proc.condition != ActivityManager.ProcessErrorStateInfo.NOT_RESPONDING) continue;
            return true;
        }
        return false;
    } catch (Throwable t) {
        MatrixLog.e(TAG, "[checkErrorState] error : %s", t.getMessage());
    }
    return false;
}
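
The 20-second window implies a polling loop around the flag check. The following is a minimal sketch of such a loop, not the article's implementation: pollUntil is a hypothetical helper, and a plain callable stands in for the call into checkErrorState.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls `check` every `interval` until it returns true
// or `timeout` has elapsed. Returns true if the check ever succeeded.
static bool pollUntil(const std::function<bool()>& check,
                      std::chrono::milliseconds interval,
                      std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (check()) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(interval);
    }
}
```

A call such as pollUntil(check, std::chrono::milliseconds(500), std::chrono::seconds(20)) would match the 20-second window; the 500 ms interval is an assumption chosen for illustration.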

As an additional safeguard, the handler inspects the siginfo_t passed with the signal and ignores any SIGQUIT that was sent by the app's own process.
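
The article does not say which siginfo_t fields are read. The sketch below uses si_pid, the standard field holding the sender's PID, as one way to recognize a self-sent signal. The function and variable names are hypothetical.

```cpp
#include <cassert>
#include <signal.h>
#include <unistd.h>

// Written by the handler; sig_atomic_t keeps the stores async-signal-safe.
static volatile sig_atomic_t gGotSigQuit = 0;
static volatile sig_atomic_t gSentBySelf = 0;

static void sigQuitHandler(int sig, siginfo_t* info, void* /*uc*/) {
    if (sig != SIGQUIT) return;
    gGotSigQuit = 1;
    // For signals sent with kill()/tgkill(), si_pid is the sender's PID.
    gSentBySelf = (info != nullptr && info->si_pid == getpid()) ? 1 : 0;
}

// Hypothetical setup helper: installs the handler and unblocks SIGQUIT
// on the calling thread. Returns true on success.
static bool installSigQuitHandler() {
    struct sigaction sa = {};
    sa.sa_sigaction = sigQuitHandler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGQUIT, &sa, nullptr) != 0) return false;

    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGQUIT);
    return pthread_sigmask(SIG_UNBLOCK, &set, nullptr) == 0;
}
```

When gSentBySelf is set, the monitor can skip the ANR confirmation step, because the signal did not come from the system.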

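Silent ANRs and crash‑ANRs do not set the flag. To catch them, the monitor detects a stalled main thread directly: it reads the head message of the main looper's queue (the MessageQueue.mMessages field) via reflection and compares that message's when timestamp with the current time. If the head message was due long ago and has still not been handled, the main thread is stuck.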

private static boolean isMainThreadStuck() {
    try {
        MessageQueue mainQueue = Looper.getMainLooper().getQueue();
        // mMessages is the head of the main thread's pending-message list (non-public field).
        Field field = mainQueue.getClass().getDeclaredField("mMessages");
        field.setAccessible(true);
        final Message mMessage = (Message) field.get(mainQueue);
        if (mMessage != null) {
            long when = mMessage.getWhen();
            if (when == 0) return false;
            // Negative once the head message is overdue, i.e. the main thread has not reached it yet.
            long time = when - SystemClock.uptimeMillis();
            // The thresholds must be negative for the comparison below to mean
            // "overdue by more than the threshold".
            long timeThreshold = BACKGROUND_MSG_THRESHOLD;
            if (foreground) {
                timeThreshold = FOREGROUND_MSG_THRESHOLD;
            }
            return time < timeThreshold;
        }
    } catch (Exception e) {
        return false;
    }
    return false;
}
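
The comparison is easier to follow with concrete numbers. The sketch below isolates the arithmetic as a pure function. The threshold values are assumptions chosen for illustration and are not taken from the article, and isHeadMessageOverdue is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Assumed thresholds in milliseconds (illustrative only). They are negative:
// the value says how far overdue the head message must be.
constexpr int64_t kForegroundThresholdMs = -2000;
constexpr int64_t kBackgroundThresholdMs = -10000;

// Returns true when the head message of the queue was due long enough ago
// that the main thread is considered stuck. A `when` of 0 is treated as
// "no usable timestamp", mirroring the Java check.
static bool isHeadMessageOverdue(int64_t whenMs, int64_t nowMs, bool foreground) {
    if (whenMs == 0) return false;
    const int64_t delta = whenMs - nowMs;  // negative once the message is overdue
    const int64_t threshold = foreground ? kForegroundThresholdMs
                                         : kBackgroundThresholdMs;
    return delta < threshold;
}
```

With these assumed values, a head message due at uptime 10000 ms and inspected at 13500 ms gives a delta of -3500 ms. That counts as stuck in the foreground (below -2000) but not in the background (not below -10000).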

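Finally, the ANR trace itself can be intercepted by PLT‑hooking connect, open, and write in the library that performs the dump for the device's API level. The connect or open hook recognizes the moment the SignalCatcher thread opens the trace destination (the tombstoned_java_trace socket on API 27 and above, /data/anr/traces.txt below that), and the next write from that same thread carries the trace content.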

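int (*original_connect)(int __fd, const struct sockaddr* __addr, socklen_t __addr_length);
int my_connect(int __fd, const struct sockaddr* __addr, socklen_t __addr_length) {
    // The trace socket is an AF_UNIX socket, so read the path through sockaddr_un
    // (from <sys/un.h>) rather than the 14-byte sa_data field.
    if (__addr != nullptr && __addr->sa_family == AF_UNIX) {
        const struct sockaddr_un* un = (const struct sockaddr_un*) __addr;
        if (strncmp(un->sun_path, "/dev/socket/tombstoned_java_trace", sizeof(un->sun_path)) == 0) {
            isTraceWrite = true;
            signalCatcherTid = gettid();
        }
    }
    return original_connect(__fd, __addr, __addr_length);
}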

int (*original_open)(const char *pathname, int flags, mode_t mode);
int my_open(const char *pathname, int flags, mode_t mode) {
    if (strcmp(pathname, "/data/anr/traces.txt") == 0) {
        isTraceWrite = true;
        signalCatcherTid = gettid();
    }
    return original_open(pathname, flags, mode);
}

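ssize_t (*original_write)(int fd, const void* const __pass_object_size0 buf, size_t count);
ssize_t my_write(int fd, const void* const buf, size_t count) {
    // Only the first write by the SignalCatcher thread after the trace target was opened is captured.
    if(isTraceWrite && signalCatcherTid == gettid()) {
        isTraceWrite = false;
        signalCatcherTid = 0;
        // Note: buf holds count bytes and is not guaranteed to be NUL-terminated.
        char *content = (char *) buf;
        printAnrTrace(content);
    }
    return original_write(fd, buf, count);
}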

void hookAnrTraceWrite() {
    int apiLevel = getApiLevel();
    if (apiLevel < 19) return;
    if (apiLevel >= 27) {
        plt_hook("libcutils.so", "connect", (void *)my_connect, (void **)(&original_connect));
    } else {
        plt_hook("libart.so", "open", (void *)my_open, (void **)(&original_open));
    }
    if (apiLevel >= 30 || apiLevel == 25 || apiLevel == 24) {
        plt_hook("libc.so", "write", (void *)my_write, (void **)(&original_write));
    } else if (apiLevel == 29) {
        plt_hook("libbase.so", "write", (void *)my_write, (void **)(&original_write));
    } else {
        plt_hook("libart.so", "write", (void *)my_write, (void **)(&original_write));
    }
}
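
The write hook above hands the raw buffer to printAnrTrace as a char pointer. A write() buffer carries count bytes and has no guaranteed terminator, so one defensive option is to copy exactly count bytes first. The sketch below shows that option; copyWriteBuffer is a hypothetical helper, not part of the article's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies exactly `count` bytes from a write() buffer
// into an owned string, so later processing never reads past the buffer.
static std::string copyWriteBuffer(const void* buf, size_t count) {
    if (buf == nullptr || count == 0) return std::string();
    return std::string(static_cast<const char*>(buf), count);
}
```

The copy costs one allocation per captured trace. That should be acceptable here because the hook fires only once per dump.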

The hooks are enabled only after SIGQUIT is received and disabled once the dump finishes, which keeps their runtime overhead low. The complete solution has been deployed in the WeChat Android client for over a year and is open‑sourced in the Matrix project.
