
How AI Coding Powered the End‑to‑End Development of NMLeak, an Android Native Memory Leak Detector

This article recounts how a single developer built NMLeak, a 5,881‑line Android native memory‑leak detection library, in three weeks by manually decomposing the architecture and then applying AI‑assisted, per‑module coding and spec‑driven design. It highlights which practices worked and where human oversight remains essential.

Baidu App Technology

Jun 23, 2026


Project Overview

NMLeak is an Android native memory‑leak detection library that intercepts malloc/free/mmap via PLT Hook, records allocation data, and uses the system libmemunreachable for automatic leak detection and attribution. The project started from scratch for offline debug‑package testing and consists of 5,881 lines of C++/Java code.
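The proxy half of this design can be sketched independently of any particular PLT-hook library. The following is a minimal illustration, not NMLeak's actual code: the names `AllocationTable`, `proxy_malloc`, `proxy_free`, `g_orig_malloc`, and `g_orig_free` are hypothetical, hook installation is left out entirely, and the "original" function pointers are simply set to libc's `malloc` and `free`. It assumes the bookkeeping itself may allocate, so a thread-local flag keeps the proxy from recording its own allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Hypothetical names throughout; this is an illustration only.
using MallocFn = void* (*)(std::size_t);
using FreeFn = void (*)(void*);

// A real PLT hook would store the original function addresses here when the
// hook is installed. In this sketch they point straight at libc.
static MallocFn g_orig_malloc = &std::malloc;
static FreeFn g_orig_free = &std::free;

// Records live allocations keyed by address.
class AllocationTable {
public:
    void record(void* ptr, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        live_[reinterpret_cast<std::uintptr_t>(ptr)] = size;
    }
    void erase(void* ptr) {
        std::lock_guard<std::mutex> lock(mutex_);
        live_.erase(reinterpret_cast<std::uintptr_t>(ptr));
    }
    std::size_t liveCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return live_.size();
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::uintptr_t, std::size_t> live_;
};

static AllocationTable g_table;

// True while this thread is inside a proxy, so that allocations made by the
// bookkeeping itself are passed through without being recorded again.
static thread_local bool t_in_hook = false;

void* proxy_malloc(std::size_t size) {
    void* ptr = g_orig_malloc(size);
    if (ptr != nullptr && !t_in_hook) {
        t_in_hook = true;
        g_table.record(ptr, size);
        t_in_hook = false;
    }
    return ptr;
}

void proxy_free(void* ptr) {
    if (ptr != nullptr && !t_in_hook) {
        t_in_hook = true;
        g_table.erase(ptr);
        t_in_hook = false;
    }
    g_orig_free(ptr);
}
```

Calling `proxy_malloc(64)` adds one entry to the table and `proxy_free` on the same pointer removes it. A production hook would also capture a backtrace at allocation time, which this sketch omits.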

Development Timeline

Week 1 (1.15‑1.20): Framework setup – The system was split into three initial modules: HookHelper (hook management), MemoryHooks (function proxies), and BackTracer (stack unwinding). Each module was handed to AI for implementation.

Week 2 (1.22‑1.30): Core capabilities & performance tuning – Added the most complex module, TriggerManager, after AI generated a 1,616‑line design document that passed review.

Week 3 (2.3‑2.6): Trigger mechanisms & system integration – Integrated the trigger logic and completed end‑to‑end testing.

Weeks 4‑5 (2.27‑3.27): Integration, release, and bug fixing – Deployed NMLeak to production and resolved remaining issues.

AI Coding Practices

Tools & models – The project used the Comate platform with the strongest coding model available at the time. A living “project memory” document captured the architecture, build steps, design decisions, and configuration parameters, and the AI consulted it on every turn.

Workflow – The approach combined manual architecture decomposition with per‑module AI coding. The rationale was that module‑wise implementation allows incremental verification and risk containment while leveraging the developer’s deep domain knowledge for effective splitting.

Two AI coding modes

Vibe Coding: Fast, direct generation for well‑bounded modules. The developer describes the problem and constraints, leaving solution choices to the AI.

Spec‑driven: A detailed design specification (Spec) is written and reviewed before AI code generation, preventing context bloat and reducing later correction cycles.

Both modes start with a “Plan” dialogue where the developer states intent and constraints, the AI summarizes and confirms the plan, and only then proceeds to code or document generation.

Why model capability matters

The quality jump observed in the project stemmed largely from using a more capable base model. Stronger models reduced alignment cost, required fewer correction rounds, and were better at uncovering implicit constraints.

Example: Counting Bloom Filter

Problem: free() calls were too slow because each call queried a full hash table, even though most freed addresses were not being tracked.

Developer: “Free paths are slow; we check a hash table for every free, but most frees aren’t tracked. Can we filter out the invalid frees quickly?”

AI: “Use a Counting Bloom Filter to quickly discard negatives while supporting deletions.”

The AI then produced a complete implementation:

```cpp
class CountingBloomFilter {
    // 128K slots × 4 bytes = 512 KB fixed memory, O(1) query
    static constexpr size_t kSlotBits = 17;
    static constexpr size_t kSlotCount = 1 << kSlotBits;
    static constexpr size_t kSlotMask = kSlotCount - 1;
    static constexpr size_t kHashCount = 3;
    // ... implementation omitted for brevity
};
```

The full implementation, omitted above, shows the AI selecting an appropriate probabilistic data structure from a problem description alone, applying MurmurHash3‑based double hashing, and using atomic counters so that updates are lock‑free.
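Since the body of the class is omitted, the following is a minimal sketch of how such a filter could be filled in, not the code the AI produced. It keeps the constants from the excerpt and assumes the rest: the method names `add`, `remove`, and `mayContain` are hypothetical, the hash is the 64-bit finalizer from MurmurHash3 rather than the full MurmurHash3 function, and the atomics use the default (sequentially consistent) ordering for simplicity.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

class CountingBloomFilter {
public:
    // 128K slots x 4 bytes = 512 KB fixed memory, O(1) query
    static constexpr std::size_t kSlotBits = 17;
    static constexpr std::size_t kSlotCount = std::size_t{1} << kSlotBits;
    static constexpr std::size_t kSlotMask = kSlotCount - 1;
    static constexpr std::size_t kHashCount = 3;

    CountingBloomFilter() : slots_(new std::atomic<std::uint32_t>[kSlotCount]) {
        for (std::size_t i = 0; i < kSlotCount; ++i) {
            slots_[i].store(0);
        }
    }

    // Call when an address starts being tracked.
    void add(std::uintptr_t addr) {
        const Hashes h = hashes(addr);
        for (std::size_t i = 0; i < kHashCount; ++i) {
            slots_[index(h, i)].fetch_add(1);
        }
    }

    // Call only for an address that was previously added; removing an
    // address that was never added would corrupt the counters.
    void remove(std::uintptr_t addr) {
        const Hashes h = hashes(addr);
        for (std::size_t i = 0; i < kHashCount; ++i) {
            slots_[index(h, i)].fetch_sub(1);
        }
    }

    // false: definitely not tracked. true: possibly tracked (false positives
    // are allowed), so the caller must still consult the hash table.
    bool mayContain(std::uintptr_t addr) const {
        const Hashes h = hashes(addr);
        for (std::size_t i = 0; i < kHashCount; ++i) {
            if (slots_[index(h, i)].load() == 0) {
                return false;
            }
        }
        return true;
    }

private:
    struct Hashes {
        std::uint64_t h1;
        std::uint64_t h2;
    };

    // 64-bit finalizer (fmix64) from MurmurHash3.
    static std::uint64_t fmix64(std::uint64_t k) {
        k ^= k >> 33;
        k *= 0xff51afd7ed558ccdULL;
        k ^= k >> 33;
        k *= 0xc4ceb9fe1a85ec53ULL;
        k ^= k >> 33;
        return k;
    }

    static Hashes hashes(std::uintptr_t addr) {
        const std::uint64_t h1 = fmix64(static_cast<std::uint64_t>(addr));
        const std::uint64_t h2 = fmix64(h1) | 1;  // odd step for double hashing
        return {h1, h2};
    }

    // Double hashing: the i-th probe is h1 + i * h2, masked to the table size.
    static std::size_t index(const Hashes& h, std::size_t i) {
        return static_cast<std::size_t>((h.h1 + i * h.h2) & kSlotMask);
    }

    std::unique_ptr<std::atomic<std::uint32_t>[]> slots_;
};
```

On the free path, a caller would check `mayContain(addr)` first and fall through to the hash-table lookup only when it returns true, which is what removes the table query for the untracked majority of frees.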

Example: MessageQueue with swap‑based consumption

Given only the problem and its constraints (high‑frequency malloc/free, producer‑consumer threads, low latency, minimal blocking), the AI chose a bucketed hash queue with a mutex per bucket. The consumer drains a bucket with a single pointer exchange, so no lock is held while the data is copied.

Bucketed hash + independent locks replace a global lock.

Each bucket stores only a MessageArray* pointer.

The consumer holds an empty array as a “swap token” to exchange pointers atomically.

These design choices directly stem from the constraint “hook path must be as short as possible”.
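The three points above can be sketched as follows. This is a simplified illustration under stated assumptions, not the generated code: `Message`, `BucketedMessageQueue`, `push`, and `drain` are hypothetical names, `MessageArray` is assumed to be a `std::vector`, the bucket is chosen by hashing the calling thread's id, and a single consumer thread is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical message layout.
struct Message {
    std::uintptr_t address;
    std::size_t size;
    bool isFree;
};
using MessageArray = std::vector<Message>;

class BucketedMessageQueue {
public:
    static constexpr std::size_t kBucketCount = 16;  // assumed value

    BucketedMessageQueue() : spare_(new MessageArray()) {
        for (Bucket& b : buckets_) {
            b.messages = new MessageArray();
        }
    }
    ~BucketedMessageQueue() {
        for (Bucket& b : buckets_) {
            delete b.messages;
        }
        delete spare_;
    }
    BucketedMessageQueue(const BucketedMessageQueue&) = delete;
    BucketedMessageQueue& operator=(const BucketedMessageQueue&) = delete;

    // Producer side (hook path): lock one bucket, append, unlock.
    void push(const Message& m) {
        Bucket& b = buckets_[bucketOfCaller()];
        std::lock_guard<std::mutex> lock(b.mutex);
        b.messages->push_back(m);
    }

    // Consumer side (single consumer thread assumed). For each bucket, swap
    // the full array for the empty spare under the lock, then process the
    // taken array with the lock released. Returns the number of messages.
    template <typename Handler>
    std::size_t drain(Handler&& handle) {
        std::size_t count = 0;
        for (Bucket& b : buckets_) {
            {
                std::lock_guard<std::mutex> lock(b.mutex);
                std::swap(b.messages, spare_);  // pointer exchange only
            }
            for (const Message& m : *spare_) {
                handle(m);
                ++count;
            }
            spare_->clear();  // empty again; becomes the next swap token
        }
        return count;
    }

private:
    struct Bucket {
        std::mutex mutex;
        MessageArray* messages = nullptr;
    };

    static std::size_t bucketOfCaller() {
        return std::hash<std::thread::id>{}(std::this_thread::get_id()) % kBucketCount;
    }

    std::array<Bucket, kBucketCount> buckets_;
    MessageArray* spare_;  // the consumer's empty array
};
```

The critical section on the consumer side is one pointer swap regardless of how many messages a bucket holds, so a producer blocked on that bucket waits for a constant amount of work.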

Spec‑driven design for complex modules

For modules like TriggerManager, which involve multiple states, thresholds, and timeout handling, a detailed Spec was written and reviewed first, covering terminology, state‑transition diagrams, threshold formulas, configuration tables, and degradation strategies. The AI then generated code that respected all documented edge cases, and later changes only required updating the Spec before regenerating the code.
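To make concrete the kind of code a Spec with states, thresholds, and timeouts translates into, here is a small sketch. Everything in it is hypothetical: the article does not list TriggerManager's actual states or formulas, so the three states, the `TriggerConfig` fields, and the transition rules below are invented purely for illustration. Time is passed in as a parameter so that each transition can be checked deterministically.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical states; the real Spec's state set is not given in the article.
enum class TriggerState { Idle, Detecting, Cooldown };

// Hypothetical configuration table.
struct TriggerConfig {
    std::size_t liveBytesThreshold;           // start detection at or above this
    std::chrono::milliseconds detectTimeout;  // give up on a stuck detection pass
    std::chrono::milliseconds cooldown;       // minimum gap between passes
};

class TriggerStateMachine {
public:
    explicit TriggerStateMachine(const TriggerConfig& config) : config_(config) {}

    TriggerState state() const { return state_; }

    // Feed one periodic sample. Returns true exactly when a detection pass
    // should be started by the caller.
    bool onSample(std::size_t liveBytes, std::chrono::milliseconds now) {
        switch (state_) {
            case TriggerState::Idle:
                if (liveBytes >= config_.liveBytesThreshold) {
                    enter(TriggerState::Detecting, now);
                    return true;
                }
                return false;
            case TriggerState::Detecting:
                // Degradation path: a pass that exceeds its timeout is abandoned.
                if (now - enteredAt_ >= config_.detectTimeout) {
                    enter(TriggerState::Cooldown, now);
                }
                return false;
            case TriggerState::Cooldown:
                if (now - enteredAt_ >= config_.cooldown) {
                    enter(TriggerState::Idle, now);
                }
                return false;
        }
        return false;
    }

    // Called by the caller when a detection pass completes normally.
    void onDetectionFinished(std::chrono::milliseconds now) {
        if (state_ == TriggerState::Detecting) {
            enter(TriggerState::Cooldown, now);
        }
    }

private:
    void enter(TriggerState next, std::chrono::milliseconds now) {
        state_ = next;
        enteredAt_ = now;
    }

    TriggerConfig config_;
    TriggerState state_ = TriggerState::Idle;
    std::chrono::milliseconds enteredAt_{0};
};
```

Each `case` corresponds to one row of a state-transition table, which is why a change to the Spec maps onto a small, reviewable change in the generated code.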

Areas for Improvement

From personal practice to team collaboration

Spec format inconsistency across contributors.

Spec‑code drift when code changes without Spec updates.

Proposed solutions include introducing a unified Spec template, a Spec review checklist in the code‑review workflow, version‑controlled Spec documents stored alongside code, and strict engineering policies that tie Spec changes to code changes.

Closing the feedback loop

The current process requires a human to manually build, deploy, test, and report results back to the AI, which adds latency and loses detail each time results are re‑described in words. Inspired by TDD, the suggestion is to let the human define correctness criteria (unit tests, integration tests, performance benchmarks) that the AI can execute autonomously, iterating until the criteria are met. Human effort shifts from per‑round debugging to a one‑time definition of “what is correct”.

Conclusion

AI coding tools, spec‑driven design, and modular architecture together enabled a solo developer to deliver a production‑grade Android native memory‑leak detector in three weeks. While stronger models dramatically improve output quality, human oversight remains crucial for design alignment, edge‑case reasoning, and process orchestration. Future work should focus on scaling these practices to multi‑person teams and automating the verification feedback loop.
