From Zero to a Universal Android Script Testing Solution: Mixed‑Script Automation, Image‑Recognition, and Recording Tools
The article details how Baidu MTC designed and implemented a universal Android script testing platform that combines UIAutomator, a custom Clean‑SDK for popup handling, image‑recognition algorithms, and a recording‑playback tool to enable robust, non‑native mobile automated testing across thousands of devices.
During the QCon side‑event in Shanghai, Baidu MTC senior test engineer Hong Zhi‑yuan presented a comprehensive Android automation testing solution that attracted a full house of 150 developers.
The motivation for automation testing is threefold: shortening test cycles, standardising results to avoid human error, and increasing coverage across the fragmented Android device landscape.
Traditional frameworks (UIAutomator, Robotium, Appium) each have drawbacks when used alone, so the team adopted a hybrid approach that pairs UIAutomator with a custom Clean‑SDK for cross‑process popup handling.
The Clean‑SDK detects top‑level UI nodes to discover and clear both in‑app and system popups, using text‑matching rather than element‑based heuristics to cope with diverse ROM customisations.
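The text‑matching idea can be illustrated with a minimal sketch. This is not the Clean‑SDK itself: the function name `find_dismiss_target`, the keyword list, and the sample dump are hypothetical, and the only assumption about the platform is the `node` / `text` / `clickable` / `bounds` layout used by UIAutomator hierarchy dumps.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical dismissal labels; a real list would cover many ROMs and languages.
DISMISS_TEXTS = ("Allow", "OK", "Skip", "Cancel", "Got it")

# Bounds in a hierarchy dump look like "[x1,y1][x2,y2]".
BOUNDS_RE = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_dismiss_target(hierarchy_xml, dismiss_texts=DISMISS_TEXTS):
    """Return the (x, y) centre of the first clickable node whose text
    exactly matches a dismissal label (case-insensitive), or None."""
    wanted = {t.lower() for t in dismiss_texts}
    root = ET.fromstring(hierarchy_xml)
    for node in root.iter("node"):
        text = (node.get("text") or "").strip().lower()
        if text in wanted and node.get("clickable") == "true":
            match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
            if match:
                x1, y1, x2, y2 = map(int, match.groups())
                return ((x1 + x2) // 2, (y1 + y2) // 2)
    return None


# Illustrative dump of a system permission dialog (made up for this sketch).
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node text="" clickable="false" bounds="[0,0][1080,1920]">
    <node text="Allow app to access your location?" clickable="false" bounds="[100,800][980,900]"/>
    <node text="Deny" clickable="true" bounds="[100,1000][500,1100]"/>
    <node text="Allow" clickable="true" bounds="[580,1000][980,1100]"/>
  </node>
</hierarchy>"""

if __name__ == "__main__":
    print(find_dismiss_target(SAMPLE_DUMP))
```

Matching on visible text rather than resource IDs or class names is what lets one keyword list work across vendor ROMs that rename or restyle the same dialog.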
Non‑native scenarios such as WebView, games, and PopupWindow present additional challenges; the team evaluated element‑based, image‑recognition, and click‑path strategies, ultimately favouring a non‑intrusive image‑recognition pipeline.
The image‑recognition solution integrates template matching, feature‑point algorithms, and OCR, achieving around 99% matching accuracy and supporting a wide range of resolutions and device types.
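As a rough illustration of the template‑matching stage only, the toy sketch below slides a small template over a grayscale grid and keeps the position with the lowest sum of squared differences. It is an assumption‑laden stand‑in, not the team's pipeline: the name `match_template` and the sample data are hypothetical, and a production system would use an optimised image library and add the feature‑point and OCR stages described above.

```python
def match_template(image, template):
    """Exhaustively slide `template` over `image` (both 2D lists of
    grayscale values) and return (row, col, ssd) for the position with
    the lowest sum of squared differences. Lower ssd means a closer match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template is larger than image")
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(th)
                for dc in range(tw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


# Tiny made-up "screenshot" with a 2x2 "icon" at row 1, column 1.
SCREEN = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
ICON = [[9, 8], [7, 9]]

if __name__ == "__main__":
    print(match_template(SCREEN, ICON))
```

Plain template matching of this kind is sensitive to scale, which is one reason a pipeline that must support many resolutions would also need feature‑point matching.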
By combining the image‑recognition pipeline with UIAutomator on the PC side, the team created a mixed‑script testing framework that offers stability, ease of use, and high accuracy.
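One way to picture a mixed script is a single click call that tries an element lookup first and falls back to image matching. The sketch below is only a guess at the shape of such an API: `MixedDriver`, its parameters, and the stub backends are all hypothetical and stand in for the UIAutomator and image‑recognition components.

```python
class MixedDriver:
    """Hypothetical facade: element lookup first, image lookup as fallback."""

    def __init__(self, element_finder, image_finder, tapper):
        # Each finder takes a key and returns an (x, y) point or None.
        self.element_finder = element_finder
        self.image_finder = image_finder
        self.tapper = tapper

    def click(self, text=None, image=None):
        point = None
        if text is not None:
            point = self.element_finder(text)
        if point is None and image is not None:
            point = self.image_finder(image)
        if point is None:
            raise LookupError(f"no target for text={text!r}, image={image!r}")
        self.tapper(point)
        return point


# Stub backends for illustration; real ones would query the device.
ELEMENTS = {"Login": (540, 1200)}
IMAGES = {"start_button.png": (300, 900)}
taps = []

driver = MixedDriver(ELEMENTS.get, IMAGES.get, taps.append)

if __name__ == "__main__":
    driver.click(text="Login")                            # native control
    driver.click(text="Start", image="start_button.png")  # e.g. a game screen
    print(taps)
```

Keeping both locators behind one call means the script author does not need to know in advance whether a given screen exposes a usable element tree.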
A recording‑playback tool was built on top of this framework, capturing UI interactions via adb/minicap, generating scripts based on click‑path information, and supporting both coordinate‑ and element‑based generation methods.
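The difference between coordinate‑based and element‑based generation can be sketched as follows. The function `generate_step`, the node dictionaries, and the emitted `click_text` / `click_point` calls are hypothetical names chosen for illustration, not the tool's actual output format.

```python
def area(node):
    """Area of a node's bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = node["bounds"]
    return (x2 - x1) * (y2 - y1)


def generate_step(click, nodes):
    """Turn one recorded click into a script line.

    Prefers an element-based step when the click lands inside a node
    that has text (choosing the smallest such node); otherwise falls
    back to a coordinate-based step.
    """
    x, y = click
    hits = [
        n for n in nodes
        if n["text"]
        and n["bounds"][0] <= x < n["bounds"][2]
        and n["bounds"][1] <= y < n["bounds"][3]
    ]
    if hits:
        return f'click_text({min(hits, key=area)["text"]!r})'
    return f"click_point({x}, {y})"


# Made-up snapshot of the UI tree at the moment of the click.
NODES = [
    {"text": "Welcome", "bounds": (0, 0, 1080, 1920)},
    {"text": "Login", "bounds": (100, 200, 300, 260)},
    {"text": "", "bounds": (0, 1800, 1080, 1920)},
]

if __name__ == "__main__":
    for click in [(150, 220), (500, 1850)]:
        print(generate_step(click, NODES))
```

Element‑based steps tend to survive resolution changes better, while coordinate‑based steps remain available for surfaces that expose no usable elements.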
The overall architecture consists of a PC‑hosted testing API, UIAutomator agents on devices, and image‑recognition services, providing a seamless experience for testers who can write a single script to handle native, WebView, game, and popup scenarios.
All components are open for learning purposes only and are not intended for commercial use.
Baidu Intelligent Testing