How HuoLala Built a Low‑Cost, High‑Reliability Mobile UI Automation Platform
This article describes how HuoLala, under pressure from a weekly release cadence, built a cloud‑based record‑and‑replay platform for mobile UI automation. It covers the background and challenges, an analysis of industry solutions, and the technical design (deep‑learning control detection, SIFT image matching, script generation, playback handling and platform features). It then reports the resulting testing‑efficiency gains and planned AI‑driven enhancements.
Background and Goal
With HuoLala’s rapid business growth, the weekly release cadence demanded faster, higher‑quality delivery, creating a pressing need to reduce the manual effort of regression testing for the mobile app.
Early attempts with Appium‑based UI automation faced high onboarding and maintenance costs, low script stability, and extensive debugging effort.
The objective was to create a low‑cost, highly available App UI automation platform that meets the following criteria:
Lower technical threshold: simple onboarding with no environment setup.
Faster script authoring: executable scripts generated directly from on‑device actions.
Reduced maintenance: image‑based control detection that limits the impact of UI changes.
Higher stability: a high playback recognition rate, with less interference from pop‑ups and the environment.
Rich platform features: script management, device scheduling, test reporting, etc.
Industry Solutions
Considering ROI, HuoLala evaluated record‑replay solutions from major companies:
NetEase Airtest – free IDE but limited platform features for large‑scale collaboration.
Meituan AlphaTest – SDK integration with deep hook capabilities, requiring close cooperation with mobile developers.
iQIYI – cloud device + cloud IDE, offering full platform features without SDK development.
ByteDance SmartEye – SDK‑based, focused on precise testing.
These analyses led to the decision to build a custom platform leveraging existing cloud‑device infrastructure.
Capability Construction
HuoLala already possessed two key strengths:
A mature cloud real‑device platform.
In‑depth quality and efficiency practice for mobile apps.
Thus, a cloud‑device‑driven record‑replay solution was chosen.
3.1 Recording Capability
The recording process captures raw operation events from cloud devices, parses screenshots and coordinates to identify UI controls, and converts them into script steps. It supports both Android and iOS, with side‑channel reporting that does not block device interaction.
Key recording goals:
Identify operation type (click, long‑press, input, swipe, etc.).
Identify target control (button, tab, text field, etc.).
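To make the "identify target control" goal concrete: once a detector has returned bounding boxes for the controls on the recorded screenshot, the touch coordinate can be resolved to a control with a simple hit test. The sketch below assumes boxes are (left, top, right, bottom) pixel tuples and picks the smallest box containing the point, so a button wins over the card that encloses it. The Control type and the smallest‑box rule are illustrative assumptions, not the platform's documented logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Control:
    kind: str                          # e.g. "icon" or "text" (hypothetical labels)
    box: Tuple[int, int, int, int]     # (left, top, right, bottom) in pixels
    text: str = ""                     # OCR text, if the control is textual

def area(c: Control) -> int:
    left, top, right, bottom = c.box
    return (right - left) * (bottom - top)

def hit_test(controls: List[Control], x: int, y: int) -> Optional[Control]:
    """Return the smallest detected control whose box contains (x, y), or None."""
    hits = [c for c in controls
            if c.box[0] <= x <= c.box[2] and c.box[1] <= y <= c.box[3]]
    # Nested controls: the innermost (smallest) box is the most specific target.
    return min(hits, key=area, default=None)
```

Preferring the smallest box is a cheap way to handle nesting; a production version might also weigh detector confidence.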
3.1.1 Cloud‑Device Side‑Channel Reporting & Event Parsing
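Device actions are captured as a stream of low‑level touch events. The format matches the minitouch protocol: d (touch down: contact, x, y, pressure), m (move), u (up) and c (commit):
// click: press, then release
d 0 10 10 50
c
u 0
c
// long press: press, hold, then release
d 0 10 10 50
c
<wait in your own code>
u 0
c
// swipe: press, move in steps, then release
d 0 0 0 50
c
m 0 20 0 50
c
m 0 40 0 50
c
u 0
c
These events allow the system to determine both the action type and the associated UI element.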
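The touch lines alone do not separate a click from a long press; timing does. A minimal sketch of the event parsing, assuming each reported line arrives with a timestamp in milliseconds and that only contact 0 is used. The 500 ms long‑press and 10 px swipe thresholds are illustrative assumptions, not values from the article.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Illustrative thresholds (assumptions, not values from the article).
LONG_PRESS_MS = 500
SWIPE_MIN_PX = 10

@dataclass
class Gesture:
    kind: str                  # "click", "long_press" or "swipe"
    start: Tuple[int, int]
    end: Tuple[int, int]
    duration_ms: int

def classify(events: List[Tuple[int, str]]) -> List[Gesture]:
    """Group (timestamp_ms, line) pairs for contact 0 into gestures."""
    gestures: List[Gesture] = []
    start: Optional[Tuple[int, int]] = None
    last: Optional[Tuple[int, int]] = None
    t_down = 0
    for ts, line in events:
        parts = line.split()
        if not parts:
            continue
        op = parts[0]
        if op == "d":                               # d <contact> <x> <y> <pressure>
            start = last = (int(parts[2]), int(parts[3]))
            t_down = ts
        elif op == "m" and start is not None:       # m <contact> <x> <y> <pressure>
            last = (int(parts[2]), int(parts[3]))
        elif op == "u" and start is not None:       # u <contact>
            moved = math.dist(start, last)
            held = ts - t_down
            if moved >= SWIPE_MIN_PX:
                kind = "swipe"
            elif held >= LONG_PRESS_MS:
                kind = "long_press"
            else:
                kind = "click"
            gestures.append(Gesture(kind, start, last, held))
            start = last = None
        # "c" (commit) only flushes the batch on the device; nothing to record.
    return gestures
```

Fed a quick tap, an 800 ms hold and a 40 px drag, classify returns click, long_press and swipe in that order. Displacement is checked before duration so that a slow drag is still recorded as a swipe.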
3.1.2 Control / Text Detection
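Deep‑learning object detection (YOLOX) is used to locate UI controls (icons, images, text) in screenshots. A pre‑trained model is fine‑tuned on HuoLala‑specific UI data, keeping the early backbone layers frozen and retraining the rest of the network, including the detection head. In this MMDetection‑style config, frozen_stages=1 freezes the stem and the first backbone stage:
model = dict(backbone=dict(frozen_stages=1))
3.1.3 Script Generation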
All actions are serialized into a custom script format. A click, for example, becomes a Click() step: if the target is an icon, the step stores the icon screenshot and its relative coordinates; if the target is text, it stores the text string.
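A sketch of how a recorded click could be turned into a script line under this scheme. The Click(image=..., pos=...) and Click(text=...) syntax, the file naming and the reading of "relative coordinates" as the control centre divided by screen width and height are all assumptions for illustration; the article does not publish its script grammar.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RecordedClick:
    box: Tuple[int, int, int, int]      # detected control (left, top, right, bottom)
    screen: Tuple[int, int]             # screen (width, height) in pixels
    text: Optional[str] = None          # OCR text if the control is textual
    crop_path: Optional[str] = None     # saved icon screenshot if it is an icon

def to_script_line(click: RecordedClick) -> str:
    """Emit one hypothetical Click(...) step for the custom script format."""
    if click.text:
        # Text targets are replayed by OCR lookup, so only the string is kept.
        return f"Click(text={click.text!r})"
    left, top, right, bottom = click.box
    width, height = click.screen
    # Centre of the control, normalised so the step is resolution-independent.
    rx = round((left + right) / 2 / width, 3)
    ry = round((top + bottom) / 2 / height, 3)
    return f"Click(image={click.crop_path!r}, pos=({rx}, {ry}))"
```

Normalising by screen size lets the same step serve as a positional hint on devices with different resolutions, while the stored crop remains the primary matching key.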
3.2 Playback Capability
During playback, the stored control screenshots or text are matched against the target device screen to execute the recorded actions. Both image and text matching are employed.
3.2.1 Image Matching
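Icon matching uses SIFT keypoints, restricts detection to the control area with a region mask, and disables rotation invariance by forcing every keypoint's orientation to 0 before descriptors are computed:
// mask: non-zero only inside the control's region (the RegionMask)
sift.detect(image, kpVector, mask);
for (long i = 0; i < kpVector.size(); i++) {
    KeyPoint point = kpVector.get(i);
    point.angle(0); // upright descriptors: disables rotation invariance
}
// ret receives one 128-dimensional descriptor row per keypoint
sift.compute(image, kpVector, ret);
The RegionMask filters out irrelevant features, improving robustness against variations in color, resolution and badges.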
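Once descriptors exist for the recorded icon and for the current screen, they still have to be matched. The article does not say how; a common choice, assumed here, is brute‑force nearest‑neighbour matching with Lowe's ratio test, then taking the median of the matched screen keypoints as the click point. Plain Python lists stand in for the 128‑dimensional descriptors that sift.compute produces, so the logic runs without OpenCV; the 0.75 ratio and the 4‑match minimum are assumed values.

```python
import math
from statistics import median
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Descriptor = Sequence[float]

def ratio_test_matches(query: Sequence[Descriptor],
                       train: Sequence[Descriptor],
                       ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Brute-force nearest neighbours; keep (query_idx, train_idx) pairs passing Lowe's ratio test."""
    matches: List[Tuple[int, int]] = []
    if len(train) < 2:
        return matches                      # the test needs a runner-up neighbour
    for qi, q in enumerate(query):
        ranked = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        (d1, best), (d2, _) = ranked[0], ranked[1]
        if d1 < ratio * d2:                 # best match clearly beats the runner-up
            matches.append((qi, best))
    return matches

def locate_icon(icon_desc: Sequence[Descriptor],
                screen_desc: Sequence[Descriptor],
                screen_pts: Sequence[Point],
                min_matches: int = 4) -> Optional[Point]:
    """Median position of matched screen keypoints, or None if too few matches."""
    matches = ratio_test_matches(icon_desc, screen_desc)
    if len(matches) < min_matches:
        return None
    xs = [screen_pts[ti][0] for _, ti in matches]
    ys = [screen_pts[ti][1] for _, ti in matches]
    return (median(xs), median(ys))
```

The median rather than the mean keeps a few wrong matches (for example on a similar icon elsewhere on the screen) from dragging the estimate away from the real control.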
3.2.2 Text Matching
OCR (PaddleOCR) extracts text, which is then compared to expected strings. Edit distance tolerance handles OCR errors (≈80% accuracy) and punctuation differences.
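The edit‑distance tolerance can be sketched as a normalised Levenshtein similarity. Whitespace and punctuation are stripped first, using Unicode categories so full‑width Chinese punctuation is covered too; the 0.8 similarity threshold is an assumed value, not one confirmed by the article.

```python
import unicodedata

def normalize(s: str) -> str:
    """NFKC-fold, then drop whitespace and all Unicode punctuation (categories P*)."""
    s = unicodedata.normalize("NFKC", s)
    return "".join(ch for ch in s
                   if not ch.isspace() and not unicodedata.category(ch).startswith("P"))

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def text_matches(ocr_text: str, expected: str, threshold: float = 0.8) -> bool:
    """True if similarity = 1 - distance / max_len reaches the threshold."""
    a, b = normalize(ocr_text), normalize(expected)
    if not a or not b:
        return a == b
    similarity = 1 - levenshtein(a, b) / max(len(a), len(b))
    return similarity >= threshold
```

With this rule, a single misread character in "C0nfirm order" still matches "Confirm order" (similarity about 0.92), while an unrelated label such as "Cancel" does not.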
// XPath partial match example
//*[contains(@text,'xxx')]
3.2.3 Popup Handling
Two strategies are used:
A DeviceOwner policy that automatically grants system permission dialogs.
Whitelist‑based detection of known business pop‑ups, followed by automated dismissal and test continuation.
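The whitelist strategy can be sketched as follows: when a step's target cannot be found, compare the on‑screen (OCR) texts with known popup signatures, tap the matching dismiss control, and retry the step. The rule fields, the example entries, the callback‑based run_step and the retry limit are all hypothetical; the article only states that known popups are detected from a whitelist and dismissed automatically.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

@dataclass(frozen=True)
class PopupRule:
    name: str
    signature: str        # text that identifies the popup on screen
    dismiss_text: str     # text of the control to tap to close it

# Hypothetical whitelist; real entries would come from the business teams.
WHITELIST: List[PopupRule] = [
    PopupRule("upgrade", "New version available", "Later"),
    PopupRule("coupon", "Coupon received", "Close"),
]

def find_dismiss_target(screen_texts: Iterable[str],
                        rules: List[PopupRule] = WHITELIST) -> Optional[str]:
    """Return the text to tap if a whitelisted popup is on screen, else None."""
    texts = list(screen_texts)
    for rule in rules:
        if any(rule.signature in t for t in texts) and rule.dismiss_text in texts:
            return rule.dismiss_text
    return None

def run_step(locate: Callable[[], Optional[Tuple[int, int]]],
             tap: Callable[[Tuple[int, int]], None],
             read_texts: Callable[[], List[str]],
             tap_text: Callable[[str], None],
             max_dismissals: int = 3) -> bool:
    """Try a step; if its target is missing, clear known popups and retry."""
    for _ in range(max_dismissals + 1):
        target = locate()
        if target is not None:
            tap(target)
            return True
        dismiss = find_dismiss_target(read_texts())
        if dismiss is None:
            return False          # not a known popup: report the step as failed
        tap_text(dismiss)         # close the popup, then retry the step
    return False
```

Limiting dismissals keeps a popup that reappears on every screen from looping forever, and unknown popups still surface as ordinary step failures.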
3.2.4 Automatic Package Installation
Vendor‑specific installation policies (e.g., on OPPO and vivo devices) are handled by a package‑installation assistant service on the cloud‑device platform, so devices do not need to be rooted.
3.2.5 Data Construction & Request Mock
Scripts can invoke a data‑factory service to generate test data and can integrate with an APP‑MOCK platform to stub API responses (e.g., AB‑test configs, push notifications).
3.3 Platform Features
3.3.1 Test Case Editing & Management
All UI scripts can be edited, debugged, and executed directly in a browser‑based IDE (Monaco Editor), eliminating the need for local environment setup.
3.3.2 Script Groups & Task Scheduling
Scripts are organized into groups with pre/post scripts and account configurations. Groups are dispatched as minimal execution units to multiple devices for parallel execution, drastically reducing total run time.
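The article does not describe how groups are assigned to devices. One standard approach, assumed here, is longest‑processing‑time‑first (LPT) scheduling: sort groups by estimated duration and hand each one to the currently least‑loaded device. The per‑group minute estimates (for example from historical runs) and the device IDs are hypothetical inputs.

```python
import heapq
from typing import Dict, List, Tuple

def schedule_groups(groups: Dict[str, float],
                    devices: List[str]) -> Tuple[Dict[str, List[str]], float]:
    """LPT: give each script group to the least-loaded device.

    groups maps a group name to its estimated run time in minutes.
    Returns the per-device assignment and the makespan (wall-clock total).
    """
    heap = [(0.0, d) for d in devices]          # (current load, device id)
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {d: [] for d in devices}
    for name, minutes in sorted(groups.items(), key=lambda kv: -kv[1]):
        load, device = heapq.heappop(heap)      # least-loaded device so far
        plan[device].append(name)               # a group never splits across devices
        heapq.heappush(heap, (load + minutes, device))
    makespan = max(load for load, _ in heap)
    return plan, makespan
```

For five groups totalling 150 minutes on two devices ({"order": 50, "payment": 40, "login": 30, "profile": 20, "settings": 10}), this sketch finishes in 80 minutes of wall‑clock time. LPT is a heuristic rather than an optimal scheduler, but it is simple and usually close to the best achievable split.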
Effectiveness and Practice
4.1 Regression Testing Efficiency
A dedicated virtual team standardized UI testing practices, establishing guidelines for case selection, scenario design, data preparation, script manuals, and execution strategies. Targets included >90% playback success rate and total execution time <90 minutes for full‑suite runs.
Results: Over ten weekly releases have been fully covered, reducing manual regression effort and improving release confidence.
4.2 Overall Testing Efficiency Gains
Performance automation: UI‑based performance scripts now achieve a ~100% pass rate and require minimal maintenance.
Deep compatibility testing: UI scripts are reused for extensive compatibility coverage across devices.
Data‑point automation: UI scripts automatically trigger verification of high‑value event tracking.
CI/CD integration: core UI regression cases are embedded in CI pipelines, providing immediate feedback on code changes.
Future Outlook
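“The road is rugged and long, but keep walking and you will arrive; walk without stopping, and the future is full of promise.” (Xunzi, “Self‑Cultivation”)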
Planned enhancements include:
Iterative model refinement for higher precision and performance.
Complete data recording and playback, covering local configuration and cache control.
Exploration of large‑model vision capabilities to detect UI anomalies.
Integration with precise client testing to recommend uncovered scenarios and change‑related cases.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
