How HuoLala Built a Low‑Cost, High‑Reliability Mobile UI Automation Platform

This article describes how HuoLala, working under a weekly release cycle, built a cloud-based record-and-replay mobile UI automation platform. It covers the background challenges, an analysis of industry solutions, and the technical design (deep-learning-based control detection, SIFT image matching, script generation, playback handling, and platform features), then reports the testing-efficiency gains and the planned AI-driven enhancements.

Huolala Tech

Nov 28, 2023

Background and Goal

With HuoLala’s rapid business growth, the weekly release cadence demanded faster, higher‑quality delivery, creating a pressing need to reduce the manual effort of regression testing for the mobile app.

Early attempts with Appium‑based UI automation faced high onboarding and maintenance costs, low script stability, and extensive debugging effort.

The objective was to create a low‑cost, highly available App UI automation platform that meets the following criteria:

Lower technical threshold: simple onboarding without environment setup.

Faster script authoring: generate executable scripts directly from on-device actions.

Reduced maintenance: image-based control detection to mitigate UI changes.

Higher stability: high playback recognition rate, less impact from pop-ups and environment.

Rich platform features: script management, device scheduling, test reporting, etc.

Industry Solutions

Considering ROI, HuoLala evaluated record‑replay solutions from major companies:

NetEase Airtest – free IDE but limited platform features for large‑scale collaboration.

Meituan AlphaTest – SDK integration with deep hook capabilities, requiring close cooperation with mobile developers.

iQIYI – cloud device + cloud IDE, offering full platform features without SDK development.

ByteDance SmartEye – SDK‑based, focused on precise testing.

These analyses led to the decision to build a custom platform leveraging existing cloud‑device infrastructure.

Capability Construction

HuoLala already possessed two key strengths:

A mature cloud-device (real devices in the cloud) platform.

Deep experience with mobile app quality and efficiency practices.

Thus, a cloud‑device‑driven record‑replay solution was chosen.

3.1 Recording Capability

The recording process captures raw operation events from cloud devices, parses screenshots and coordinates to identify UI controls, and converts them into script steps. It supports both Android and iOS, with side‑channel reporting that does not block device interaction.

Key recording goals:

Identify operation type (click, long‑press, input, swipe, etc.).

Identify target control (button, tab, text field, etc.).

3.1.1 Cloud‑Device Side‑Channel Reporting & Event Parsing

Device actions are captured as a stream of low‑level events:

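// Assumption: the format looks like the minitouch protocol, which the source does not name
// d <contact> <x> <y> <pressure> = touch down, m = move, u <contact> = touch up, c = commit
// click: touch down, then up right away
d 0 10 10 50
c
u 0
c
// long press: touch down, hold (custom wait), then up
d 0 10 10 50
c
<wait in your own code>
u 0
c
// swipe: touch down, one or more moves, then up
d 0 0 0 50
c
m 0 20 0 50
c
u 0
c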

These events allow the system to determine both the action type and the associated UI element.
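As a rough illustration of how an action type could be derived from such a stream, the sketch below classifies one down-to-up sequence by its duration and travel distance. The `TouchEvent` structure, the timestamp field, and both thresholds are assumptions made for this example; the source does not describe the actual parser.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class TouchEvent:
    kind: str      # "d" (down), "m" (move) or "u" (up)
    x: int = 0
    y: int = 0
    t_ms: int = 0  # hypothetical timestamp in milliseconds


LONG_PRESS_MS = 500  # assumed threshold, not from the source
SWIPE_MIN_PX = 20    # assumed threshold, not from the source


def classify_gesture(events):
    """Classify one down...up sequence as 'click', 'long_press' or 'swipe'."""
    if not events or events[0].kind != "d" or events[-1].kind != "u":
        raise ValueError("expected a sequence that starts with 'd' and ends with 'u'")
    down, up = events[0], events[-1]
    # The up event carries no coordinates here, so use the last down/move position.
    last_pos = [e for e in events if e.kind in ("d", "m")][-1]
    distance = hypot(last_pos.x - down.x, last_pos.y - down.y)
    duration = up.t_ms - down.t_ms
    if distance >= SWIPE_MIN_PX:
        return "swipe"
    if duration >= LONG_PRESS_MS:
        return "long_press"
    return "click"
```

Checking distance before duration means a slow drag is still reported as a swipe rather than a long press.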

3.1.2 Control / Text Detection

Deep-learning-based object detection (YOLOX) is used to locate UI controls (icons, images, text) in screenshots. A pre-trained model is fine-tuned on HuoLala-specific UI data: backbone stages are frozen, so training mainly updates the later layers and the detection head:

model = dict(backbone=dict(frozen_stages=1))
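Once the detector has produced bounding boxes, the recorded tap coordinate still has to be tied to one of them. The sketch below shows one plausible way to do that: pick the smallest sufficiently confident box that contains the tap point. The `Detection` type, the `min_score` cutoff, and the smallest-box rule are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str   # e.g. "icon", "image" or "text"
    x1: int
    y1: int
    x2: int
    y2: int
    score: float

    def contains(self, x, y):
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2

    def area(self):
        return (self.x2 - self.x1) * (self.y2 - self.y1)


def control_at(detections, x, y, min_score=0.5):
    """Return the smallest confident detection containing (x, y), or None."""
    hits = [d for d in detections if d.score >= min_score and d.contains(x, y)]
    return min(hits, key=Detection.area) if hits else None
```

Preferring the smallest box resolves nested detections, for example an icon drawn on top of a larger banner image.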

3.1.3 Script Generation

All actions are serialized into a custom script format. For example, a click is represented as Click(). If the target is an icon, the script stores the icon screenshot and relative coordinates; if the target is text, the script records the text string.
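The actual script format is not shown in the article. As a hypothetical sketch of the idea, a click step could be stored as a small record that holds either an icon crop plus the tap position relative to that crop, or the target text. All field names and the JSON encoding below are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ClickStep:
    action: str = "click"
    target_type: str = "icon"        # "icon" or "text"
    icon_path: Optional[str] = None  # where the cropped control image is saved
    rel_x: Optional[float] = None    # tap position inside the crop, 0..1
    rel_y: Optional[float] = None
    text: Optional[str] = None


def make_click_step(tap, box=None, icon_path=None, text=None):
    """Build a click step from a tap point and either a text or an icon box."""
    if text is not None:
        return ClickStep(target_type="text", text=text)
    if box is None:
        raise ValueError("an icon target needs its bounding box")
    x1, y1, x2, y2 = box
    return ClickStep(
        target_type="icon",
        icon_path=icon_path,
        rel_x=round((tap[0] - x1) / (x2 - x1), 3),
        rel_y=round((tap[1] - y1) / (y2 - y1), 3),
    )


def to_json(step):
    return json.dumps(asdict(step), ensure_ascii=False)
```

Storing the tap position as a fraction of the control size, rather than as absolute pixels, keeps the step usable on screens with a different resolution.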

3.2 Playback Capability

During playback, the stored control screenshots or text are matched against the target device screen to execute the recorded actions. Both image and text matching are employed.

3.2.1 Image Matching

Icon matching uses SIFT feature points with a region mask to focus on the control area and disables rotation invariance:

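// Detect SIFT keypoints only inside the masked control region
sift.detect(image, kpVector, mask);
// Force every keypoint orientation to 0 so the descriptors are not rotation-invariant
for (int i = 0; i < kpVector.size(); i++) {
    KeyPoint point = kpVector.get(i);
    point.angle(0);
}
// Compute descriptors for the adjusted keypoints into ret
sift.compute(image, kpVector, ret);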

RegionMask filters out irrelevant features, improving robustness against color, resolution, and badge variations.
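The article does not say how descriptor matches are filtered after extraction. A common step in SIFT pipelines is Lowe's ratio test, sketched below in plain Python on toy descriptors. The ratio test itself and the `ratio=0.75` value are assumptions here, not something the source confirms.

```python
from math import dist  # Python 3.8+


def ratio_test_matches(query_desc, train_desc, ratio=0.75):
    """Return (query_index, train_index) pairs that pass Lowe's ratio test."""
    matches = []
    if len(train_desc) < 2:
        return matches  # the test needs a best and a second-best candidate
    for qi, q in enumerate(query_desc):
        ranked = sorted((dist(q, t), ti) for ti, t in enumerate(train_desc))
        (best_d, best_i), (second_d, _) = ranked[0], ranked[1]
        # Keep the match only if the best candidate is clearly closer than the runner-up.
        if best_d < ratio * second_d:
            matches.append((qi, best_i))
    return matches
```

A query descriptor with two near-identical candidates is rejected as ambiguous, which is the kind of false match that repeated UI elements tend to produce.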

3.2.2 Text Matching

OCR (PaddleOCR) extracts the on-screen text, which is then compared with the expected strings. Because OCR output is imperfect (≈80% accuracy), the comparison tolerates a small edit distance, which absorbs recognition errors and punctuation differences.

// XPath partial match example
//*[contains(@text,'xxx')]
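The edit-distance tolerance described above can be sketched as follows: strip punctuation and whitespace, compute the Levenshtein distance, and accept the match when the distance is a small fraction of the expected string's length. The `max_ratio=0.2` threshold and the normalization rules are assumptions; the source gives no concrete values.

```python
import string

# ASCII punctuation plus a few common full-width marks
_PUNCT = set(string.punctuation) | set("，。！？：；、（）")


def normalize(text):
    """Drop punctuation and whitespace, and lower-case the rest."""
    return "".join(
        ch for ch in text if ch not in _PUNCT and not ch.isspace()
    ).lower()


def edit_distance(a, b):
    """Levenshtein distance using a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def text_matches(expected, recognized, max_ratio=0.2):
    """True when the OCR result is within the allowed edit-distance ratio."""
    e, r = normalize(expected), normalize(recognized)
    if not e:
        return not r
    return edit_distance(e, r) / len(e) <= max_ratio
```

With these settings, an OCR result such as "Place 0rder!" still matches the expected "Place Order", while an unrelated label does not.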

3.2.3 Popup Handling

Two strategies are used:

DeviceOwner policy auto‑grant for system permission dialogs.

Whitelist‑based detection of known business pop‑ups, followed by automated dismissal and test continuation.
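The whitelist strategy could look roughly like the sketch below: before each step, check the on-screen texts against known pop-up rules, tap the dismiss button if one matches, and then continue with the step. The rule contents and the `read_screen_texts` and `tap_text` callbacks are hypothetical stand-ins for whatever the platform actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopupRule:
    name: str
    marker_text: str   # text that identifies the pop-up
    dismiss_text: str  # label of the button that closes it


# Hypothetical whitelist entries, for illustration only
WHITELIST = [
    PopupRule("coupon", "New user coupon", "Close"),
    PopupRule("upgrade", "New version available", "Later"),
]


def find_popup(screen_texts, rules=WHITELIST):
    """Return the first rule whose marker and dismiss button are both on screen."""
    for rule in rules:
        has_marker = any(rule.marker_text in t for t in screen_texts)
        has_button = any(rule.dismiss_text == t.strip() for t in screen_texts)
        if has_marker and has_button:
            return rule
    return None


def run_step(step, read_screen_texts, tap_text, max_dismissals=3):
    """Dismiss known pop-ups (up to a limit), then execute the recorded step."""
    for _ in range(max_dismissals):
        rule = find_popup(read_screen_texts())
        if rule is None:
            break
        tap_text(rule.dismiss_text)
    step()
```

The `max_dismissals` cap keeps playback from looping forever if a pop-up cannot be closed.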

3.2.4 Automatic Package Installation

Device‑specific installation policies (e.g., OPPO/VIVO) are handled by an in‑cloud package‑assistant service instead of rooting the device.

3.2.5 Data Construction & Request Mock

Scripts can invoke a data‑factory service to generate test data and can integrate with an APP‑MOCK platform to stub API responses (e.g., AB‑test configs, push notifications).

3.3 Platform Features

3.3.1 Test Case Editing & Management

All UI scripts can be edited, debugged, and executed directly in a browser‑based IDE (Monaco Editor), eliminating the need for local environment setup.

3.3.2 Script Groups & Task Scheduling

Scripts are organized into groups with pre/post scripts and account configurations. Groups are dispatched as minimal execution units to multiple devices for parallel execution, drastically reducing total run time.
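To illustrate why dispatching groups to several devices shortens the run, the sketch below assigns groups to devices greedily, longest group first, and reports the resulting wall-clock time. The scheduling policy, group names, and durations are assumptions; the article does not describe the actual scheduler.

```python
import heapq


def assign_groups(group_minutes, devices):
    """Greedy longest-first assignment of script groups to devices.

    Returns (plan, makespan): plan maps device -> list of group names,
    makespan is the estimated wall-clock time in minutes.
    """
    if not devices:
        raise ValueError("at least one device is required")
    heap = [(0, d) for d in devices]  # (busy minutes, device id)
    heapq.heapify(heap)
    plan = {d: [] for d in devices}
    for name, minutes in sorted(group_minutes.items(), key=lambda kv: -kv[1]):
        busy, device = heapq.heappop(heap)  # the least-loaded device so far
        plan[device].append(name)
        heapq.heappush(heap, (busy + minutes, device))
    makespan = max(busy for busy, _ in heap)
    return plan, makespan
```

For five groups totalling 80 minutes, two devices bring the estimated run down to 40 minutes in this toy example.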

Effectiveness and Practice

4.1 Regression Testing Efficiency

A dedicated virtual team standardized UI testing practices, establishing guidelines for case selection, scenario design, data preparation, script manuals, and execution strategies. Targets included >90% playback success rate and total execution time <90 minutes for full‑suite runs.

Results: Over ten weekly releases have been fully covered, reducing manual regression effort and improving release confidence.

4.2 Overall Testing Efficiency Gains

Performance automation: UI-based performance scripts now achieve ~100% pass rate, requiring minimal maintenance.

Deep compatibility testing: UI scripts are reused for extensive compatibility coverage across devices.

Data-point automation: UI scripts trigger high-value event tracking verification automatically.

CI/CD integration: Core UI regression cases are embedded in CI pipelines, providing immediate feedback on code changes.

Future Outlook

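"The road is long and hard, but keep walking and you will arrive; keep going without stopping and the future is within reach." (Xunzi, "Xiushen" [Cultivating the Self])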

Planned enhancements include:

Iterative model refinement for higher precision and performance.

Complete data recording and playback, covering local configuration and cache control.

Exploration of large‑model vision capabilities to detect UI anomalies.

Integration with precise client testing to recommend uncovered scenarios and change‑related cases.

