How to Make AI Precisely Operate Mobile Apps: Solving Common Midscene.js Testing Pain Points
This article dissects the practical challenges of using Midscene.js for Android UI automation, shows why auto‑planning can fail, and provides concrete, step‑by‑step solutions (instant‑operation APIs, assertion checks, refined prompts, coordinate clicks, conditional scrolling, and smart waiting) that make AI‑driven mobile testing more reliable and efficient.
Midscene.js provides AI‑driven automation for Android apps but practical use reveals several reliability and performance challenges. The following patterns and mitigations address these issues.
1. Limits of Auto‑Planning
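Problem: Using agent.ai() to set the gender to female stops right after the option is selected, because the AI never taps the picker's confirm button.
// Tap "我的/会员" (Me/Member) in the top-left corner
await agent.aiTap('左上角“我的/会员”');
// Tap the user avatar in the top-left corner
await agent.aiTap('左上角的用户头像');
// Auto-planning call: "set gender to female"
await agent.ai('设置性别为女');
Root cause: The auto‑planning API (agent.ai(), shorthand for agent.aiAction()) depends on the model's planning ability, which can be unstable; here the model judged the task complete before the confirm step.
Mitigation: Replace the planning call with explicit instant‑operation steps. This removes the planning stage, fixes the order of actions in the script, and improves reliability.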
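When a flow consists of several explicit steps, a small wrapper can keep them in order and report which one failed. The following is a minimal sketch, not part of Midscene.js: runTapSteps and TapAgent are hypothetical names, and the interface assumes only that the agent exposes an aiTap(prompt) method that returns a promise.

```typescript
// Minimal structural type for this sketch (an assumption): Midscene's
// AndroidAgent.aiTap takes a locate prompt and returns a promise, so an
// AndroidAgent instance is expected to fit this shape.
interface TapAgent {
  aiTap(locatePrompt: string): Promise<unknown>;
}

// Hypothetical helper: runs each tap as its own instant operation, in order,
// and names the step that failed instead of leaving the sequence to a planner.
async function runTapSteps(agent: TapAgent, steps: string[]): Promise<void> {
  for (let i = 0; i < steps.length; i++) {
    try {
      await agent.aiTap(steps[i]);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      throw new Error(`Step ${i + 1} ("${steps[i]}") failed: ${reason}`);
    }
  }
}
```

With the three prompts from the gender example that follows, the call would read await runTapSteps(agent, ['性别选项的"保密"', '性别选项弹窗的选项"女"', '确认']).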
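// Replaces: await agent.ai('设置性别为女');
// Tap the gender field, whose current value is "保密" (not disclosed), to open the picker
await agent.aiTap('性别选项的"保密"');
// Tap the option "女" (female) in the gender pop-up
await agent.aiTap('性别选项弹窗的选项"女"');
// Tap "确认" (confirm)
await agent.aiTap('确认');
2. Natural‑Language Ambiguity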
Problem: A script asks the AI to tap a button labeled “进入菜单” (Enter Menu), but the UI actually shows “去结算” (Go to Checkout). The AI treats the two as equivalent, taps the wrong button, and reports success, producing a false positive.
// Tap the button named "进入菜单" in the bottom-right corner of the page
await agent.aiTap('页面右下角名称为 "进入菜单" 按钮');
Root cause: Midscene.js element‑locating prompts ask the model to return the coordinates of the element that best matches the description, so ambiguous wording can yield a similar but incorrect match.
Mitigation: Insert an explicit aiAssert before the tap. The assertion forces a true/false check that the exact element exists and fails the step when it does not, instead of letting a look‑alike element be tapped.
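The assert‑then‑tap pairing can be packaged so that the check is never forgotten. A minimal sketch with hypothetical names (assertThenTap, AssertTapAgent); it assumes aiAssert rejects when the assertion does not hold, so the tap is skipped in that case.

```typescript
// Structural type covering only the two calls this sketch uses (an assumption
// about how AndroidAgent's aiAssert and aiTap can be called).
interface AssertTapAgent {
  aiAssert(assertion: string): Promise<unknown>;
  aiTap(locatePrompt: string): Promise<unknown>;
}

// Hypothetical helper: verify the exact element exists, then tap it.
// If aiAssert rejects, the error propagates and aiTap is never called.
async function assertThenTap(
  agent: AssertTapAgent,
  existsAssertion: string,
  locatePrompt: string,
): Promise<void> {
  await agent.aiAssert(existsAssertion);
  await agent.aiTap(locatePrompt);
}
```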
// Assert: "a button named 进入菜单 exists on the page"
await agent.aiAssert('页面中存在名称为 "进入菜单" 按钮');
await agent.aiTap('页面右下角名称为 "进入菜单" 按钮');
3. Complex Widgets
Problem: In a date‑time picker, the date column and the time column are separate. The command ai('外带时间选择"后天9/28"') ("takeout time: 后天9/28", i.e. the day after tomorrow, 9/28) makes the AI tap the time "11:30" in the same row instead of the intended date.
Mitigation: Use the instant‑operation API with a fine‑grained prompt that explicitly names the target column.
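If a test touches the same picker repeatedly, assembling these prompts from named parts keeps the column reference from being dropped by accident. This is a sketch with a hypothetical helper, columnPrompt; the Chinese connective 中的 ("in the") mirrors the wording of the prompt in this section, and splitting a prompt into widget, column, and value is an illustrative choice rather than anything Midscene.js requires.

```typescript
// Hypothetical prompt parts for a multi-column picker.
interface ColumnTarget {
  widget: string; // the picker's on-screen name
  column: string; // which column, described by position and content
  value: string;  // the visible text of the option to tap
}

// Builds "<widget><column>中的"<value>"", i.e. '<value>' in the given column
// of the given widget, so every tap names its column explicitly.
function columnPrompt({ widget, column, value }: ColumnTarget): string {
  return `${widget}${column}中的"${value}"`;
}
```

For example, columnPrompt({ widget: '外带时间筛选器', column: '左边的日期列', value: '后天9/28' }) produces exactly the prompt passed to aiTap in the line that follows.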
// "后天9/28" in the date column on the left side of the takeout-time picker
await agent.aiTap('外带时间筛选器左边的日期列中的"后天9/28"');
4. Single‑Step Performance Optimization
AI element location can be slow. If, for example, locating a product button takes 8 seconds and the list scrolls in the meantime, the button is no longer where the model found it. A coordinate‑based click skips the locating step and avoids this race condition.
Solution: Extend AndroidDevice with a raw, ADB‑style click method and call it with hard‑coded coordinates.
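import { AndroidDevice } from '@midscene/android';

class ExtendedAndroidDevice extends AndroidDevice {
  // Taps absolute screen coordinates without asking the model to locate anything.
  // The cast suggests mouseClick is internal, so it may change between versions.
  async clickAtCoordinate(x: number, y: number): Promise<void> {
    await (this as any).mouseClick(x, y);
  }
}

// page: the ExtendedAndroidDevice instance the agent was created with
await page.clickAtCoordinate(268, 532);
Note: Hard‑coded coordinates improve speed but reduce maintainability, since they break when the layout or screen size changes; use them sparingly.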
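Part of that maintainability cost is that pixel coordinates are tied to one screen size. One partial remedy, sketched here under the assumption that the coordinates are absolute device pixels and that the app's layout scales proportionally with the screen (often not true for fixed‑size elements or different aspect ratios), is to record coordinates on a reference resolution and rescale them at run time. scalePoint and its types are hypothetical names.

```typescript
interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

// Rescales a tap point recorded on one screen size to another, rounding to
// whole pixels. Assumes proportional layout scaling on both axes.
function scalePoint(p: Point, recordedOn: Size, current: Size): Point {
  return {
    x: Math.round((p.x * current.width) / recordedOn.width),
    y: Math.round((p.y * current.height) / recordedOn.height),
  };
}
```

On a 720×1600 device, the point (268, 532) recorded on a hypothetical 1080×2400 reference screen becomes (179, 355), which could then be passed to clickAtCoordinate.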
5. Conditional Scrolling
Midscene.js offers four scroll directions and two scroll types (to edge or fixed pixel length). Tests sometimes need to scroll until a target appears, such as scrolling a time picker until "14:30" is visible.
Implementation: Combine aiBoolean (to check whether the target is present) with aiScroll inside a loop.
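The same loop can be written as a reusable, bounded helper that asks the positive question ("is the target visible?") and reports whether it succeeded. This is a sketch under assumptions: scrollUntilVisible, ScrollAgent, and ScrollParam are hypothetical names, and ScrollParam covers only the fields used in this section's aiScroll call, not Midscene's full scroll options.

```typescript
// Only the scroll fields this section uses (an assumed subset).
interface ScrollParam {
  direction: 'up' | 'down' | 'left' | 'right';
  scrollType: 'once';
  distance?: number;
}

interface ScrollAgent {
  aiBoolean(prompt: string): Promise<boolean>;
  aiScroll(param: ScrollParam, locatePrompt?: string): Promise<unknown>;
}

// Scrolls the given area until the model reports the target visible, or until
// maxScrolls scrolls have been made. Returns true if the target was found.
async function scrollUntilVisible(
  agent: ScrollAgent,
  targetVisible: string,
  param: ScrollParam,
  scrollArea: string,
  maxScrolls = 10,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxScrolls; attempt++) {
    if (await agent.aiBoolean(targetVisible)) return true;
    await agent.aiScroll(param, scrollArea);
  }
  // One last check so a target revealed by the final scroll still counts.
  return agent.aiBoolean(targetVisible);
}
```

Each check is a separate model call, so the cap bounds both the run time and the cost of a target that never appears.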
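// Scroll the right-hand time column up 50 px at a time until "14:30" appears;
// maxScrolls (an arbitrary cap) stops the loop if the value never shows up.
const maxScrolls = 20;
let scrolls = 0;
while (await agent.aiBoolean('外带时间选择器中未出现14:30')) {
  if (++scrolls > maxScrolls) throw new Error('"14:30" not found in the time picker');
  await agent.aiScroll(
    { direction: 'up', distance: 50, scrollType: 'once' },
    '选择外带时间选择器右边的时间列'
  );
}
6. Smart Waiting and Popup Handling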
Random page delays and unexpected pop‑ups cause flaky UI tests. Midscene.js provides two complementary approaches.
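When several kinds of pop‑up are known in advance, the step‑level check described below can be generalized to a list. A minimal sketch: dismissKnownPopups, KnownPopup, and PopupAgent are hypothetical names, and the agent is assumed to expose aiBoolean and aiTap taking string prompts.

```typescript
interface PopupAgent {
  aiBoolean(prompt: string): Promise<boolean>;
  aiTap(locatePrompt: string): Promise<unknown>;
}

// A pop-up description plus the button that dismisses it.
interface KnownPopup {
  present: string; // yes/no question, e.g. "a privacy-policy pop-up is shown"
  dismiss: string; // locate prompt for the dismissing button
}

// Checks each known pop-up in turn and taps its dismiss button if it is shown.
// Returns how many pop-ups were handled.
async function dismissKnownPopups(agent: PopupAgent, popups: KnownPopup[]): Promise<number> {
  let handled = 0;
  for (const popup of popups) {
    if (await agent.aiBoolean(popup.present)) {
      await agent.aiTap(popup.dismiss);
      handled++;
    }
  }
  return handled;
}
```

Every entry costs one aiBoolean model call per invocation, so the list is best kept to pop‑ups the app is actually known to show.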
Global context (auto‑planning only): Pass aiActionContext when creating the agent so that auto‑planning calls handle privacy‑policy pop‑ups automatically.
const agent = new AndroidAgent(page, {
  // "If any privacy-policy pop-up appears, tap Agree"
  aiActionContext: '如果出现任何有关隐私政策提示的弹窗,点击同意'
});
Step‑level handling (instant operation): Detect the pop‑up with aiBoolean and tap the "同意" (Agree) button.
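// "A privacy-policy pop-up is shown on the page"
if (await agent.aiBoolean('页面出现有关隐私政策提示的弹窗')) {
  await agent.aiTap('"同意"按钮'); // the "同意" (Agree) button
}
await agent.aiTap('页面右下角名称为"进入菜单"按钮');
For generic page‑load synchronization, use aiWaitFor, which repeatedly checks the described condition until it holds or the wait times out: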
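// "The product detail page has finished loading"; timeoutMs is optional (30 s is an example value)
await agent.aiWaitFor('商品详情页加载完成', { timeoutMs: 30000 });
Conclusion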
Effective use of Midscene.js requires selecting the appropriate API per scenario:
Prefer instant‑operation calls ( aiTap, aiAssert) for precise, reliable actions.
Craft detailed prompts for complex widgets to avoid mis‑selection.
When latency matters, fall back to raw coordinate clicks via an extended AndroidDevice.
Implement conditional loops with aiBoolean + aiScroll for dynamic scrolling.
Use aiWaitFor and either global context or step‑level pop‑up detection to improve stability.
