How AI is Transforming Automation: From Scripts to Intelligent Systems
Amid the wave of digital transformation, automation has evolved from simple scripted tasks into AI-powered intelligent systems that make adaptive decisions, generate workflows dynamically, and handle UI changes robustly. Tools such as Playwright and MidScene.js bring this AI integration to both web and mobile automation.
Evolution and Current State of Automation Technology
In the wave of digital transformation, automation has progressed from simple script execution to complex systems with AI‑driven decision‑making. Gartner predicts that by 2025 more than 70% of enterprises will adopt some form of AI‑powered automation, which promises not only higher efficiency but also new levels of adaptability and creativity.
Traditional automation tools can handle repetitive tasks but struggle with dynamic web elements and complex user interactions. AI fills this gap by using machine‑learning algorithms to understand context, make intelligent decisions, and adjust execution strategies in real time.
Traditional Automation vs Intelligent Automation
[Figure: flowcharts comparing traditional automation and intelligent automation]
Characteristics
Element Localization: Traditional uses precise selector matching; Intelligent combines visual features and semantic understanding.
Workflow Design: Traditional follows fixed workflows; Intelligent generates dynamic paths based on goals.
Exception Handling: Traditional relies on preset try‑catch blocks; Intelligent provides real‑time diagnosis and self‑recovery.
Test Data: Traditional uses static datasets; Intelligent generates dynamic data that conforms to business rules.
Maintenance Cost: Traditional scripts break with UI changes; Intelligent adapts automatically to many UI variations.
Execution Speed: Traditional is fast (millisecond response); Intelligent is slower due to AI inference time.
Element-Locating Accuracy: Traditional is 100% precise while the selector still matches, but fragile; Intelligent is about 95% accurate but robust.
Applicable Scenarios: Traditional suits stable business processes; Intelligent excels in dynamic, complex scenarios.
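The element-localization and maintenance rows describe the same trade-off. An exact selector either matches or breaks, while a description-based lookup can survive a renamed ID.

The sketch below is a deliberately simplified, AI-free stand-in for that idea. It tries the exact ID first. If that fails, it scores candidate elements by how many words of a human-readable description appear in their role and text. `ElementInfo`, `locateWithFallback`, and the word-overlap scoring are hypothetical illustrations. A real intelligent tool would use a vision or language model instead of word matching.

```typescript
// Hypothetical, simplified element snapshot (not a real DOM or library type)
interface ElementInfo {
  id?: string;
  role: string; // e.g. "button", "textbox"
  text: string; // visible label or placeholder
}

// Split a string into lowercase word tokens
function tokens(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0);
}

// Try the exact ID first (traditional); if it no longer exists, fall back to
// the candidate whose role/text shares the most words with the description.
// Word overlap is a toy stand-in for AI-based semantic matching.
function locateWithFallback(
  elements: ElementInfo[],
  id: string,
  description: string
): ElementInfo | undefined {
  const exact = elements.find(e => e.id === id);
  if (exact) return exact;

  const wanted = new Set(tokens(description));
  let best: ElementInfo | undefined;
  let bestScore = 0;
  for (const el of elements) {
    const score = tokens(`${el.role} ${el.text}`).filter(t => wanted.has(t)).length;
    if (score > bestScore) {
      best = el;
      bestScore = score;
    }
  }
  return best; // undefined when no candidate shares any word
}

// After a redesign, "login-btn" was renamed to "signin-submit"
const elementsAfterRedesign: ElementInfo[] = [
  { id: "username", role: "textbox", text: "Username" },
  { id: "password", role: "textbox", text: "Password" },
  { id: "signin-submit", role: "button", text: "Log in" },
];

const found = locateWithFallback(elementsAfterRedesign, "login-btn", "log in button");
console.log(found?.id); // signin-submit
```

The fallback can also pick the wrong element when several candidates match a vague description equally well. This is why description-based locating trades some precision for robustness.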
Code Comparison
Traditional Automation
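import { expect, type Page } from '@playwright/test';

// Selector-based login: every step depends on a fixed element ID
async function testLogin(page: Page) {
  await page.fill('#username', 'testuser');
  await page.fill('#password', 'Pass123!');
  await page.click('#login-btn');
  // Assert that login redirected to the dashboard
  await expect(page).toHaveURL(/dashboard/);
}
Pain point: if an element ID changes (for example, #login-btn is renamed), the script fails.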
Intelligent Automation
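// `ai` stands for an AI service wrapper; generateActionPlan, locateElement,
// generateTestData and verifyOutcome are illustrative method names.
async function smartLogin(page, ai) {
  // Describe the goal instead of hard-coding selectors
  const context = {
    pageHTML: await page.content(),
    task: "Complete the login",
    constraints: "Use valid test credentials"
  };
  // Ask the model for an ordered list of actions
  const plan = await ai.generateActionPlan(context);
  for (const action of plan.actions) {
    if (action.type === 'fill') {
      // Locate the field by its description rather than by a selector
      const element = await ai.locateElement({
        page: page,
        description: action.field
      });
      await element.fill(await ai.generateTestData(action.field));
    }
    // handle other action types ...
  }
  // Let the model judge whether the goal was reached
  const result = await ai.verifyOutcome({
    page: page,
    expected: "Login succeeded"
  });
  return result;
}
Advantage: automatically adapts to changes in the login form's structure.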
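The pseudocode leaves `// handle other action types ...` open. One way to make that loop exhaustive in TypeScript is to model the plan as a discriminated union and dispatch on `type`. The compiler then flags any action kind that is not handled.

The action shapes (`fill`, `click`, `waitFor`), the `Executor` interface, and `runPlan` below are assumptions for illustration, not part of any specific AI library. The recording executor stands in for a real browser so the dispatch logic can run on its own.

```typescript
// Hypothetical action shapes an AI planner might return
type Action =
  | { type: "fill"; field: string; value: string }
  | { type: "click"; target: string }
  | { type: "waitFor"; condition: string };

// Minimal interface the loop needs; a real one would wrap a browser or an AI agent
interface Executor {
  fill(field: string, value: string): Promise<void>;
  click(target: string): Promise<void>;
  waitFor(condition: string): Promise<void>;
}

async function runPlan(actions: Action[], exec: Executor): Promise<void> {
  for (const action of actions) {
    switch (action.type) {
      case "fill":
        await exec.fill(action.field, action.value);
        break;
      case "click":
        await exec.click(action.target);
        break;
      case "waitFor":
        await exec.waitFor(action.condition);
        break;
      default: {
        // Exhaustiveness check: this line fails to type-check if a case is missing
        const unhandled: never = action;
        throw new Error(`Unhandled action: ${JSON.stringify(unhandled)}`);
      }
    }
  }
}

// Recording executor: logs calls instead of touching a browser
class RecordingExecutor implements Executor {
  log: string[] = [];
  async fill(field: string, value: string) { this.log.push(`fill ${field}=${value}`); }
  async click(target: string) { this.log.push(`click ${target}`); }
  async waitFor(condition: string) { this.log.push(`waitFor ${condition}`); }
}

const plan: Action[] = [
  { type: "fill", field: "username", value: "testuser" },
  { type: "fill", field: "password", value: "Pass123!" },
  { type: "click", target: "login button" },
  { type: "waitFor", condition: "dashboard is visible" },
];

const recorder = new RecordingExecutor();
runPlan(plan, recorder).then(() => console.log(recorder.log));
```

Here the plan carries its own `value` for each `fill`. In the pseudocode, that value comes from `ai.generateTestData` instead.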
Technologies Used
What is Playwright?
Playwright is a cross‑browser, cross‑platform web automation and testing tool developed by Microsoft. It supports Chromium (Chrome/Edge), Firefox, and WebKit (Safari) and provides a unified API for end‑to‑end testing, UI automation, screenshot & PDF generation, dynamic page crawling, and performance monitoring.
End‑to‑end (E2E) testing
UI automation
Webpage screenshot & PDF generation
Crawling dynamically rendered pages
Performance monitoring
What is MidScene.js?
MidScene.js is an AI‑enhanced automation framework that adds large language model (LLM) capabilities to traditional tools like Playwright, enabling natural‑language task description, low‑code/no‑code development, multimodal interaction (text, image, structured data), and enterprise‑grade extensibility.
AI‑augmented automation: combines LLMs with scripts
Low‑code/no‑code friendly: supports natural‑language task definition
Multimodal interaction: handles text, images, structured data
Enterprise‑level extension: supports private deployment and domain‑specific fine‑tuning
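Structured data returned by a model is not guaranteed to match the requested shape. For example, a price may come back as a string such as "$89.00", or a field may be missing. A script that consumes a query result shaped like `{itemTitle: string, price: Number}[]` can therefore check it before use.

The sketch below is plain TypeScript with no MidScene dependency. `Item`, `isItem`, and `parseItems` are hypothetical names, and the sample array is illustrative input, not real output from any tool.

```typescript
// Hypothetical shape matching a query like '{itemTitle: string, price: Number}[]'
interface Item {
  itemTitle: string;
  price: number;
}

// Runtime type guard: TypeScript types are erased, so model output must be checked at runtime
function isItem(value: unknown): value is Item {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.itemTitle === "string" &&
    typeof v.price === "number" &&
    Number.isFinite(v.price)
  );
}

// Keep well-formed items and report how many were dropped; non-array input yields no items
function parseItems(raw: unknown): { items: Item[]; dropped: number } {
  if (!Array.isArray(raw)) return { items: [], dropped: 0 };
  const items = raw.filter(isItem);
  return { items, dropped: raw.length - items.length };
}

// Illustrative input: the second entry has a string price, the third lacks a title
const sample: unknown = [
  { itemTitle: "Wireless Headphones", price: 59.99 },
  { itemTitle: "Over-Ear Headphones", price: "$89.00" },
  { price: 25 },
];

const { items, dropped } = parseItems(sample);
console.log(items.length, dropped); // 1 2
```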
Architecture
Web or Mobile Applications
Web Automation
Integration with Puppeteer
npm install @midscene/web puppeteer tsx --save-dev
Demo script
import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web/puppeteer";

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));

// Run the async script immediately
Promise.resolve(
  (async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 800, deviceScaleFactor: 1 });
    await page.goto("https://www.ebay.com");
    await sleep(5000);

    // The agent drives the page from natural-language instructions
    const agent = new PuppeteerAgent(page);
    await agent.aiAction('type "Headphones" in the search box and press Enter');
    await sleep(5000);

    // Extract structured data: item titles and prices from the result list
    const items = await agent.aiQuery('{itemTitle: string, price: Number}[], find the item titles and prices in the list');
    console.log("headphone items", items);

    // Natural-language assertion about the page
    await agent.aiAssert("there is a category filter on the left side of the page");
    await browser.close();
  })()
);
Integration with Playwright
npm install @midscene/web playwright @playwright/test tsx dotenv --save-dev
Demo code
import { chromium } from 'playwright';
import { PlaywrightAgent } from '@midscene/web/playwright';
// Load model configuration (such as the API key) from a .env file
import 'dotenv/config';

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));

Promise.resolve(
  (async () => {
    const browser = await chromium.launch({ headless: true, args: ['--no-sandbox', '--disable-setuid-sandbox'] });
    const page = await browser.newPage();
    await page.setViewportSize({ width: 1280, height: 768 });
    await page.goto('https://www.ebay.com');
    await sleep(5000);

    const agent = new PlaywrightAgent(page);
    // Act, then wait until the described condition holds
    await agent.aiAction('type "Headphones" in search box, hit Enter');
    await agent.aiWaitFor('there is at least one headphone item on page');

    // Structured extraction
    const items = await agent.aiQuery('{itemTitle: string, price: Number}[], find item in list and corresponding price');
    console.log('headphones in stock', items);

    // Typed single-value extraction
    const isMoreThan1000 = await agent.aiBoolean('Is the price of the headphones more than 1000?');
    console.log('isMoreThan1000', isMoreThan1000);
    const price = await agent.aiNumber('What is the price of the first headphone?');
    console.log('price', price);
    const name = await agent.aiString('What is the name of the first headphone?');
    console.log('name', name);

    // Locate an element described in natural language
    const location = await agent.aiLocate('What is the location of the first headphone?');
    console.log('location', location);

    await agent.aiAssert('There is a category filter on the left');
    await agent.aiTap('the first item in the list');
    await browser.close();
  })()
);
Chrome Bridge Mode
Using MidScene's Chrome extension bridge mode lets scripts control a desktop Chrome instance, reusing cookies, extensions, and page state.
Install dependencies
npm install @midscene/web tsx --save-dev
Demo script
import { AgentOverChromeBridge } from "@midscene/web/bridge-mode";

const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));

Promise.resolve(
  (async () => {
    // Connect to desktop Chrome through the MidScene extension and open a new tab
    const agent = new AgentOverChromeBridge();
    await agent.connectNewTabWithUrl("https://www.bing.com");
    await agent.ai('type "AI 101" and hit Enter');
    await sleep(3000);
    await agent.aiAssert("there are some search results");
    // Release the bridge connection
    await agent.destroy();
  })()
);
Android Automation
Android apps can be automated by installing the MCP tool and using it to operate the Android device.
Key Tools
Enable caching to significantly reduce AI service execution time.
MIDSCENE_CACHE=1 playwright test --config=playwright.config.ts
Setting MIDSCENE_CACHE=1 enables MidScene.js caching. The results of AI planning and element locating are stored, so later runs of the same steps can reuse them instead of calling the model again, which speeds up test execution. playwright test runs the Playwright test scripts, and --config=playwright.config.ts points it at the TypeScript configuration file.
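The idea behind this kind of caching can be sketched independently of MidScene. Each AI request is keyed by the instruction plus a fingerprint of the page it ran against, and the stored answer is reused when the same pair appears again. If the page content changes, the key changes and the model is consulted again.

The code below is an illustrative in-memory version, not MidScene.js's actual cache format or logic. `CachedPlanner`, `PlanFn`, `fakeModel`, and the SHA-256 keying are assumptions made for the sketch.

```typescript
import { createHash } from "node:crypto";

// Hypothetical planner signature: instruction + page snapshot in, list of steps out
type PlanFn = (instruction: string, pageSnapshot: string) => Promise<string[]>;

class CachedPlanner {
  private readonly cache = new Map<string, string[]>();
  private readonly planFn: PlanFn;
  modelCalls = 0;

  constructor(planFn: PlanFn) {
    this.planFn = planFn;
  }

  // Key = hash of instruction + page snapshot, so a changed page misses the cache
  private key(instruction: string, pageSnapshot: string): string {
    return createHash("sha256")
      .update(instruction)
      .update("\u0000") // separator so "ab"+"c" and "a"+"bc" hash differently
      .update(pageSnapshot)
      .digest("hex");
  }

  async plan(instruction: string, pageSnapshot: string): Promise<string[]> {
    const k = this.key(instruction, pageSnapshot);
    const hit = this.cache.get(k);
    if (hit !== undefined) return hit; // cache hit: no model call
    this.modelCalls++;
    const steps = await this.planFn(instruction, pageSnapshot);
    this.cache.set(k, steps);
    return steps;
  }
}

// Fake model for demonstration only: returns fixed steps
const fakeModel: PlanFn = async (instruction) => ["locate search box", `type for: ${instruction}`];

const planner = new CachedPlanner(fakeModel);
(async () => {
  await planner.plan('type "Headphones" and press Enter', "<html>v1</html>");
  await planner.plan('type "Headphones" and press Enter', "<html>v1</html>"); // reused
  await planner.plan('type "Headphones" and press Enter', "<html>v2</html>"); // page changed
  console.log(planner.modelCalls); // 2
})();
```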
API
agent.aiAction() or .ai() # UI operation steps
agent.aiTap() # click an element
agent.aiHover() # hover over an element
agent.aiInput() # input text into an element
agent.aiKeyboardPress() # press a keyboard key
agent.aiScroll() # scroll page or element
agent.aiRightClick() # right‑click an element
agent.aiAsk() # ask a question to the AI model
agent.aiQuery() # extract structured data from UI
agent.aiBoolean() # extract a boolean value
agent.aiNumber() # extract a numeric value
agent.aiString() # extract a string value
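agent.aiAssert() # perform an assertion
agent.aiLocate() # locate an element described in natural language
agent.aiWaitFor() # wait until a described condition is true
Case Study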
Using the login page as an example, the following demo shows intelligent automation handling element changes and extracting data.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.
