How Claude 3.7 Dissects Baidu’s AI Agents for Software Testing: A New Era of Intelligent Testing
The article systematically analyzes Baidu senior test engineer Wang Zhe’s presentation on applying AI Agents to software testing, outlining the importance, challenges, architecture, knowledge management, and concrete agent workflows that promise up to ten‑fold efficiency gains.
Importance and challenges of intelligent testing
Cost reduction and efficiency gains are critical as internet companies shift toward lean, cost‑effective development.
Increasing software complexity and new development models (low‑code, zero‑code, AI‑generated code) make traditional testing a bottleneck.
Rapid advances in generative AI (e.g., ChatGPT) create large opportunities for testing transformation.
QA work differs from development: individual tasks are simple but span a much broader scope, which limits the impact of simple AI assistance.
Copilot‑style AI tools see low adoption in testing because test tasks are low‑complexity and gain little from code generation alone.
Full AI replacement of QA is unrealistic now; effective human‑AI collaboration is required.
Core problems to solve
Business understanding – enabling an AI Agent to grasp project context for accurate output.
End‑to‑end completion – allowing the Agent to finish a task without frequent hand‑offs, truly boosting efficiency.
Overall architecture of intelligent testing
Workflow – a Retrieval‑Augmented Generation (RAG)‑based end‑to‑end testing pipeline.
System – knowledge base, retrieval service, multiple agents, and tool collections.
Knowledge construction
Global knowledge: system concepts, testing standards, etc.
Requirement‑level knowledge: PRDs, UI/UE designs, API docs.
Practical applications of AI Agents in testing
Knowledge construction & management
Global knowledge
Document knowledge – OCR + multimodal models, enhanced by GraphRAG.
API knowledge – unified ingestion, stored both structurally and as vectors.
Code knowledge – AST‑based retrieval and embedding‑based semantic search.
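The summary says code knowledge relies on AST‑based retrieval but gives no details. The minimal sketch below assumes Python source files and a hypothetical `CodeChunk` record. It shows how parsing can split a file into function‑ and class‑level units that can be looked up by exact name. In a full system each chunk's source would also be embedded for semantic search, and that step is omitted here.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeChunk:
    """One retrievable unit of code knowledge (hypothetical schema)."""
    name: str
    kind: str        # "function" or "class"
    lineno: int
    docstring: str
    source: str      # exact source text, i.e. what an embedding model would receive


def extract_code_chunks(source: str) -> list[CodeChunk]:
    """Split Python source into function/method and class chunks via the AST."""
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(CodeChunk(
                name=node.name,
                kind="class" if isinstance(node, ast.ClassDef) else "function",
                lineno=node.lineno,
                docstring=ast.get_docstring(node) or "",
                source=ast.get_source_segment(source, node) or "",
            ))
    # ast.walk is breadth-first, so re-sort the chunks into file order.
    return sorted(chunks, key=lambda c: c.lineno)


def lookup(chunks: list[CodeChunk], name: str) -> list[CodeChunk]:
    """Structural retrieval: exact match on the symbol name."""
    return [c for c in chunks if c.name == name]
```

Chunking at function and class boundaries keeps each retrieved unit syntactically complete, which matters when the retrieved code is pasted into a prompt.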
Requirement‑level knowledge
Knowledge linking for rule‑based retrieval.
Image preprocessing and PRD structural splitting.
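PRD structural splitting and knowledge linking are only named in the talk. As a minimal sketch, assume PRDs are available as Markdown (after OCR where needed) and that API paths follow a hypothetical `/api/...` convention. Headings then give each section a path, image references are kept per section for later multimodal processing, and one regex rule links each API path to the sections that mention it.

```python
import re
from dataclasses import dataclass, field

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
IMAGE = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")
API_PATH = re.compile(r"/api/[\w/{}-]+")   # hypothetical path convention


@dataclass
class PrdSection:
    path: list[str]                                    # heading trail, e.g. ["Checkout", "Coupons"]
    text: list[str] = field(default_factory=list)
    images: list[str] = field(default_factory=list)   # image refs, for later multimodal parsing


def split_prd(markdown: str) -> list[PrdSection]:
    """Split a Markdown PRD into sections keyed by their heading path."""
    sections: list[PrdSection] = []
    trail: list[str] = []
    current = PrdSection(path=[])
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if m:
            if current.text or current.images:
                sections.append(current)
            level = len(m.group(1))
            trail = trail[: level - 1] + [m.group(2).strip()]
            current = PrdSection(path=list(trail))
            continue
        current.images.extend(IMAGE.findall(line))
        stripped = IMAGE.sub("", line).strip()
        if stripped:
            current.text.append(stripped)
    if current.text or current.images:
        sections.append(current)
    return sections


def link_apis(sections: list[PrdSection]) -> dict[str, list[str]]:
    """Rule-based linking: map each API path to the sections that mention it."""
    links: dict[str, list[str]] = {}
    for s in sections:
        for api in API_PATH.findall(" ".join(s.text)):
            links.setdefault(api, []).append(" > ".join(s.path))
    return links
```

With links like these, a test agent working on one API can pull exactly the PRD sections that mention it instead of relying on similarity search alone.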
Knowledge retrieval & usage
Unified retrieval service serving all test‑scenario agents.
Plug‑and‑play ChatBot for project‑specific Q&A.
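The talk describes one retrieval service shared by every test‑scenario agent, over knowledge stored both structurally and as vectors. The sketch below assumes a hypothetical `RetrievalService` that tries an exact structured‑key match first and otherwise ranks documents by cosine similarity. The bag‑of‑words `embed` function is a deliberate stand‑in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    doc_id: str
    kind: str   # knowledge type, e.g. "api", "prd", "code", "standard"
    key: str    # structured key, e.g. an API signature "POST /api/pay"
    text: str


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a term-frequency vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class RetrievalService:
    """Single retrieval entry point shared by all agents (hypothetical API)."""

    def __init__(self, docs: list[Doc]):
        self.docs = docs
        self.by_key = {d.key: d for d in docs}                              # structured index
        self.vectors = {d.doc_id: embed(f"{d.key} {d.text}") for d in docs}  # vector index

    @staticmethod
    def _allowed(doc: Doc, kinds: Optional[set[str]]) -> bool:
        return kinds is None or doc.kind in kinds

    def search(self, query: str, kinds: Optional[set[str]] = None, top_k: int = 3) -> list[Doc]:
        # 1) An exact structured-key hit (e.g. an API signature) wins outright.
        hit = self.by_key.get(query)
        if hit is not None and self._allowed(hit, kinds):
            return [hit]
        # 2) Otherwise rank the allowed documents by vector similarity.
        q = embed(query)
        ranked = sorted((d for d in self.docs if self._allowed(d, kinds)),
                        key=lambda d: cosine(q, self.vectors[d.doc_id]), reverse=True)
        return ranked[:top_k]
```

Exact identifiers such as API signatures go to the structured index, and free‑text questions go to the vector index. The `kinds` filter lets each agent restrict results to the knowledge types it needs.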
Test‑case design agent
Challenges:
Multi‑step retrieval can cause the model to hallucinate.
Information contained in images is hard to exploit.
Single‑step reasoning makes case granularity hard to control.
Ensuring compliance with business‑line standards is difficult.
Five‑step workflow:
Intelligent requirement decomposition – converting unstructured PRDs into structured mind maps.
Smart test‑point extraction – pulling independent test points from the mind map.
On‑demand retrieval – the agent asks clarifying questions to enhance retrieval.
Test‑point case generation – LLM creates test cases based on each point.
Specification‑driven optimization – rewriting cases to meet business‑line standards.
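The five steps can be sketched as a pipeline. This is a minimal sketch that assumes a generic `llm(prompt) -> str` callable and a `retrieve(query) -> list[str]` callable, both hypothetical stand‑ins and not Baidu's interfaces. The prompts are placeholders. The only concrete logic is the leaf‑extraction rule in step 2, which treats each leaf of the indented mind map as one test point.

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]               # hypothetical: prompt in, completion out
Retriever = Callable[[str], list[str]]   # hypothetical: query in, context snippets out


@dataclass
class GeneratedCase:
    test_point: str
    body: str


def decompose(prd: str, llm: LLM) -> str:
    """Step 1: turn an unstructured PRD into an indented mind map, one node per line."""
    return llm(f"Convert this PRD into an indented mind map:\n{prd}")


def extract_test_points(mind_map: str) -> list[str]:
    """Step 2: each leaf node (no deeper-indented line right after it) is one test point."""
    lines = [ln for ln in mind_map.splitlines() if ln.strip()]
    depth = [len(ln) - len(ln.lstrip()) for ln in lines]
    return [lines[i].strip() for i in range(len(lines))
            if i + 1 == len(lines) or depth[i + 1] <= depth[i]]


def retrieve_on_demand(point: str, llm: LLM, retrieve: Retriever) -> list[str]:
    """Step 3: let the model state what it is missing, then retrieve on that question."""
    question = llm(f"What project context do you need to test: {point}?")
    return retrieve(question)


def generate_case(point: str, context: list[str], llm: LLM) -> GeneratedCase:
    """Step 4: one generation call per test point keeps case granularity predictable."""
    return GeneratedCase(point, llm(f"Write a test case for '{point}'. Context: {context}"))


def apply_standards(case: GeneratedCase, standards: str, llm: LLM) -> GeneratedCase:
    """Step 5: rewrite the case to follow the business line's writing standards."""
    prompt = f"Rewrite this case to follow these standards:\n{standards}\n{case.body}"
    return GeneratedCase(case.test_point, llm(prompt))


def design_cases(prd: str, llm: LLM, retrieve: Retriever, standards: str) -> list[GeneratedCase]:
    cases = []
    for point in extract_test_points(decompose(prd, llm)):
        context = retrieve_on_demand(point, llm, retrieve)
        cases.append(apply_standards(generate_case(point, context, llm), standards, llm))
    return cases
```

Splitting generation into one call per test point is what this sketch relies on to address the single‑step granularity problem. Retrieving per point, rather than once for the whole PRD, keeps each prompt's context narrow.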
API test agent
Case generation – identifying the test scenario and business line, modular case design, lightweight configuration.
Case merging & execution – defining merge agents, running tests, producing reports.
Problem localization & fixing – handling syntax/logic errors, server stability issues, API changes, hidden logic checks.
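The talk lists the failure types the API test agent handles but not how it tells them apart. The sketch below shows a minimal rule‑based triage. It assumes the runner reports a status code, an exception string, and the response's top‑level fields, all hypothetical inputs. The route names are illustrative and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    """Outcome of one API test case (hypothetical fields reported by the runner)."""
    status_code: Optional[int]   # None if no HTTP response arrived
    error: str                   # exception text from the test script, "" if none
    response_fields: set[str]    # top-level fields present in the response body
    expected_fields: set[str]    # fields the case expects, from the API knowledge base


SCRIPT_ERRORS = ("SyntaxError", "NameError", "TypeError", "AttributeError")


def triage(r: RunResult) -> str:
    """Route a result to a repair strategy (route names are illustrative)."""
    if r.error.startswith(SCRIPT_ERRORS):
        return "script_error"         # fix the generated test code, then re-run
    if r.status_code is None or r.status_code >= 500:
        return "server_instability"   # retry or flag the environment; don't edit the case
    if r.status_code == 404 or r.expected_fields - r.response_fields:
        return "api_changed"          # refresh API knowledge and regenerate the case
    if r.error.startswith("AssertionError"):
        return "assertion_mismatch"   # expected vs. actual behavior differ; escalate for review
    return "pass"
```

The order of checks matters. Script errors are checked first so that a broken test is never blamed on the server. Server errors are checked before field drift so that a 5xx response with no fields is not misread as an API change.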
Web UI test agent
Challenges:
Heavy workload: most changes require front‑end verification.
Creating and maintaining automation scripts is costly.
Traditional record‑and‑replay solutions have maintenance costs that grow linearly.
Six‑stage development path:
Feasibility – HTML compression + LLM‑based element locating (≈75% accuracy).
First version – HTML compression, element locating, browser execution, test‑code generation.
Architecture optimization – adding brain and location_element modules, passing browser screenshots.
Engineering optimization – second‑round HTML compression, automatic XPath optimization.
Human‑machine interaction – incorporating historical experience and manual support.
Auto‑repair – test‑code conversion, fast replay, code‑diff analysis.
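HTML compression appears in several of the stages above, but the source does not describe the scheme. The sketch below assumes one plausible approach, built on Python's standard `html.parser`. It drops script and style content and keeps only interactive elements with a small attribute whitelist. It then numbers those elements so that an LLM can answer with an index instead of writing a selector. The tag and attribute lists are assumptions.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript", "svg"}
INTERACTIVE = {"a", "button", "input", "select", "textarea", "option", "label"}
VOID_TAGS = {"input"}                       # interactive tags that have no closing tag
KEEP_ATTRS = {"id", "name", "type", "placeholder", "aria-label", "href", "value"}


class HtmlCompressor(HTMLParser):
    """Collect interactive elements, their useful attributes, and their visible text."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside script/style/etc.
        self.open = []        # indices of interactive elements still collecting text
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth and tag in INTERACTIVE:
            kept = {k: v for k, v in attrs if k in KEEP_ATTRS and v}
            self.items.append({"tag": tag, "attrs": kept, "text": ""})
            if tag not in VOID_TAGS:
                self.open.append(len(self.items) - 1)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.open and self.items[self.open[-1]]["tag"] == tag:
            self.open.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self.skip_depth and self.open:
            item = self.items[self.open[-1]]
            item["text"] = f"{item['text']} {text}".strip()


def compress_html(html: str) -> str:
    """Return one numbered line per interactive element, e.g. '[1] <button id="x"> OK'."""
    parser = HtmlCompressor()
    parser.feed(html)
    parser.close()
    lines = []
    for i, item in enumerate(parser.items):
        attrs = " ".join(f'{k}="{v}"' for k, v in sorted(item["attrs"].items()))
        head = " ".join(filter(None, [item["tag"], attrs]))
        lines.append(f"[{i}] <{head}> {item['text']}".rstrip())
    return "\n".join(lines)
```

The agent maps the index the model returns back to a concrete element. This is one way to pair compression with LLM‑based element locating, not a reconstruction of Baidu's implementation.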
Key techniques:
LLM‑driven intelligent element locating.
Browser state + screenshot‑based decision mechanism.
Automatic XPath optimization.
Human‑AI collaboration for difficult scenarios.
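The source does not detail how automatic XPath optimization works. One common approach, assumed here, is to prefer locators built from stable attributes or visible text. An index‑based absolute path is used only when no shorter candidate is unique on the page. This sketch uses the limited XPath subset in Python's `xml.etree.ElementTree` on well‑formed markup. A real browser agent would evaluate the candidates against the live DOM instead. The attribute ranking is a hypothetical choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical stability ranking: test ids and semantic attributes before text.
STABLE_ATTRS = ("id", "data-testid", "name", "aria-label", "placeholder")


def candidate_xpaths(el: ET.Element) -> list[str]:
    """Candidate locators, most stable first (values containing quotes are not handled)."""
    cands = [f".//{el.tag}[@{a}='{el.get(a)}']" for a in STABLE_ATTRS if el.get(a)]
    if el.text and el.text.strip():
        cands.append(f".//{el.tag}[.='{el.text.strip()}']")
    return cands


def absolute_xpath(root: ET.Element, el: ET.Element) -> str:
    """Brittle fallback: positional path from the root, like a recorded locator."""
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = el
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps))


def optimize_xpath(root: ET.Element, el: ET.Element) -> str:
    """Return the first candidate that matches exactly this one element."""
    for xp in candidate_xpaths(el):
        if root.findall(xp) == [el]:
            return xp
    return absolute_xpath(root, el)
```

Uniqueness is checked against the page itself, so the chosen locator is the first candidate in the ranking that identifies exactly one element. The positional fallback is the kind of locator that tends to break when the layout changes.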
Summary and outlook
AI Agents can disrupt traditional testing workflows, delivering up to ten‑fold efficiency improvements.
Success hinges on agents’ business knowledge and end‑to‑end task completion.
Applicable across case design, API testing, and Web UI testing.
Adoption should start with small pilots and scale gradually.
Fostering an AI‑native mindset within teams is essential for practical rollout.
Future directions include strengthening multimodal data understanding, integrating code, documents, and design diagrams into a unified knowledge base, and building modular, customizable AI‑Agent ecosystems that align tightly with business processes.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Software Engineering 3.0 Era
With large models (LLMs) reshaping countless industries, software engineering is leading the way into the Software Engineering 3.0 era of model‑driven development and operations. This account focuses on the new paradigms, theories, and methods of SE 3.0 and showcases related tools and practices.