How Claude 3.7 Dissects Baidu’s AI Agents for Software Testing: A New Era of Intelligent Testing

The article systematically analyzes Baidu senior test engineer Wang Zhe’s presentation on applying AI Agents to software testing, outlining the importance, challenges, architecture, knowledge management, and concrete agent workflows that promise up to ten‑fold efficiency gains.

Software Engineering 3.0 Era

Importance and challenges of intelligent testing

Cost reduction and efficiency gains are critical as internet companies shift toward leaner, more cost‑conscious development.

Increasing software complexity and new development models (low‑code, zero‑code, AI‑generated code) make traditional testing a bottleneck.

Rapid advances in generative AI (e.g., ChatGPT) create large opportunities for testing transformation.

QA work differs from development: tasks are individually simple but cover a broader scope, limiting the impact of simple AI assistance.

Copilot‑style AI tools have low acceptance in testing because test tasks are low‑complexity and benefit little from pure code generation.

Full AI replacement of QA is unrealistic now; effective human‑AI collaboration is required.

Core problems to solve

Business understanding – enabling an AI Agent to grasp project context for accurate output.

End‑to‑end completion – allowing the Agent to finish a task without frequent hand‑offs, truly boosting efficiency.

Overall architecture of intelligent testing

Workflow – a Retrieval‑Augmented Generation (RAG)‑based end‑to‑end testing pipeline.

System – knowledge base, retrieval service, multiple agents, and tool collections.

Knowledge construction

Global knowledge: system concepts, testing standards, etc.

Requirement‑level knowledge: PRDs, UI/UE designs, API docs.

Practical applications of AI Agents in testing

Knowledge construction & management

Global knowledge

Document knowledge – OCR + multimodal models, enhanced by GraphRAG.

API knowledge – unified ingestion, stored both structurally and as vectors.

Code knowledge – AST‑based retrieval and embedding‑based semantic search.
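As a rough illustration of the AST‑based side of code knowledge, the sketch below uses Python's standard `ast` module to index function definitions so they can be looked up by exact name. The `index_functions` helper and the sample source are hypothetical; the summary does not say which languages or parsers Baidu's system actually uses.

```python
import ast

def index_functions(source: str) -> dict:
    """Map each function name to its arguments, docstring, and line number."""
    index = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index[node.name] = {
                "args": [a.arg for a in node.args.args],
                "doc": ast.get_docstring(node) or "",
                "lineno": node.lineno,
            }
    return index

# Hypothetical source file used only for illustration.
SAMPLE = '''
def create_order(user_id, sku):
    """Create an order for a user."""
    return {"user": user_id, "sku": sku}

def cancel_order(order_id):
    return True
'''

code_index = index_functions(SAMPLE)
```

An index like this answers exact structural questions ("what arguments does `create_order` take?"), while an embedding index over the same functions would cover fuzzier, meaning‑based queries.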

Requirement‑level knowledge

Knowledge linking for rule‑based retrieval.

Image preprocessing and PRD structural splitting.
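PRD structural splitting can be pictured as cutting a document along its headings so that each chunk keeps its place in the hierarchy. The sketch below assumes a Markdown‑style PRD with `#` headings, which is an assumption made for illustration; `split_prd` is a hypothetical helper, and real PRDs would need format‑specific parsing.

```python
import re

def split_prd(text: str) -> list:
    """Split a Markdown-style PRD into chunks keyed by their heading path."""
    chunks, path, body = [], [], []

    def flush():
        # Emit the text collected under the current heading path, if any.
        content = "\n".join(body).strip()
        if content:
            chunks.append({"path": " > ".join(path), "text": content})
        body.clear()

    for line in text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]          # drop headings at this level or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks

# Hypothetical PRD used only for illustration.
PRD = "# Checkout\nIntro text\n## Coupons\nOne coupon per order.\n## Payment\nSupports cards.\n"
prd_chunks = split_prd(PRD)
```

Keeping the heading path with each chunk gives rule‑based retrieval something to match on, for example fetching every chunk under "Checkout".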

Knowledge retrieval & usage

Unified retrieval service serving all test‑scenario agents.

Plug‑and‑play ChatBot for project‑specific Q&A.
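A unified retrieval service can be thought of as a single search entry point that every agent calls, regardless of whether the knowledge came from documents, APIs, or code. The sketch below is a minimal in‑memory version; bag‑of‑words cosine similarity stands in for a real embedding model, and the `RetrievalService` class and its method names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Optional

def tokenize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class RetrievalService:
    """Single entry point that every agent queries, whatever the knowledge type."""

    def __init__(self):
        self._docs = []  # list of (kind, text, token counts)

    def add(self, kind: str, text: str) -> None:
        self._docs.append((kind, text, tokenize(text)))

    def search(self, query: str, kind: Optional[str] = None, top_k: int = 3):
        q = tokenize(query)
        scored = [
            (cosine(q, vec), k, text)
            for k, text, vec in self._docs
            if kind is None or k == kind
        ]
        scored = [s for s in scored if s[0] > 0]      # drop entries with no overlap
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(k, text) for _, k, text in scored[:top_k]]

service = RetrievalService()
service.add("api", "POST /orders creates an order for a user")
service.add("doc", "Refund rules: orders can be refunded within 7 days")
service.add("code", "def cancel_order(order_id): cancels an existing order")

hits = service.search("refund rules")
api_hits = service.search("order", kind="api")
```

The same `search` call could back both the test‑scenario agents and a project Q&A chatbot, which is what makes a shared service attractive.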

Test‑case design agent

Challenges:

Multi‑step retrieval can cause model hallucination.

Image information is hard to exploit.

Single‑step reasoning makes case granularity hard to control.

Ensuring compliance with business standards is difficult.

Five‑step workflow:

Intelligent requirement decomposition – converting unstructured PRDs into structured mind maps.

Smart test‑point extraction – pulling independent test points from the mind map.

On‑demand retrieval – the agent asks clarifying questions to enhance retrieval.

Test‑point case generation – LLM creates test cases based on each point.

Specification‑driven optimization – rewriting cases to meet business‑line standards.
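The five steps above can be sketched as a small pipeline in which each stage is a separate model call. This is an outline under stated assumptions, not the presenter's implementation: `llm` is any callable that maps a prompt to text, `retrieve` is any lookup function, and the prompt wording is invented for illustration.

```python
from typing import Callable, List

LLM = Callable[[str], str]

def design_test_cases(prd: str, llm: LLM, retrieve: Callable[[str], str], standard: str) -> List[str]:
    # Step 1: requirement decomposition into an outline (a stand-in for a mind map).
    outline = llm(f"Decompose this PRD into a nested outline:\n{prd}")
    # Step 2: test-point extraction, one independent point per line.
    raw_points = llm(f"List independent test points, one per line:\n{outline}")
    points = [p.strip() for p in raw_points.splitlines() if p.strip()]
    cases = []
    for point in points:
        # Step 3: on-demand retrieval, driven by a question the model asks.
        question = llm(f"What do you need to know to test: {point}?")
        context = retrieve(question)
        # Step 4: case generation for this single test point.
        draft = llm(f"Write a test case for '{point}' using:\n{context}")
        # Step 5: rewrite the draft against the business-line standard.
        cases.append(llm(f"Rewrite to follow this standard: {standard}\n{draft}"))
    return cases
```

Generating one case per test point, rather than asking for all cases in a single prompt, is what gives the workflow control over case granularity.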

API test agent

Case generation – recognizing the scenario and business context, with modular design and lightweight configuration.

Case merging & execution – defining merge agents, running tests, producing reports.

Problem localization & fixing – handling syntax/logic errors, server stability issues, API changes, hidden logic checks.
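One way to picture problem localization is as a triage step that sorts each failed run into one of the categories listed above before any fix is attempted. The rules below are deliberately simple and entirely hypothetical (the `FailedRun` fields, the category names, and the thresholds are assumptions); a real agent would likely combine such signals with model‑based analysis.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FailedRun:
    exception: Optional[str]        # exception name raised by the test script, if any
    status_code: Optional[int]      # HTTP status, or None if no response arrived
    missing_fields: tuple = ()      # expected response fields that were absent

def triage(run: FailedRun) -> str:
    if run.exception in {"SyntaxError", "NameError", "TypeError"}:
        return "script_error"           # fix the generated test code
    if run.status_code is None or run.status_code >= 500:
        return "server_instability"     # retry before blaming the case
    if run.status_code == 404 or run.missing_fields:
        return "api_change"             # regenerate the case from current API docs
    return "assertion_mismatch"         # possible hidden business logic, needs review
```

Separating these categories matters because each implies a different repair: editing the script, retrying, regenerating the case, or asking a human.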

Web UI test agent

Challenges:

Large workload; most changes need front‑end verification.

High cost of creating and maintaining automation scripts.

Maintenance cost of traditional recording‑based solutions grows linearly.

Six‑stage development path:

Feasibility – HTML compression + LLM‑based element locating (≈75% accuracy).

First version – HTML compression, element locating, browser execution, test‑code generation.

Architecture optimization – adding brain and location_element modules, passing browser screenshots.

Engineering optimization – second‑round HTML compression, XPath auto‑optimization.

Human‑machine interaction – incorporating historical experience and manual support.

Auto‑repair – test‑code conversion, fast replay, code‑diff analysis.

Key techniques:

LLM‑driven intelligent element locating.

Browser state + screenshot‑based decision mechanism.

Automatic XPath optimization.

Human‑AI collaboration for difficult scenarios.
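The HTML compression step mentioned in the development path can be illustrated with Python's standard `html.parser`: strip scripts, styles, layout wrappers, and noisy attributes so that only interactive elements remain for the model to reason about. Which tags and attributes to keep is an assumption made here for illustration, and `compress_html` is a hypothetical helper; the element‑locating call to the LLM is not shown.

```python
from html.parser import HTMLParser

KEEP_TAGS = {"a", "button", "input", "select", "option", "textarea", "label"}
KEEP_ATTRS = {"id", "name", "type", "placeholder", "href", "aria-label"}
VOID_TAGS = {"input"}            # kept tags that never have a closing tag
SKIP_TAGS = {"script", "style"}

class HtmlCompressor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0   # depth inside <script>/<style>
        self._kept = 0   # depth inside kept, non-void elements

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in KEEP_TAGS and not self._skip:
            kept = "".join(f' {k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v)
            self.parts.append(f"<{tag}{kept}>")
            if tag not in VOID_TAGS:
                self._kept += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in KEEP_TAGS and tag not in VOID_TAGS and self._kept:
            self.parts.append(f"</{tag}>")
            self._kept -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._kept and not self._skip:
            self.parts.append(text)   # keep visible text only inside kept elements

def compress_html(html: str) -> str:
    parser = HtmlCompressor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

# Hypothetical page used only for illustration.
PAGE = """
<html><head><style>.x{color:red}</style><script>var a = 1;</script></head>
<body><div class="wrap"><p>Welcome back</p>
<form><input id="user" class="big" placeholder="Username">
<button id="login" class="btn btn-primary" onclick="go()">Log in</button></form>
</div></body></html>
"""
compact = compress_html(PAGE)
```

A shorter page representation leaves more of the model's context for the task itself, and stable attributes such as `id` are natural candidates when simplifying a generated XPath.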

Summary and outlook

AI Agents can disrupt traditional testing workflows, delivering up to ten‑fold efficiency improvements.

Success hinges on agents’ business knowledge and end‑to‑end task completion.

Applicable across case design, API testing, and Web UI testing.

Adoption should start with small pilots and scale gradually.

Fostering an AI‑native mindset within teams is essential for practical rollout.

Future directions include strengthening multimodal data understanding, integrating code, documents, and design diagrams into a unified knowledge base, and building modular, customizable AI‑Agent ecosystems that align tightly with business processes.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Software Engineering 3.0 Era
