Unlocking Precise AI Data Generation with Multi‑Agent Architecture
This article explains how a multi‑agent system—comprising intent‑recognition, tool‑engine, and inference agents—solves the challenges of AI‑driven data generation (AI‑造数) by improving accuracy, speed, and scalability through modular design, prompt engineering, and sophisticated tool governance.
Introduction
In the context of joint‑testing data generation (referred to as "AI‑造数"), we initially used a single‑agent approach. As more tools and scenarios were added, we evolved to a multi‑agent architecture that separates intent recognition, tool engine, and inference execution.
Challenges
Challenge: Accurately extract commands from rich queries, filter the right tool from thousands, and assemble toolchains for complex instructions.
Fundamental difficulty: Semantic and functional gaps between users and tool authors.
Single‑Agent Solution
All responsibilities are handed to a reasoning agent, with careful tool governance and prompt engineering.
Multi‑Agent Solution
We split the system into multiple agents, each focused on a single goal, and replace agent reasoning with deterministic engineering components wherever possible, which improves response time.
AI‑造数 1.0 – Single‑Agent Mode
Data generation here means feeding the user's query and the system's tool set to an LLM and letting it decide which tools to invoke. Once the MCP (Model Context Protocol) standard became widely adopted, AI‑造数 followed as a natural extension.
In this mode, the agent handles memory management, prompt engineering, and LLM interaction.
Architecture Diagram
Tool Governance
Tool descriptions must include basic info, functionality, output, and troubleshooting details. Proper description quality directly impacts model decision quality.
We categorize tools into public (stable, strict) and private (flexible) domains to manage the growing tool pool.
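A well-governed tool entry covers the four description fields named above. The sketch below shows one way such an entry might look; the field names and the example tool are illustrative assumptions, not the production schema.

```python
# Hypothetical tool-registry entry covering the required description fields:
# basic info, functionality, output, and troubleshooting. Illustrative only.

def describe_tool(name, domain, functionality, output, troubleshooting):
    """Assemble a structured tool description for the registry."""
    return {
        "name": name,
        "domain": domain,  # "public" (stable, strict) or "private" (flexible)
        "functionality": functionality,
        "output": output,
        "troubleshooting": troubleshooting,
    }

order_tool = describe_tool(
    name="create_test_order",
    domain="public",
    functionality="Creates a test order for a given user in the staging environment.",
    output="The new order ID as a string.",
    troubleshooting="Fails if the user has no default address; create one first.",
)
```

Because the model reads these descriptions verbatim when deciding which tool to call, the troubleshooting field doubles as guidance for recovering from failed invocations.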
Prompt Engineering
Prompts define the LLM’s workspace as "find and execute tools", fill necessary context, set principles, and provide examples. Key lessons:
Do not enforce output style on reasoning agents; it harms performance.
When principles have little effect, add examples.
Examples may introduce hidden attributes; be aware of implicit fields.
Abstracting models, capabilities, and processes improves accuracy (expanded in version 2.0).
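The prompt layout the lessons above describe (workspace definition, context, principles, then few-shot examples) can be sketched as a simple template. All of the wording below is illustrative, not the production prompt.

```python
# Minimal sketch of the prompt layout: workspace, context, principles, examples.
# Note there is no output-format constraint, per the first lesson above.

PROMPT_TEMPLATE = """You are a tool-using agent. Your workspace: find and execute tools.

Context:
{context}

Principles:
- Prefer tools whose output directly satisfies the user's goal.
- If a required input is missing, look for a tool that produces it.

Examples:
{examples}

User query: {query}
"""

def build_prompt(context: str, examples: str, query: str) -> str:
    return PROMPT_TEMPLATE.format(context=context, examples=examples, query=query)
```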
Intent Recognition
We abstract eight intent types (data creation, data operation, data query, data validation, tool inquiry, tool operation, project operation, other) and define an IntentResult model to standardize downstream processing.
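The eight intent types and the IntentResult model might look like the following sketch. The enum values mirror the article's list; the IntentResult field names are assumptions for illustration.

```python
# Hedged sketch of the intent taxonomy and an IntentResult-style model.
from dataclasses import dataclass, field
from enum import Enum

class IntentType(Enum):
    DATA_CREATION = "data_creation"
    DATA_OPERATION = "data_operation"
    DATA_QUERY = "data_query"
    DATA_VALIDATION = "data_validation"
    TOOL_INQUIRY = "tool_inquiry"
    TOOL_OPERATION = "tool_operation"
    PROJECT_OPERATION = "project_operation"
    OTHER = "other"

@dataclass
class IntentResult:
    intent: IntentType
    raw_query: str
    # Structured details extracted from the query, e.g. entities or counts.
    details: dict = field(default_factory=dict)

result = IntentResult(
    IntentType.DATA_CREATION,
    "create 3 test orders for user 42",
    {"entity": "order", "count": 3, "user_id": "42"},
)
```

Standardizing on a model like this is what lets downstream stages (tool filtering, inference) consume intents without re-parsing the raw query.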
Typical intents in joint‑testing include data creation, data operation, data query, and data validation, each requiring different query details and tool filtering.
Examples illustrate how the system parses user queries into structured intent models.
Tool Engine
The engine filters thousands of tools in memory to a handful of relevant candidates for the LLM. It consists of a real‑time filtering module and a backend tool‑parsing agent.
Tools are abstracted into a ToolEssentialModel with fields such as function type, environment, domain, dependent entities, and output entities.
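A ToolEssentialModel with the fields listed above might be sketched as follows; the exact field types and the example instance are assumptions.

```python
# Hedged sketch of a ToolEssentialModel with the fields named in the article.
from dataclasses import dataclass, field

@dataclass
class ToolEssentialModel:
    name: str
    function_type: str  # e.g. "create", "query", "validate"
    environment: str    # e.g. "staging", "production"
    domain: str         # "public" or "private"
    dependent_entities: list = field(default_factory=list)  # inputs the tool needs
    output_entities: list = field(default_factory=list)     # entities the tool yields

order_query = ToolEssentialModel(
    name="query_order",
    function_type="query",
    environment="staging",
    domain="public",
    dependent_entities=["order_id"],
    output_entities=["order"],
)
```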
To bridge semantic gaps, we combine text similarity (with synonym tables) and embedding similarity. To address functional gaps, we use primary and secondary tool tracks.
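Combining text similarity (with a synonym table) and embedding similarity could be sketched as a weighted score. The synonym groups, weights, and Jaccard/cosine choices below are illustrative assumptions; real embeddings would come from an embedding model rather than hand-written vectors.

```python
# Sketch: rank tools by a weighted blend of synonym-expanded token overlap
# (text similarity) and cosine similarity over embeddings. Illustrative only.
import math

_GROUPS = [{"order", "purchase"}, {"create", "generate", "make"}]
SYNONYMS = {word: group for group in _GROUPS for word in group}

def expand(tokens):
    """Map each token to its full synonym group (or itself)."""
    out = set()
    for t in tokens:
        out |= SYNONYMS.get(t, {t})
    return out

def text_score(query, tool_desc):
    q = expand(query.lower().split())
    d = expand(tool_desc.lower().split())
    return len(q & d) / max(len(q | d), 1)  # Jaccard over expanded tokens

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def combined_score(query, tool_desc, q_vec, t_vec, w_text=0.4, w_embed=0.6):
    return w_text * text_score(query, tool_desc) + w_embed * cosine(q_vec, t_vec)
```

The synonym expansion makes "generate order" and "create purchase order" score as near-identical text, which is exactly the user-versus-tool-author vocabulary gap the engine has to bridge.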
Inference Execution
The inference agent receives high‑quality requests and guides the LLM through a reverse‑reasoning and forward‑execution process:
Identify the final tool that satisfies the goal.
Recursively find prerequisite tools for missing inputs.
Construct a tool chain (tool n → … → tool a) and execute it forward.
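The reverse-reasoning, forward-execution steps above can be sketched as a recursive chain builder. The tool records (dicts with inputs/outputs) and the example data-creation tools are illustrative assumptions, not the production format.

```python
# Sketch of reverse reasoning: start from the tool whose output satisfies the
# goal, recursively prepend tools that produce its missing inputs, and return
# a chain ordered for forward execution. Illustrative tool format.

def build_chain(goal_entity, tools, available, chain=None):
    """Return tools ordered so each one's inputs are produced before it runs."""
    chain = chain if chain is not None else []
    tool = next(t for t in tools if goal_entity in t["outputs"])
    for needed in tool["inputs"]:
        if needed not in available:  # recurse to satisfy missing prerequisites
            build_chain(needed, tools, available, chain)
    chain.append(tool)
    available.update(tool["outputs"])
    return chain

tools = [
    {"name": "create_user", "inputs": [], "outputs": {"user_id"}},
    {"name": "create_address", "inputs": ["user_id"], "outputs": {"address_id"}},
    {"name": "create_order", "inputs": ["user_id", "address_id"], "outputs": {"order_id"}},
]
chain = build_chain("order_id", tools, available=set())
```

Asking for an order forces the builder to discover the user and address prerequisites first, so the chain executes forward as create_user, then create_address, then create_order.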
We use Qwen‑max for deep reasoning.
Overall Effect
By filtering the candidate set from hundreds of tools down to about five, we shrink the LLM's decision space by roughly two orders of magnitude, improving both accuracy and latency even as the tool pool grows.
Solution Recommendations
Single‑agent is simple, fast to implement, and works well when the tool set is focused and users and tool authors share language.
Multi‑agent suits open platforms with many tools and users but adds complexity and debugging difficulty.
Final Thoughts
Building AI-powered products involves inherent uncertainty; clear principles, standards, and processes are essential. When results fall short, tightening user query specifications often yields significant improvements.
Key Takeaways
Creating AI‑driven products requires deep abstraction to handle uncertainty. Ambiguous areas will cause LLMs to waver, so explicit rules and guidance are crucial.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
