Exploring LLM Application Architectures: From AI Embedded to Multi‑Agent Systems
This article examines the typical business and technical architectures for large language model applications, covering AI Embedded, Copilot, and Agent modes, single‑ and multi‑agent systems, core frameworks, and guidance on selecting appropriate technical routes.
Introduction
We have previously introduced the definition of large models and related concepts such as RAG and agents. This article focuses on the ecosystems and scenarios in which these concepts arise, describing the mainstream business and technical architectures for LLM‑based applications so readers can choose suitable technical routes for their own needs.
Infrastructure vs. Application Layer
Software development is split between middleware/framework infrastructure and the applications built on top. Large‑model development follows the same pattern: (1) building and training foundational models, and (2) constructing applications that leverage those models.
We are constantly forced to absorb an overwhelming amount of information and keep pace with rapid technological change, or risk being left behind.
Typical Business Architectures
In practice, three main patterns dominate:
AI Embedded Mode
LLM capabilities are inserted into a specific step of an existing application to improve efficiency.
AI Copilot Mode
LLM functions are widely integrated throughout a system, acting as an information source or assistant (e.g., Microsoft Copilot, GitHub Copilot). The model does not make final decisions but enhances productivity.
AI Agent Mode
Users issue high‑level instructions; the AI autonomously decomposes tasks and executes them. This mode can involve single agents or multi‑agent collaborations.
Single‑Agent vs. Multi‑Agent
A single agent replaces traditional rule‑engine or knowledge‑base components with an LLM, gaining reasoning and dialogue capabilities. A multi‑agent system consists of multiple autonomous agents that communicate and cooperate to solve complex tasks.
Common Single‑Agent Systems
AutoGPT – an open‑source AI agent that pursues a given goal using various tools (no multi‑agent collaboration).
ChatGPT + Code Interpreter or plugins – a conversational AI agent enhanced with code execution or external plugins.
LangChain Agent – part of the LangChain framework; includes the ReAct agent and follows a single‑agent paradigm.
Transformers Agent – an experimental natural‑language API built on the Transformers repository, also single‑agent.
Common Multi‑Agent Systems
BabyAGI – a Python‑based task management system that uses multiple LLM agents for task creation, prioritization, and execution.
CAMEL – an agent communication framework that uses role‑playing to enable chat agents to cooperate, though it lacks tool usage.
Multi‑Agent Debate – constructs multiple LLM agents that debate to improve factuality and reasoning.
MetaGPT – a multi‑agent framework that assigns different GPT roles to collaboratively develop software solutions.
Multi‑Agent Development Framework
AutoGen is a framework designed specifically for building LLM applications, supporting both single‑ and multi‑agent patterns.
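As a rough illustration of AutoGen's two‑agent pattern, the sketch below pairs an assistant agent with a user‑proxy agent that hands it a task. Configuration details (the llm_config shape, the model name) vary by AutoGen version and are assumptions here, not a definitive setup.

```python
# Minimal AutoGen-style two-agent chat: a user proxy hands a task to an
# assistant agent and relays its replies. Configuration values are illustrative.
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"config_list": [{"model": "gpt-4o-mini"}]},  # assumed config shape
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",      # run without manual input for this sketch
    code_execution_config=False,   # disable local code execution
)

user_proxy.initiate_chat(assistant, message="Draft a plan for a three-day trip to Kyoto.")
```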
Technical Architecture
Pure Prompt
Simple turn‑based interaction: user asks a question, model replies.
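As a concrete illustration, a pure‑prompt interaction is a single request/response round trip. The sketch below assumes the OpenAI Python SDK (v1+) with an API key in the environment; the model name is illustrative, and any chat‑completion API would look similar.

```python
# Minimal "pure prompt" round trip: one user message in, one reply out.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(response.choices[0].message.content)
```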
Agent + Function Calling
Agent – the model proactively initiates requests, deciding what it needs before answering.
Function Calling – the model emits a structured call to a specific function, which the application executes and feeds back to the model.
Example – when asked about travel plans, the agent first asks for the budget (see the sketch below).
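The sketch below shows what this could look like with an OpenAI‑style tools API: the model is given a hypothetical get_user_budget function and may choose to call it before producing a travel plan. The function name, schema, and model name are illustrative assumptions.

```python
# Sketch of Agent + Function Calling: the model may emit a structured call
# to a hypothetical get_user_budget tool instead of answering directly.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_user_budget",  # hypothetical tool defined by the application
        "description": "Ask the user for their travel budget in USD.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Plan a weekend trip for me."}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    # The application would execute the function and send the result back to the model.
    print("Model requested:", message.tool_calls[0].function.name)
else:
    print(message.content)
```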
RAG (Retrieval‑Augmented Generation)
Embeddings – convert text into vectors for similarity calculation.
Vector Database – stores vectors for efficient lookup.
Vector Search – retrieves the most similar vectors based on a query.
Example – like searching a textbook for relevant passages during an exam.
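To make the retrieval step concrete, the sketch below embeds a few passages, ranks them by cosine similarity against a query, and returns the best match. A real system would use a vector database for approximate nearest‑neighbor search instead of an in‑memory list; the embedding model name is an assumption.

```python
# Toy RAG retrieval: embed passages, rank them by cosine similarity to the
# query, and return the best match to be added to the prompt.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

passages = [
    "RAG retrieves relevant documents and adds them to the prompt.",
    "Fine-tuning adjusts model weights on task-specific data.",
]
index = [(p, embed(p)) for p in passages]  # stand-in for a vector database

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embed("How does retrieval-augmented generation work?")
best_passage, _ = max(index, key=lambda pair: cosine(pair[1], query))
print(best_passage)
```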
Fine‑Tuning
Adjusting a large model on specific data to improve performance for particular tasks.
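As a small illustration of what fine‑tuning data can look like, the sketch below writes training examples in the chat‑style JSONL format used by common fine‑tuning APIs. The task and file name are made up for the example.

```python
# Illustrative fine-tuning data prep: one JSON object per line, each holding
# a short conversation the model should learn to reproduce.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Classify the sentiment: 'Great battery life.'"},
        {"role": "assistant", "content": "positive"},
    ]},
    {"messages": [
        {"role": "user", "content": "Classify the sentiment: 'The screen cracked in a week.'"},
        {"role": "assistant", "content": "negative"},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```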
Choosing a Technical Route
When faced with a requirement, a pragmatic (though not rigorous) decision‑making process can help select the appropriate solution.
When Fine‑Tuning Is Worth Trying
For newcomers, prompt engineering often suffices, but fine‑tuning becomes valuable when:
You need to improve the stability of the model's outputs.
You serve a large user base and reducing inference cost matters.
You need to increase generation speed.
Conclusion
This article analyzed typical business and technical architectures for LLM applications, enabling readers to understand current usage patterns and to evaluate how to design their own architectures and choose suitable technical routes.