Typical Business and Technical Architectures for Large Language Model Applications
This article reviews common business and technical architectures for large language model (LLM) applications, explains the AI Embedded, AI Copilot, and AI Agent modes (including single- and multi-agent systems), and offers guidance on choosing among technology stacks such as prompt-only interaction, function-calling agents, RAG, and fine-tuning.
Introduction
We have previously covered the definition of large models and related concepts such as RAG and Agents; this article focuses on the typical business and technical architectures that emerge as LLMs are widely adopted, helping readers choose suitable technical routes for their own scenarios.
Infrastructure vs. Application Layer
Software development consists of building middleware and frameworks (infrastructure) and then creating applications on top of them. Similarly, LLM development splits into (1) building and training foundational models and (2) constructing applications based on those models.
Typical Business Architectures
Three dominant patterns are observed in practice:
AI Embedded Mode
Integrates LLM capabilities into a specific step of an existing application to improve efficiency.
AI Copilot Mode
Uses LLMs extensively throughout a system, providing information and suggestions (e.g., Microsoft Copilot, GitHub Copilot) without making final decisions.
AI Agent Mode
Enables users to issue high‑level commands while the AI autonomously decomposes and executes tasks.
Single‑Agent and Multi‑Agent
Single-Agent systems rely on one LLM instance, whereas Multi-Agent systems consist of multiple autonomous agents that communicate and collaborate, typically through dialogue-based interactions, to solve complex tasks.
Common single-agent implementations include AutoGPT, ChatGPT with Code Interpreter or plugins, the LangChain ReAct agent, and Transformers Agent.
Common multi-agent frameworks include BabyAGI, CAMEL, Multi-Agent Debate, and MetaGPT; AutoGen stands out as a framework built specifically for multi-agent development.
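To make the multi-agent idea concrete, here is a minimal sketch of two agents collaborating through a dialogue loop. The `writer` and `reviewer` roles and their canned replies are hypothetical stand-ins for real LLM calls; a framework like AutoGen automates exactly this kind of message exchange.

```python
from typing import Optional

def writer(task: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM "writer" agent: produces a draft,
    # and revises it when it receives feedback.
    draft = f"Draft for: {task}"
    return draft + " (revised)" if feedback else draft

def reviewer(draft: str) -> Optional[str]:
    # Stand-in for an LLM "reviewer" agent: returns feedback,
    # or None once the draft is accepted.
    return None if "revised" in draft else "Please revise."

def collaborate(task: str) -> str:
    # Dialogue loop: the agents exchange messages until the
    # reviewer accepts the writer's draft.
    draft = writer(task, None)
    while True:
        feedback = reviewer(draft)
        if feedback is None:
            return draft
        draft = writer(task, feedback)
```

In a real system each role would be a separate LLM call with its own system prompt; the loop structure is the part that carries over.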
Technical Architectures
Pure Prompt
Simple conversational interaction: user asks, model answers.
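A sketch of the Pure Prompt pattern, where the user's question goes straight to the model with no retrieval, tools, or memory. The `call_llm` function is a hypothetical stub standing in for a real LLM API call.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call; returns canned replies
    # so the example is self-contained and runnable.
    canned = {"What is RAG?": "RAG stands for Retrieval-Augmented Generation."}
    return canned.get(prompt, "I'm not sure.")

def pure_prompt_chat(user_question: str) -> str:
    # Pure Prompt mode: no retrieval, no tools — the question
    # is forwarded to the model as-is.
    return call_llm(user_question)
```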
Agent + Function Calling
Agent: the AI takes the initiative, e.g., asking clarifying questions or planning the next step rather than merely answering.
Function Calling: the AI invokes external functions or APIs to gather information or perform actions.
Example: When asked about travel plans, the agent first asks for the budget.
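The travel-plan example above can be sketched as follows. The `get_budget` tool, the tool registry, and the hard-coded model output are all hypothetical; in a real system the JSON function-call request would come from the model itself.

```python
import json

def get_budget() -> dict:
    # Hypothetical tool: in practice this might ask the user
    # or look up a stored preference.
    return {"budget_usd": 2000}

# Registry of functions the agent is allowed to call.
TOOLS = {"get_budget": get_budget}

def run_agent(user_request: str) -> str:
    # Step 1: instead of answering directly, the model emits a
    # function-call request (simulated here as fixed JSON).
    model_output = json.dumps({"function": "get_budget", "arguments": {}})
    call = json.loads(model_output)
    result = TOOLS[call["function"]](**call["arguments"])
    # Step 2: the function result is fed back to the model,
    # which then produces the final answer.
    return (f"With a budget of ${result['budget_usd']}, "
            f"here is a travel plan for: {user_request}")
```

The key design point is the loop: model output is parsed, dispatched to a registered function, and the result is returned to the model before a final answer is produced.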
RAG (Retrieval‑Augmented Generation)
Embeddings: Convert text into vectors for similarity search.
Vector Database: Stores vectors for efficient retrieval.
Vector Search: Finds the most similar vectors to a query.
Example: Looking up relevant textbook content to answer a question.
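The retrieval step above can be sketched end to end. This uses toy bag-of-words "embeddings" and an in-memory document list purely for illustration; a real system would use a neural embedding model and a vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real system would call an
    # embedding model that returns dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stand-in for a vector database: documents stored alongside
# their embeddings.
DOCS = [
    "Photosynthesis converts sunlight into chemical energy.",
    "The French Revolution began in 1789.",
]

def retrieve(query: str) -> str:
    # Vector search: return the stored document whose embedding
    # is most similar to the query's.
    return max(DOCS, key=lambda d: cosine(embed(query), embed(d)))

def rag_answer(query: str) -> str:
    # The retrieved context is prepended to the prompt that would
    # be sent to the LLM.
    context = retrieve(query)
    return f"Context: {context}\nQuestion: {query}"
```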
Fine‑tuning
Adjusting a pre‑trained LLM on domain‑specific data to improve stability, reduce inference cost at scale, or increase generation speed.
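Most of the practical work in fine-tuning is preparing training data. Below is a sketch that converts domain Q&A pairs into the chat-style JSONL format commonly used for supervised fine-tuning; the example pairs and field names are illustrative assumptions, not a specific provider's schema.

```python
import json

# Hypothetical domain Q&A pairs to serve as fine-tuning data.
EXAMPLES = [
    {"question": "What is the return window?", "answer": "30 days."},
    {"question": "Do you ship overseas?", "answer": "Yes, to 40 countries."},
]

def to_chat_jsonl(examples: list) -> str:
    # One JSON record per line, each holding a user/assistant
    # message pair — a common shape for supervised fine-tuning.
    lines = []
    for ex in examples:
        record = {"messages": [
            {"role": "user", "content": ex["question"]},
            {"role": "assistant", "content": ex["answer"]},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)
```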
Choosing a Technical Route
A non‑rigorous but common decision‑making flow is presented to help select the appropriate architecture based on project requirements.
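One way to express such a flow is as a small decision function. This is a non-rigorous sketch mirroring the trade-offs discussed in this article, not a definitive rule; the question names are assumptions.

```python
def choose_route(needs_external_knowledge: bool,
                 needs_actions: bool,
                 high_volume_or_stability: bool) -> str:
    # Non-rigorous decision sketch: check the heaviest
    # requirement first, fall through to the simplest stack.
    if high_volume_or_stability:
        return "fine-tuning"
    if needs_external_knowledge:
        return "RAG"
    if needs_actions:
        return "agent + function calling"
    return "pure prompt"
```

In practice these options combine (e.g., RAG on top of a fine-tuned model), so the function is best read as a starting point rather than an exclusive choice.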
When Fine‑tuning Is Worth Trying
Fine‑tuning is advisable when aiming to improve model stability, when serving a large user base to lower inference costs, or when needing faster generation.
Conclusion
The article analyzes typical business and technical architectures for LLM applications, enabling readers to understand current usage patterns and to evaluate how to design their own architectures and choose suitable technology stacks.