Typical Business and Technical Architectures for Large Language Model Applications
This article reviews common business and technical architectures for large language model (LLM) applications, explains the AI Embedded, AI Copilot, and AI Agent modes (including single- and multi-agent systems), and offers guidance on choosing among technology stacks such as prompt-only interaction, function-calling agents, RAG, and fine-tuning.
Introduction
We have previously covered the definition of large models and related concepts such as RAG and Agents; this article focuses on the typical business and technical architectures that emerge as LLMs are widely adopted, helping readers choose suitable technical routes for their own scenarios.
Infrastructure vs. Application Layer
Software development consists of building middleware and frameworks (infrastructure) and then creating applications on top of them. Similarly, LLM development splits into (1) building and training foundational models and (2) constructing applications based on those models.
Typical Business Architectures
Three dominant patterns are observed in practice:
AI Embedded Mode
Integrates LLM capabilities into a specific step of an existing application to improve efficiency.
AI Copilot Mode
Uses LLMs extensively throughout a system, providing information and suggestions (e.g., Microsoft Copilot, GitHub Copilot) without making final decisions.
AI Agent Mode
Enables users to issue high‑level commands while the AI autonomously decomposes and executes tasks.
Single‑Agent and Multi‑Agent
Single-Agent systems rely on one LLM instance, whereas Multi-Agent systems consist of multiple autonomous agents that communicate and collaborate, typically through dialogue-based interactions, to solve complex tasks.
Common single-agent implementations include AutoGPT, ChatGPT with Code Interpreter or plugins, the LangChain ReAct agent, and Transformers Agent.
Common multi-agent frameworks include BabyAGI, CAMEL, Multi-Agent Debate, and MetaGPT; AutoGen stands out as a framework built specifically for multi-agent development.
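To make the multi-agent idea concrete, here is a minimal sketch of two agents collaborating through a dialogue loop. The `writer` and `reviewer` roles and their canned replies are hypothetical stand-ins for real LLM calls; a framework like AutoGen automates exactly this kind of message exchange.

```python
from typing import Optional

def writer(task: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM "writer" agent: produces a draft,
    # and revises it when it receives feedback.
    draft = f"Draft for: {task}"
    return draft + " (revised)" if feedback else draft

def reviewer(draft: str) -> Optional[str]:
    # Stand-in for an LLM "reviewer" agent: returns feedback,
    # or None once the draft is accepted.
    return None if "revised" in draft else "Please revise."

def collaborate(task: str) -> str:
    # Dialogue loop: the agents exchange messages until the
    # reviewer accepts the writer's draft.
    draft = writer(task, None)
    while True:
        feedback = reviewer(draft)
        if feedback is None:
            return draft
        draft = writer(task, feedback)
```

In a real system each role would be a separate LLM call with its own system prompt; the loop structure is the part that carries over.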
Technical Architectures
Pure Prompt
Simple conversational interaction: user asks, model answers.
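A sketch of the Pure Prompt pattern, where the user's question goes straight to the model with no retrieval, tools, or memory. The `call_llm` function is a hypothetical stub standing in for a real LLM API call.

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM API call; returns canned replies
    # so the example is self-contained and runnable.
    canned = {"What is RAG?": "RAG stands for Retrieval-Augmented Generation."}
    return canned.get(prompt, "I'm not sure.")

def pure_prompt_chat(user_question: str) -> str:
    # Pure Prompt mode: no retrieval, no tools — the question
    # is forwarded to the model as-is.
    return call_llm(user_question)
```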
Agent + Function Calling
Agent: the AI takes the initiative, e.g., asking clarifying questions or planning the next step rather than merely answering.
Function Calling: the AI invokes external functions or APIs to gather information or perform actions.
Example: When asked about travel plans, the agent first asks for the budget.
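The travel-plan example above can be sketched as follows. The `get_budget` tool, the tool registry, and the hard-coded model output are all hypothetical; in a real system the JSON function-call request would come from the model itself.

```python
import json

def get_budget() -> dict:
    # Hypothetical tool: in practice this might ask the user
    # or look up a stored preference.
    return {"budget_usd": 2000}

# Registry of functions the agent is allowed to call.
TOOLS = {"get_budget": get_budget}

def run_agent(user_request: str) -> str:
    # Step 1: instead of answering directly, the model emits a
    # function-call request (simulated here as fixed JSON).
    model_output = json.dumps({"function": "get_budget", "arguments": {}})
    call = json.loads(model_output)
    result = TOOLS[call["function"]](**call["arguments"])
    # Step 2: the function result is fed back to the model,
    # which then produces the final answer.
    return (f"With a budget of ${result['budget_usd']}, "
            f"here is a travel plan for: {user_request}")
```

The key design point is the loop: model output is parsed, dispatched to a registered function, and the result is returned to the model before a final answer is produced.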
RAG (Retrieval‑Augmented Generation)
Embeddings: Convert text into vectors for similarity search.
Vector Database: Stores vectors for efficient retrieval.
Vector Search: Finds the most similar vectors to a query.
Example: Looking up relevant textbook content to answer a question.
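The retrieval step above can be sketched end to end. This uses toy bag-of-words "embeddings" and an in-memory document list purely for illustration; a real system would use a neural embedding model and a vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: word counts. A real system would call an
    # embedding model that returns dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stand-in for a vector database: documents stored alongside
# their embeddings.
DOCS = [
    "Photosynthesis converts sunlight into chemical energy.",
    "The French Revolution began in 1789.",
]

def retrieve(query: str) -> str:
    # Vector search: return the stored document whose embedding
    # is most similar to the query's.
    return max(DOCS, key=lambda d: cosine(embed(query), embed(d)))

def rag_answer(query: str) -> str:
    # The retrieved context is prepended to the prompt that would
    # be sent to the LLM.
    context = retrieve(query)
    return f"Context: {context}\nQuestion: {query}"
```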
Fine‑tuning
Adjusting a pre‑trained LLM on domain‑specific data to improve stability, reduce inference cost at scale, or increase generation speed.
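Most of the practical work in fine-tuning is preparing training data. Below is a sketch that converts domain Q&A pairs into the chat-style JSONL format commonly used for supervised fine-tuning; the example pairs and field names are illustrative assumptions, not a specific provider's schema.

```python
import json

# Hypothetical domain Q&A pairs to serve as fine-tuning data.
EXAMPLES = [
    {"question": "What is the return window?", "answer": "30 days."},
    {"question": "Do you ship overseas?", "answer": "Yes, to 40 countries."},
]

def to_chat_jsonl(examples: list) -> str:
    # One JSON record per line, each holding a user/assistant
    # message pair — a common shape for supervised fine-tuning.
    lines = []
    for ex in examples:
        record = {"messages": [
            {"role": "user", "content": ex["question"]},
            {"role": "assistant", "content": ex["answer"]},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)
```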
Choosing a Technical Route
A non‑rigorous but common decision‑making flow is presented to help select the appropriate architecture based on project requirements.
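One way to express such a flow is as a small decision function. This is a non-rigorous sketch mirroring the trade-offs discussed in this article, not a definitive rule; the question names are assumptions.

```python
def choose_route(needs_external_knowledge: bool,
                 needs_actions: bool,
                 high_volume_or_stability: bool) -> str:
    # Non-rigorous decision sketch: check the heaviest
    # requirement first, fall through to the simplest stack.
    if high_volume_or_stability:
        return "fine-tuning"
    if needs_external_knowledge:
        return "RAG"
    if needs_actions:
        return "agent + function calling"
    return "pure prompt"
```

In practice these options combine (e.g., RAG on top of a fine-tuned model), so the function is best read as a starting point rather than an exclusive choice.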
When Fine‑tuning Is Worth Trying
Fine‑tuning is advisable when aiming to improve model stability, when serving a large user base to lower inference costs, or when needing faster generation.
Conclusion
The article analyzes typical business and technical architectures for LLM applications, enabling readers to understand current usage patterns and to evaluate how to design their own architectures and choose suitable technology stacks.