Designing LLM‑Friendly Architecture: What Truly Makes an AI‑Friendly System?

This article analyzes how traditional deterministic engineering architectures clash with the probabilistic, semantic, and dynamic nature of LLM‑driven AI. It proposes three paradigm shifts and details an AI‑Friendly stack (Multi‑Agent coordination, Context Engineering, and observability) that achieved 95.7% audit accuracy and efficiency gains of over 80% in real‑world marketing scenarios.

DaTaobao Tech

Jun 1, 2026

Introduction

2025 is positioned as the year of Agentic AI. The rapid emergence of large‑model products (GPT‑4o, Gemini 1.5, etc.) has pushed enterprises to rethink engineering foundations because traditional architectures are built on deterministic assumptions that conflict with AI’s inherent uncertainty.

Conflict Between AI and Traditional Engineering

Traditional systems expect fixed input schemas, rule‑based workflows, and static execution paths. AI‑driven services produce probabilistic, emergent outputs, require semantic understanding of intent, and need dynamic planning. When an AI component outputs data that does not conform to a predefined schema, or when a low‑latency, high‑throughput pipeline must accommodate a high‑latency, low‑throughput agent, the mismatch leads to errors, timeouts, and degraded availability.

Traditional Architecture Types

Two dominant patterns are identified:

Platform‑centric architecture (e.g., Alibaba’s “large middle platform, small front end” model) emphasizes domain‑model standardisation and reusable business processes.

Business‑centric architecture favours lightweight MVC stacks for rapid feature delivery.

Three Paradigm Shifts for AI‑Friendly Design

The evolution from traditional to AI‑Friendly architecture is expressed through three orthogonal shifts:

Deterministic → Probabilistic: Move from strict y = f(x) mappings to probabilistic outputs that are constrained into a “safe interval” via RAG, prompt engineering, and evaluation loops.

Structured → Semantic: Replace rigid schema validation with intent‑driven processing, allowing natural‑language and unstructured inputs to be understood.

Static → Dynamic: Transition from hard‑coded workflows to planning‑based execution where the system autonomously decides actions based on current context.
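The first shift can be pictured as a validate-and-retry wrapper around a model call. The following is a minimal sketch under stated assumptions, not the article's implementation: the label set `ALLOWED_DECISIONS`, the `decision`/`confidence` fields, and the stand-in model are all hypothetical, and the "safe interval" is reduced here to a shape check plus a deterministic fallback.

```python
import json
from typing import Callable, Optional

ALLOWED_DECISIONS = {"approve", "reject", "manual_review"}  # hypothetical label set


def parse_decision(raw: str) -> Optional[dict]:
    """Return the parsed decision if it fits the expected shape, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    decision = data.get("decision")
    if not isinstance(decision, str) or decision not in ALLOWED_DECISIONS:
        return None
    confidence = data.get("confidence")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:
        return None
    return data


def constrained_call(model: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call a probabilistic model, accepting only outputs that pass validation."""
    for _ in range(max_attempts):
        parsed = parse_decision(model(prompt))
        if parsed is not None:
            return parsed
    # Deterministic fallback: hand the case to a human instead of guessing.
    return {"decision": "manual_review", "confidence": 0.0}


def make_flaky_model() -> Callable[[str], str]:
    """Stand-in for an LLM: the first reply is malformed, the second is valid."""
    replies = iter(["Sure! I think it is fine.", '{"decision": "approve", "confidence": 0.92}'])
    return lambda prompt: next(replies)


result = constrained_call(make_flaky_model(), "Audit item 123")
print(result)
```

The caller still sees a deterministic contract (a dict with two known fields), while the uncertainty is absorbed inside the wrapper by retries and the fallback.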

AI‑Friendly Architecture Stack

The stack consists of four layers:

Foundation Layer: Model management, knowledge (vector) stores, and tool registries provided by platforms such as ideaLab and ZETTA. Model APIs follow the OpenAI protocol, and knowledge bases aggregate data from DingTalk, Yuque, etc.

Capability Layer: Implements Multi‑Agent coordination, Context Engineering, RAG, and AI‑Friendly APIs. Spring AI Alibaba is used as the Java framework for integrating these capabilities.

Agent / Intent / Session Layers, which comprise three sub‑layers:

Agent Layer: Provides three concrete agents: BaseAgent (simple chatbot), ReActAgent (reason‑act loop), and PlanAgent (global planning).

Intent Layer: Detects and disambiguates user intent, handling parallel, sequential, and dependent intents; rewrites and expands queries for downstream agents.

Session Layer: Provides long‑term and short‑term memory; this is effectively the “context engineering” component that stores and retrieves relevant facts.

Observability & Evaluation Layer: Metrics such as agent execution path, TTFT, token consumption, TPM/QPM, and error rates are collected via internal tools (EagleEye, Sunfire) and fed back into the evaluation pipeline.
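The Intent Layer's handling of parallel, sequential, and dependent intents can be illustrated as a scheduling problem. The sketch below is an assumption about how such ordering could be done, using Python's standard `graphlib` module; the intent names and the dependency map are hypothetical and would in practice come from the intent-detection step.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def schedule_intents(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group intents into batches; every intent in a batch has all prerequisites met.

    `dependencies` maps each intent to the set of intents it depends on.
    Intents in the same batch are independent and could run in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output stable
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical decomposition of one user request into three intents.
intents = {
    "query_inventory": set(),
    "query_price": set(),
    "create_subsidy_plan": {"query_inventory", "query_price"},
}
batches = schedule_intents(intents)
print(batches)
```

Here the two queries land in the first batch and the dependent intent in the second, which mirrors the parallel-then-sequential pattern the layer has to support.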

Multi‑Agent Implementation

Agents communicate via a lightweight protocol and are orchestrated in one of three modes: centralized decision, decentralized negotiation, or a hybrid MoE (Mixture‑of‑Experts) mode. In the marketing‑automation scenario, a central agent performs intent classification and dispatches to domain‑specific agents (product, order, inventory, subsidy, material), each of which runs a ReAct + Plan loop.
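The centralized-decision mode can be sketched as a dispatcher that classifies a query and hands it to one domain agent. This is an illustrative sketch only: the keyword classifier stands in for LLM-based intent classification, and the agent functions, class name, and message formats are hypothetical.

```python
from typing import Callable, Dict

Agent = Callable[[str], str]


def order_agent(query: str) -> str:
    return f"[order] handled: {query}"


def inventory_agent(query: str) -> str:
    return f"[inventory] handled: {query}"


def subsidy_agent(query: str) -> str:
    return f"[subsidy] handled: {query}"


def keyword_classifier(query: str) -> str:
    """Toy stand-in for LLM intent classification."""
    keywords = {"order": "order", "stock": "inventory", "subsidy": "subsidy"}
    lowered = query.lower()
    for word, domain in keywords.items():
        if word in lowered:
            return domain
    return "unknown"


class CentralAgent:
    """Centralized decision: classify once, then dispatch to a domain agent."""

    def __init__(self, agents: Dict[str, Agent], classify: Callable[[str], str]):
        self.agents = agents
        self.classify = classify

    def dispatch(self, query: str) -> str:
        domain = self.classify(query)
        agent = self.agents.get(domain)
        if agent is None:
            return f"[central] no agent for domain '{domain}'"
        return agent(query)


central = CentralAgent(
    {"order": order_agent, "inventory": inventory_agent, "subsidy": subsidy_agent},
    keyword_classifier,
)
print(central.dispatch("How much stock is left for item-42?"))
```

In the real system each domain agent would run its own reasoning loop; here each is reduced to a single function so the routing logic stays visible.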

ReAct and Plan Paradigms

The ReAct loop is expressed as: Thought → Action → Observation. It enables a single model to iteratively reason, invoke tools, and observe results. Single‑step reasoning of this kind excels at rational tasks; to cover subjective or multi‑step problems as well, it is combined with a Plan stage that generates a global plan template.
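The Thought → Action → Observation cycle can be sketched with a scripted policy in place of the model. Everything named here (`react_loop`, `scripted_policy`, the `get_stock` tool, the `finish` action) is hypothetical and chosen for illustration; a real agent would have the LLM produce each step, and the Plan stage is not shown.

```python
from typing import Callable, Dict, List, Tuple

# A step is (thought, action_name, action_input); the action "finish" ends the loop.
Step = Tuple[str, str, str]
Policy = Callable[[str, List[str]], Step]


def react_loop(
    policy: Policy,
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Run Thought -> Action -> Observation until the policy finishes or steps run out."""
    trace: List[str] = []
    for _ in range(max_steps):
        thought, action, action_input = policy(question, trace)
        trace.append(f"Thought: {thought}")
        if action == "finish":
            return action_input, trace
        tool = tools.get(action)
        observation = tool(action_input) if tool else f"unknown tool: {action}"
        trace.append(f"Action: {action}({action_input})")
        trace.append(f"Observation: {observation}")
    return "step budget exhausted", trace


def scripted_policy(question: str, trace: List[str]) -> Step:
    """Stand-in for the LLM: inspects the trace and picks the next step."""
    if not any(line.startswith("Observation:") for line in trace):
        return ("I need the current stock level.", "get_stock", "item-42")
    last_observation = trace[-1].removeprefix("Observation: ")
    return ("I have what I need.", "finish", f"Stock for item-42 is {last_observation}.")


tools = {"get_stock": lambda item_id: "17"}  # hypothetical tool with a fixed reply
answer, trace = react_loop(scripted_policy, tools, "How many of item-42 are left?")
print(answer)
print("\n".join(trace))
```

The `max_steps` bound matters in practice: without it, a model that never emits a terminal action would loop indefinitely.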

Context Engineering

Beyond simple RAG retrieval, context engineering selects, organises, and compresses knowledge so that the LLM receives the most relevant information within its context window. Two concrete mechanisms are used in the audit scenario:

A historical audit case library stored in a vector DB, which provided an accuracy lift of roughly 8%.

Mixed‑model voting with confidence scoring, which delivered an accuracy improvement of more than 10%.
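The article does not say how the models' votes are combined, so the following is only one plausible reading of "voting with confidence scoring": each model returns a label with a confidence, confidences are summed per label, and a narrow winning margin escalates the case to manual review. The labels, the `min_margin` threshold, and the escalation rule are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vote = Tuple[str, float]  # (label, confidence in [0, 1])


def weighted_vote(votes: List[Vote], min_margin: float = 0.2) -> Tuple[str, float]:
    """Sum confidence per label; escalate when the winner's margin is too small.

    Returns (decision, margin), where margin is the winner's lead over the
    runner-up as a fraction of all confidence cast.
    """
    totals: Dict[str, float] = defaultdict(float)
    for label, confidence in votes:
        totals[label] += confidence
    total_weight = sum(totals.values())
    if total_weight == 0:
        return "manual_review", 0.0  # no usable votes
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    winner, winner_weight = ranked[0]
    runner_up_weight = ranked[1][1] if len(ranked) > 1 else 0.0
    margin = (winner_weight - runner_up_weight) / total_weight
    if margin < min_margin:
        return "manual_review", margin
    return winner, margin


# Hypothetical outputs from three different models for one audit item.
clear_case = weighted_vote([("pass", 0.9), ("pass", 0.8), ("fail", 0.6)])
close_case = weighted_vote([("pass", 0.6), ("fail", 0.55)])
print(clear_case, close_case)
```

Escalating close calls trades some automation for precision, which fits an audit workflow where a wrong automatic decision is costlier than a manual check.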

Tooling: From REST‑ful to LLM‑ful

APIs are refactored to be AI‑Friendly:

Atomic tool decomposition to match ReAct’s stepwise execution.

Human‑readable parameter names and flattened key‑value payloads.

Explicit error categories with concise messages for in‑model decision making.
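These three guidelines can be sketched as a single atomic tool with one flat, readable parameter and a small set of error categories the model can act on. The tool name, the category names, and the in-memory `_STOCK` table are hypothetical; the article does not list its actual tools or categories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ToolErrorCategory(str, Enum):
    INVALID_ARGUMENT = "invalid_argument"  # the model should fix its input
    NOT_FOUND = "not_found"                # the model should try another id
    TEMPORARY = "temporary"                # the call can be retried later


@dataclass
class ToolResult:
    ok: bool
    data: Optional[dict] = None
    error_category: Optional[ToolErrorCategory] = None
    message: str = ""


_STOCK = {"item-42": 17}  # hypothetical in-memory stand-in for an inventory service


def get_item_stock(item_id: str) -> ToolResult:
    """Atomic tool: one flat, human-readable parameter and one job."""
    if not isinstance(item_id, str) or not item_id.strip():
        return ToolResult(
            ok=False,
            error_category=ToolErrorCategory.INVALID_ARGUMENT,
            message="item_id must be a non-empty string",
        )
    if item_id not in _STOCK:
        return ToolResult(
            ok=False,
            error_category=ToolErrorCategory.NOT_FOUND,
            message=f"no item with id {item_id}",
        )
    # Flat key-value payload, no nested envelope for the model to unpack.
    return ToolResult(ok=True, data={"item_id": item_id, "stock_quantity": _STOCK[item_id]})


print(get_item_stock("item-42"))
print(get_item_stock("item-999"))
```

A short category plus a one-line message gives the model enough to choose between correcting its input, trying a different identifier, or retrying, without parsing a stack trace.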

Evaluation & Observability

The end‑to‑end evaluation pipeline follows:

Online data sampling → Sample set construction → Automated & manual evaluation → Engineering / model optimisation → Online A/B → Metric observation
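The automated-evaluation step of this pipeline typically reduces to comparing model decisions against a labelled sample set. A minimal sketch, assuming a binary audit task in which `"violation"` is the positive class (the article reports accuracy and recall but does not define its classes):

```python
from typing import List, Optional, Tuple


def accuracy_and_recall(
    predictions: List[str], labels: List[str], positive: str = "violation"
) -> Tuple[float, Optional[float]]:
    """Compute accuracy and recall for the positive class on a labelled sample set.

    Recall is None when the sample set contains no positive examples.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("sample set is empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    true_positive = sum(p == positive and y == positive for p, y in zip(predictions, labels))
    actual_positive = sum(y == positive for y in labels)
    accuracy = correct / len(labels)
    recall = true_positive / actual_positive if actual_positive else None
    return accuracy, recall


# Hypothetical five-item sample set.
labels = ["violation", "violation", "ok", "ok", "ok"]
predictions = ["violation", "ok", "ok", "ok", "violation"]
accuracy, recall = accuracy_and_recall(predictions, labels)
print(accuracy, recall)
```

Tracking recall alongside accuracy matters for audits because a missed violation (a false negative) does not show up clearly in accuracy when violations are rare.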

Observability now tracks not only service latency and error rates but also LLM‑specific signals such as token usage, agent decision paths, and plan quality.
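As an illustration of collecting such signals, the sketch below wraps a streamed response and records time to first token (TTFT), a chunk count, and total duration, alongside a decision path. It is not tied to EagleEye or Sunfire, whose APIs are not described in the article; `CallMetrics` and `observe_stream` are hypothetical names, and counting stream chunks is only an approximation of token usage, which a real system would usually read from the model provider's usage report.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass
class CallMetrics:
    ttft_seconds: Optional[float] = None  # time to first token
    output_chunks: int = 0                # approximation of output tokens
    total_seconds: float = 0.0
    decision_path: List[str] = field(default_factory=list)


def observe_stream(
    chunks: Iterable[str],
    metrics: CallMetrics,
    clock: Callable[[], float] = time.perf_counter,
) -> Iterator[str]:
    """Pass chunks through unchanged while recording latency and volume."""
    start = clock()
    for chunk in chunks:
        if metrics.ttft_seconds is None:
            metrics.ttft_seconds = clock() - start
        metrics.output_chunks += 1
        yield chunk
    metrics.total_seconds = clock() - start


# A fake clock makes the example deterministic; omit it to use real time.
fake_times = iter([0.0, 0.25, 1.0])
metrics = CallMetrics()
metrics.decision_path.extend(["central_agent", "inventory_agent"])
text = "".join(observe_stream(["17", " units", " left"], metrics, clock=lambda: next(fake_times)))
print(text, metrics)
```

Because the wrapper is a generator, the metrics are only complete once the caller has consumed the whole stream.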

Practical Business Cases

AI Audit: Deployed in a high‑frequency flash‑sale (秒杀) workflow handling 20,000 to 30,000 items daily. It achieved 95.7% accuracy, 99.1% recall, and a reduction of more than 80% in manual review time.

AI Q&A (CogentAI): An assistant that performs intent detection, planning, tool selection, and dynamic plan adjustment. It delivered answer correctness above 98% and an efficiency gain of more than 80% over traditional QA bots.

Conclusion

The AI‑Friendly architecture does not discard ten years of engineering wisdom; it augments it with probabilistic reasoning, semantic intent handling, and dynamic planning. The three paradigm shifts provide a clear migration path, and the presented stack has already demonstrated measurable business impact.