
Beyond Vector Storage: Inside Milvus 2.6’s Three‑Layer AI Agent Architecture

Milvus 2.6 moves beyond pure vector storage toward full-stack AI-agent infrastructure. It introduces a three-layer capability system (coding rules, protocol, and runtime) that covers memory, retrieval, and tool backends, hybrid search, strict operation ordering, and multiple integration paths. The article also contrasts traditional RAG with the agent-driven mode.

Shuge Unlimited

Jun 14, 2026


1. From Vector Database to Agent Infrastructure

The core question is what role Milvus plays in an AI-agent architecture. The official documentation (milvus_for_agents.md) defines Milvus as an agent-friendly interface rather than a simple storage backend, with three responsibilities: Memory Backend, Retrieval Backend, and Tool Backend.

Memory Backend : offers long‑term memory for agents via the Mem0 framework.

Retrieval Backend : supplies knowledge retrieval through RAG pipelines.

Tool Backend : exposes functions that agents can call.

Milvus therefore aims not only at low latency and high recall but also at enabling autonomous agent calls, persistent memory, and flexible retrieval.

2. Three‑Layer Capability Architecture

Milvus structures its agent capabilities into three clear layers, as shown in the architecture diagram.

Figure: Milvus for AI Agents Architecture Overview

Coding‑Rule Layer: 11 Critical Constraints

Milvus provides a structured prompt system (AGENTS.md) for coding assistants such as Cursor, Claude Code, and GitHub Copilot. The document lists eleven CRITICAL rules, for example:

Must use MilvusClient; ORM APIs are prohibited.

Use DataType enum; string types are forbidden.

Schema constraints: nullable fields can be added in v2.6+, but existing fields cannot be modified or deleted.

Index creation must precede loading; loading must precede searching.

Each AnnSearchRequest accepts only a single vector.

These rules are embedded into prompt files for different environments (Cursor, GitHub Copilot, Claude Code, JetBrains IDEs, Gemini CLI), ensuring that generated code respects the constraints.
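To make the idea of encoded constraints concrete, here is a minimal sketch of how two of the rules above could be checked mechanically. It is not part of Milvus or AGENTS.md: the function name `check_generated_code` and the regular expressions are assumptions made purely for illustration, and a real checker would need proper parsing rather than pattern matching.

```python
import re

# Hypothetical patterns for two of the rules quoted above; not an official linter.
RULES = [
    (re.compile(r"\bconnections\.connect\(|\bCollection\("),
     "use MilvusClient instead of the ORM API"),
    (re.compile(r"""(?:dtype|datatype)\s*=\s*["']"""),
     "use the DataType enum instead of a string type"),
]


def check_generated_code(source: str) -> list[str]:
    """Return one message per rule that the given source text appears to break."""
    return [message for pattern, message in RULES if pattern.search(source)]


bad = 'connections.connect()\nschema.add_field("id", dtype="INT64")'
good = 'client = MilvusClient("demo.db")\nschema.add_field("id", datatype=DataType.INT64)'
print(check_generated_code(bad))
print(check_generated_code(good))
```

The prompt files aim to prevent these mistakes at generation time; a check like this would only catch them afterwards.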

Protocol Layer: MCP Standardized Access

The protocol layer introduces three components related to the Model Context Protocol (MCP):

Milvus Skill (GitHub: zilliztech/milvus-skill) – an Agent Skill for Claude Code that teaches LLMs how to use PyMilvus.

MCP Server (GitHub: zilliztech/mcp-server-milvus) – a server that lets any MCP‑compatible agent interact with Milvus without learning the SDK.

Claude Context MCP (GitHub: zilliztech/claude-context) – a specialized MCP server for Claude Code.

This standard protocol lets agents query, insert, and manage vectors via a uniform interface, reducing integration effort.
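As a rough illustration of what a uniform interface means on the wire, MCP tool invocations travel as JSON-RPC 2.0 messages with the method `tools/call`. The sketch below only builds such a message. The tool name `milvus_text_search` and its argument keys are assumptions for the example, so check the server's own tool listing for the real names.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Tool name and argument keys are assumed for illustration.
request = build_tool_call(
    1,
    "milvus_text_search",
    {"collection_name": "docs", "query_text": "hybrid search", "limit": 3},
)
print(request)
```

Because every tool is reached through the same envelope, the agent needs no SDK knowledge, only the tool's name and argument schema.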

Runtime‑Framework Layer: Integration with Agent Frameworks

Milvus offers official integration guides for several popular agent frameworks:

OpenAI Agents SDK – wrapped as a function tool.

Mem0 – uses Milvus as the vector store for persistent memory.

LangChain – integrates Milvus into RAG pipelines.

LlamaIndex – supports hybrid search and asynchronous integration.

Llama Stack – provides metadata filtering and agent coordination.

Regardless of the framework, Milvus supplies a consistent access path.

3. Complete Agent Retrieval Chain

After understanding the layers, the data flow in an agent loop becomes clear.

Ingestion Pipeline

Documents → Chunking → Embedding → Insert into Milvus

Retrieval Pipeline

User query → Query embedding → Search Milvus (with filtering and reranking) → LLM generation
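The two pipelines can be sketched end to end with toy stand-ins. Everything here is an assumption for illustration: the word-count `embed` function replaces a real embedding model, the `store` list replaces a Milvus collection, and filtering, reranking, and LLM generation are left out.

```python
import re
from collections import Counter


def chunk(text: str, size: int = 6) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real pipeline calls an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (sum(v * v for v in a.values()) * sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0


store: list[tuple[str, Counter]] = []  # stand-in for a Milvus collection


def ingest(document: str) -> None:
    """Ingestion: document -> chunks -> vectors -> insert."""
    for piece in chunk(document):
        store.append((piece, embed(piece)))


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Retrieval: query -> vector -> similarity search -> top results."""
    q = embed(query)
    ranked = sorted(store, key=lambda row: similarity(q, row[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


ingest("Milvus stores vectors for similarity search. "
       "Agents call tools to fetch context before answering.")
print(retrieve("similarity search"))
```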

In an agent scenario, the retrieval pipeline is not fixed: the agent decides whether to retrieve, what to retrieve, and which parameters to use. That decision-making is what separates agent-driven mode from traditional RAG.

Operation Order Constraints

Milvus enforces a strict seven‑step order:

Vectorize documents.

Create collection (with schema).

Insert data.

Create index.

Load collection.

Execute search.

Pass results to LLM.

Two key constraints are highlighted: the index must be created before loading, and loading must happen before searching. In v2.6.x, create_collection() can accept both schema and index_params to automate index creation and loading.
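The two ordering constraints can be modeled as a tiny state machine. This is a toy sketch and not how Milvus tracks collection state internally; the class `CollectionLifecycle` and its error messages are made up for the example.

```python
class CollectionLifecycle:
    """Toy model of the ordering rule: index before load, load before search."""

    def __init__(self) -> None:
        self.indexed = False
        self.loaded = False

    def create_index(self) -> None:
        self.indexed = True

    def load(self) -> None:
        if not self.indexed:
            raise RuntimeError("create the index before loading")
        self.loaded = True

    def search(self) -> str:
        if not self.loaded:
            raise RuntimeError("load the collection before searching")
        return "ok"


lifecycle = CollectionLifecycle()
lifecycle.create_index()
lifecycle.load()
print(lifecycle.search())
```

Calling `load()` or `search()` out of order raises immediately, which is the kind of failure the coding rules are meant to keep out of generated code.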

Hybrid Search: Dense + BM25

Milvus recommends a hybrid RAG approach that combines dense vector similarity with BM25 keyword search. The following code demonstrates a hybrid request:

from pymilvus import AnnSearchRequest, RRFRanker

# Assumes `client` (a MilvusClient), `query_embedding`, `top_k`, and
# COLLECTION_NAME are defined earlier.

# Dense vector request (semantic search)
dense_req = AnnSearchRequest(
    data=[query_embedding],
    anns_field="dense_vector",
    param={"metric_type": "COSINE"},
    limit=top_k * 2,  # over-fetch so the ranker has candidates to merge
)

# BM25 sparse vector request (keyword search)
sparse_req = AnnSearchRequest(
    data=["machine learning applications"],  # raw text query
    anns_field="sparse_vector",
    param={"metric_type": "BM25"},
    limit=top_k * 2,
)

# Merge results with RRF ranker
results = client.hybrid_search(
    collection_name=COLLECTION_NAME,
    reqs=[dense_req, sparse_req],
    ranker=RRFRanker(),
    limit=top_k,
    output_fields=["text", "source"],
)

BM25 full-text search was introduced in Milvus 2.5 and requires the BM25 function and an analyzer to be defined at collection creation.
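To show what an RRF ranker does with the two result lists, here is a minimal pure-Python sketch of Reciprocal Rank Fusion. It illustrates the general technique and is not Milvus's internal implementation. The function name `rrf_fuse` and the sample IDs are assumptions, and `k = 60` is the smoothing constant commonly used for RRF.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


dense_hits = ["a", "b", "c"]   # hypothetical dense-search order
sparse_hits = ["b", "d", "a"]  # hypothetical BM25 order
print(rrf_fuse([dense_hits, sparse_hits]))
```

RRF uses only rank positions, so the cosine and BM25 scores never have to be normalized onto a common scale.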

Search Pattern Decision Table

Single-vector similarity → client.search()

Dense + sparse hybrid → client.hybrid_search()

Metadata filtering → add the filter parameter to client.search()

Full-text keyword match → requires a predefined BM25 function

Exact phrase match → use a TEXT_MATCH expression

Primary-key lookup → client.get()

Note: the search iterator only supports basic ANN search, not hybrid search.

4. Agent Memory System Implementation

Long‑term memory for agents is realized through the Mem0 framework, which uses Milvus as the vector store.

Mem0 provides six core operations:

add() – store unstructured text with a user ID and metadata.

search() – run a semantic search for relevant memories.

update() – modify existing memories.

get_all() – retrieve all memories for a user.

delete() – remove memories.

history() – track changes over time.

A minimal configuration example:

from mem0 import Memory

config = {
    "vector_store": {
        "provider": "milvus",
        "config": {
            "collection_name": "quickstart_mem0_with_milvus",
            "embedding_model_dims": 1536,  # must match the embedding model's output size
            "url": "./milvus.db",  # local Milvus Lite file, or a server/cloud URI
        },
    },
}

m = Memory.from_config(config)

In the memory stack, Milvus fulfills three responsibilities:

Persistence layer : stores user preferences, dialogue history, and knowledge across sessions.

Retrieval layer : uses vector similarity to fetch context‑relevant memories on demand.

Update layer : upserts keep memory fresh; outdated entries are overwritten.

Milvus itself does not implement memory‑management policies such as decay or prioritization; those belong to Mem0.
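The update-layer behavior (overwrite the current entry, keep a trail of changes) can be pictured with a small in-memory sketch. It is neither Mem0's nor Milvus's implementation: the class `ToyMemory` and its methods are invented for illustration, with plain dictionaries standing in for the vector store and the history log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """In-memory stand-in showing upsert-style updates with a change history."""

    items: dict[str, str] = field(default_factory=dict)
    log: dict[str, list[str]] = field(default_factory=dict)

    def upsert(self, memory_id: str, text: str) -> None:
        # Overwrite the current value but keep every version in the log.
        self.items[memory_id] = text
        self.log.setdefault(memory_id, []).append(text)

    def get(self, memory_id: str) -> str:
        return self.items[memory_id]

    def history(self, memory_id: str) -> list[str]:
        return list(self.log[memory_id])


memory = ToyMemory()
memory.upsert("pref-1", "User prefers Python.")
memory.upsert("pref-1", "User prefers Rust.")
print(memory.get("pref-1"), memory.history("pref-1"))
```

Policies such as decay or prioritization would sit on top of a store like this, which matches the division of labor described above.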

5. Tool Encapsulation and Call Chain

To let an agent invoke Milvus autonomously, the search capability is wrapped as a function tool. The documentation (openai_agents_milvus.md) provides the full integration.

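import json
from typing import Any

from agents import RunContextWrapper, function_tool
from pymilvus import MilvusClient


@function_tool
async def search_milvus_text(
    ctx: RunContextWrapper[Any], collection_name: str, query_text: str, limit: int
) -> str:
    """Search for text documents in a Milvus collection using full-text search."""
    client = MilvusClient()  # connects to the default local endpoint
    # BM25 search parameters for the sparse field
    search_params = {
        "metric_type": "BM25",
        "params": {"drop_ratio_search": 0.2}
    }
    results = client.search(
        collection_name=collection_name,
        data=[query_text],  # raw text; the BM25 function converts it to a sparse vector
        anns_field="sparse",
        limit=limit,
        search_params=search_params,
        output_fields=["text"],
    )
    return json.dumps({"results": results, "query": query_text})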

The end‑to‑end call chain becomes:

User question → Agent reasoning → Function call (Milvus search)
    → Vector/BM25 retrieval → Structured results returned → Agent → LLM generation → Response

The agent decides whether to trigger the search, which differentiates agent‑driven mode from a fixed RAG pipeline.
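The difference from a fixed pipeline can be sketched as a decision step in front of the tool call. This is a deliberately simplified stand-in: `needs_retrieval` is a keyword heuristic playing the role of the LLM's reasoning, and `search_tool` is a stub rather than a real Milvus query. All names here are assumptions for illustration.

```python
from typing import Callable


def search_tool(query: str) -> list[str]:
    """Stub standing in for a Milvus-backed search tool."""
    return [f"doc about {query}"]


TOOLS: dict[str, Callable[[str], list[str]]] = {"search": search_tool}


def needs_retrieval(question: str) -> bool:
    """Keyword heuristic standing in for the LLM's tool-use decision."""
    return any(word in question.lower() for word in ("what", "how", "which"))


def answer(question: str) -> dict:
    """Return which path was taken and the context gathered, if any."""
    if needs_retrieval(question):
        return {"path": "tool", "context": TOOLS["search"](question)}
    return {"path": "direct", "context": []}


print(answer("How does hybrid search work?"))
print(answer("Thanks!"))
```

In a fixed RAG pipeline the `search` call would run for every input, whereas here the second message skips retrieval entirely.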

Three integration paths are compared:

Direct SDK : agent code calls PyMilvus directly – most flexible but more code.

Function Calling : wrap Milvus search as a tool (e.g., OpenAI Agents SDK) – low‑code integration.

MCP Server : use the standardized Model Context Protocol – minimal integration effort but limited to defined interfaces.

6. Traditional RAG vs. Agent‑Driven Mode

The two approaches differ across five dimensions, listed here as traditional RAG vs. agent-driven mode:

Retrieval trigger : fixed pipeline vs. agent reasoning.

Retrieval type : pure vector search vs. dense + BM25 hybrid.

Database role : knowledge base vs. combined memory + knowledge + tool.

Integration method : direct code calls vs. function calling / MCP.

Protocol layer : none vs. MCP server.

Taken together, these differences show Milvus evolving from an "application-called database" into "agent-called infrastructure".

7. System‑Design Perspective and Boundaries

Four design dimensions highlight Milvus’s agent‑friendliness:

Unified interface : MilvusClient consolidates all operations, deprecating older ORM APIs.

Constraint encoding : 11 CRITICAL rules embedded in prompts prevent common coding errors.

Multi-protocol support : SDK, REST, and MCP cover the range from direct coding to standardized access.

Elastic deployment : from Milvus Lite (zero‑config) to Zilliz Cloud (serverless) and Distributed (K8s) for production.

For agent workloads, the recommendation is to use Zilliz Cloud in production to eliminate operational overhead.

Out of scope (not implemented in v2.6.x): Milvus does not orchestrate agents or provide multi-agent collaboration protocols, it relies on external observability tools for workflow visualization, and it does not host embedding or reranking models.

Conclusion

Milvus 2.6 is not merely a vector store; it serves as the memory backend, retrieval backend, and tool backend for AI agents. Its three‑layer architecture—coding rules, protocol, and runtime—shifts the database role from a passive knowledge store to an active component of agent infrastructure, embodying the principle that "when agents become the new software interaction paradigm, databases must evolve too."

