How Mem0 Gives AI Agents Persistent Memory Using Milvus Vector Store

This article explains how Mem0 provides a durable memory layer for AI agents, enabling them to recall past interactions, store user preferences, and continuously improve by integrating with Milvus vector databases and graph stores, complete with step‑by‑step code examples and capability comparisons.


Mem0 is a memory layer designed for AI agents that acts like a persistent brain, allowing the agents to retrieve historical dialogues, remember user preferences, and continuously refine their behavior.

Key Features

Retrieve historical conversation context on demand.

Accurately store personal user preferences and important facts.

Learn from practice and self‑optimize over time.

Role of the Memory Layer

Without a memory layer, even long‑context LLMs lose everything once a new session begins; with Mem0, context is retained, relevant content is recalled, and the system keeps improving.

Mem0 works alongside retrievers (RAG), LLMs, and context, recording past interactions, preserving long‑term knowledge, and evolving the agent’s behavior over time. Only relevant memories are merged into the prompt sent to the LLM.
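
To make that merge step concrete, here is a minimal sketch built on mem0's search API (the fully configured version appears in Practice 1 below); build_prompt is a hypothetical helper invented for illustration:

from mem0 import Memory

# Memory() uses mem0's defaults here; Practice 1 shows a full Milvus
# configuration. "build_prompt" is a hypothetical helper for illustration.
mem0 = Memory()

def build_prompt(user_message: str, user_id: str) -> str:
    # Fetch only memories semantically relevant to the new message
    # (v1.1-style response: {"results": [...]}).
    hits = mem0.search(user_message, user_id=user_id)
    context = "\n".join(f"- {h['memory']}" for h in hits["results"])
    # Merge just that filtered context into the prompt, not the full history.
    return f"Relevant memories:\n{context}\n\nUser: {user_message}"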

Memory Layer vs. LLM Context Window

| Capability | LLM Context Window | Mem0 Memory Store |
| --- | --- | --- |
| Memory | Temporary | Persistent |
| Token Consumption | Increases with input | Optimized (only needed content) |
| Content Retrieval | Depends on the LLM's long‑context ability | Compressed, intent‑focused retrieval |
| Personalization | None | Historical session records |

Memory Layer vs. RAG

Entity linking across sessions instead of static document retrieval.

Memory strategy that prioritizes recent, highly relevant memories and automatically decays older data (see the sketch after this list).

Session continuity for virtual companions and learning assistants.

Continuous learning through real‑time feedback.

Dynamic updates without re‑indexing documents.
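
Mem0 handles this ranking internally, but the idea behind recency‑weighted scoring can be sketched in a few lines. The following is a hypothetical illustration, not Mem0's actual decay policy; the half‑life constant and helper function are invented for the example:

import math, time

# Hypothetical recency-weighted ranking; Mem0's real decay policy is
# internal and may differ. HALF_LIFE_DAYS is an invented constant.
HALF_LIFE_DAYS = 30.0

def decayed_score(similarity: float, created_at: float) -> float:
    age_days = (time.time() - created_at) / 86400.0
    decay = math.exp(-math.log(2) * age_days / HALF_LIFE_DAYS)  # halves every 30 days
    return similarity * decay

# An equally similar memory from 60 days ago ranks below one from today:
print(decayed_score(0.80, time.time()) > decayed_score(0.80, time.time() - 60 * 86400))  # True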

Mem0 Core Workflow

Semantic capture: LLM parses the conversation and extracts long‑term valuable semantics.

Content vectorization: Embedding model encodes semantics into high‑dimensional vectors.

Vector storage: Vectors are stored in a vector database (e.g., Alibaba Cloud Milvus).

Retrieval: New user input triggers semantic similarity search to fetch relevant memories.

Context enhancement: Retrieved memories are injected into the current reasoning chain, producing more coherent and personalized responses.
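
The whole loop can be condensed into a short sketch using mem0's Python API. The Milvus endpoint, token, and collection name below are placeholders, and mem0 falls back to its default OpenAI‑compatible LLM and embedder unless you configure them (Practice 1 shows a full configuration):

from mem0 import Memory

# Placeholder Milvus endpoint/credentials; mem0's default LLM and
# embedder handle extraction and vectorization unless configured.
config = {
    "vector_store": {
        "provider": "milvus",
        "config": {
            "collection_name": "mem0_demo",
            "url": "http://c-xxx.milvus.aliyuncs.com:19530",
            "token": "root:xxx",
        },
    },
}
memory = Memory.from_config(config)

# Steps 1-3: semantic capture, vectorization, and storage in Milvus.
memory.add("I prefer vegetarian food", user_id="alice")

# Steps 4-5: a new input triggers similarity search; the hits are then
# injected into the prompt (Practice 1 shows the injection).
for hit in memory.search("What should I cook tonight?", user_id="alice")["results"]:
    print(hit["memory"])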

Alibaba Cloud Milvus Basics

Milvus is a distributed database built for vector similarity search. Its core technologies include:

Approximate nearest neighbor (ANN) search using HNSW, IVF, PQ, etc.

Decoupled vector indexing and querying with multiple index types (FLAT, IVF_FLAT, IVF_PQ, HNSW).

Vector sharding and distributed computation for high throughput and low latency.

Milvus follows a cloud‑native, compute‑storage separated micro‑service architecture with four layers: access, coordination, execution, and storage, each independently scalable.
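
As a quick orientation, here is a minimal sketch with the pymilvus client that creates a 128‑dimensional collection with an HNSW index and runs an ANN search; the endpoint, token, and collection name are placeholders:

from pymilvus import MilvusClient, DataType

# Placeholder endpoint and credentials for a Milvus instance.
client = MilvusClient(uri="http://c-xxx.milvus.aliyuncs.com:19530", token="root:xxx")

# Schema: an auto-generated primary key plus a 128-dim float vector field.
schema = client.create_schema(auto_id=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("vector", DataType.FLOAT_VECTOR, dim=128)

# HNSW index for approximate nearest neighbor (ANN) search.
index_params = client.prepare_index_params()
index_params.add_index(field_name="vector", index_type="HNSW",
                       metric_type="COSINE", params={"M": 16, "efConstruction": 200})

client.create_collection("demo_ann", schema=schema, index_params=index_params)

# Top-5 ANN search for a (dummy) 128-dim query vector.
results = client.search("demo_ann", data=[[0.1] * 128], limit=5,
                        search_params={"params": {"ef": 64}})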

Use Cases

Image/video search (e‑commerce, security).

Semantic text search for intelligent customer service, knowledge bases, and code search.

Personalized recommendation systems.

Scientific research and security (drug screening, anomaly detection).

Autonomous driving data preparation and mining.

Practice 1: Building a Long‑Term Memory AI Agent

Prerequisites

Alibaba Cloud Milvus instance created.

DashScope service enabled and API‑KEY obtained.

Development Steps

Install dependencies: pip install langgraph langchain-openai mem0ai. Configure environment variables for the Qwen model, then set up the LLM, embedder, and Milvus vector store.

Define LangGraph state and chatbot logic that retrieves memories via mem0.search, constructs a system prompt with retrieved context, invokes the LLM, and stores the interaction back to Mem0.

Compile the graph, set up streaming output, and run an interactive loop.

from typing import Annotated, TypedDict, List
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from mem0 import Memory
import os

os.environ["OPENAI_API_KEY"] = "sk-xx"
os.environ["OPENAI_BASE_URL"] = "https://dashscope.aliyuncs.com/compatible-mode/v1"

llm = ChatOpenAI(model="qwen-plus", temperature=0.2, max_tokens=2000)

config = {
    "llm": {"provider": "openai", "config": {"model": "qwen-plus", "temperature": 0.2, "max_tokens": 2000}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-v3", "embedding_dims": 128}},
    "vector_store": {"provider": "milvus", "config": {"collection_name": "mem0_test1", "embedding_model_dims": 128, "url": "http://c-xxx.milvus.aliyuncs.com:19530", "token": "root:xxx", "db_name": "default"}},
    "version": "v1.1"
}
mem0 = Memory.from_config(config)

class State(TypedDict):
    messages: Annotated[List[HumanMessage | AIMessage], add_messages]
    mem0_user_id: str

graph = StateGraph(State)

def chatbot(state: State):
    messages = state["messages"]
    user_id = state["mem0_user_id"]
    try:
        # Retrieve memories relevant to the latest user message.
        memories = mem0.search(messages[-1].content, user_id=user_id)
        memory_list = memories["results"]
        context = "Relevant information from previous conversations:\n"
        for memory in memory_list:
            context += f"- {memory['memory']}\n"
        system_message = SystemMessage(content=f"You are a helpful assistant. Use the following context:\n{context}")
        full_messages = [system_message] + messages
        response = llm.invoke(full_messages)
        # Store this interaction back into Mem0 for future retrieval.
        interaction = [
            {"role": "user", "content": messages[-1].content},
            {"role": "assistant", "content": response.content},
        ]
        mem0.add(interaction, user_id=user_id)
        return {"messages": [response]}
    except Exception:
        # Fall back to a plain LLM call if memory retrieval fails.
        response = llm.invoke(messages)
        return {"messages": [response]}

graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
# The self-edge keeps the conversation loop inside the graph;
# run_conversation below stops consuming the stream after the first response.
graph.add_edge("chatbot", "chatbot")
compiled_graph = graph.compile()

def run_conversation(user_input: str, mem0_user_id: str):
    config = {"configurable": {"thread_id": mem0_user_id}}
    state = {"messages": [HumanMessage(content=user_input)], "mem0_user_id": mem0_user_id}
    for event in compiled_graph.stream(state, config):
        for value in event.values():
            if value.get("messages"):
                print("Assistant:", value["messages"][-1].content)
                return

if __name__ == "__main__":
    print("Welcome! How can I assist you?")
    mem0_user_id = "alice"
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["quit", "exit", "bye"]:
            print("Goodbye!")
            break
        run_conversation(user_input, mem0_user_id)

Practice 2: Graph + Vector Engine for Complex Relations

Mem0 supports graph memory, allowing users to store and query complex relationships alongside vector embeddings.

Adding records involves extracting content with an LLM, embedding it into Milvus, and simultaneously extracting entities and relations into a graph database (e.g., Kuzu).

Retrieval combines vector similarity search and graph traversal, merging results for richer answers.

Installation and core code:

pip install kuzu rank-bm25 mem0ai

from langchain_openai import ChatOpenAI
from mem0 import Memory
import os, json

os.environ["OPENAI_API_KEY"] = "sk-xx"
os.environ["OPENAI_BASE_URL"] = "https://dashscope.aliyuncs.com/compatible-mode/v1"

llm = ChatOpenAI(model="qwen-plus", temperature=0.2, max_tokens=2000)

config = {
    "llm": {"provider": "openai", "config": {"model": "qwen-plus", "temperature": 0.2, "max_tokens": 2000}},
    "embedder": {"provider": "openai", "config": {"model": "text-embedding-v3", "embedding_dims": 128}},
    "vector_store": {"provider": "milvus", "config": {"collection_name": "mem0_test3", "embedding_model_dims": "128", "url": "http://c-xxx.milvus.aliyuncs.com:19530", "token": "root:xxx", "db_name": "default"}},
    "graph_store": {"provider": "kuzu", "config": {"db": "./mem0-example.kuzu"}},
    "version": "v1.1"
}

m = Memory.from_config(config)

# Add sample facts
m.add("I enjoy hiking", user_id="alice123")
m.add("I like badminton", user_id="alice123")
m.add("I dislike badminton", user_id="alice123")
m.add("My friend John has a dog named Tommy", user_id="alice123")
m.add("My name is Alice", user_id="alice123")
m.add("John and Harry both enjoy hiking", user_id="alice123")
m.add("My friend Peter is Spider‑Man", user_id="alice123")

def get_res(res):
    # Sort retrieved memories by similarity score (descending) and pretty-print.
    sorted_results = sorted(res['results'], key=lambda x: x['score'], reverse=True)
    res['results'] = sorted_results
    print(json.dumps(res, ensure_ascii=False, indent=2))

# Query examples
get_res(m.search("What is my name?", user_id="alice123"))
get_res(m.search("Who is Spider‑Man?", user_id="alice123"))

Verification Results

Querying "What is my name?" returns a low‑scoring vector match but a high‑confidence graph relation identifying the name as "Alice".

Querying "Who is Spider‑Man?" similarly shows the graph correctly extracting the relation.

Combining vector and graph memories enables AI agents to maintain coherent, personalized conversations over long periods, improving user experience in chatbots, customer support, and other interactive applications.

Tags: LLM, vector database, Milvus, Memory, LangGraph
Written by Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
