Build a Production-Ready Milvus Vector Database for Semantic Search
This article walks through deploying Milvus with Docker Compose, creating a persistent collection, tuning HNSW indexes, and integrating LangChain.js for semantic retrieval, along with the performance tips and common pitfalls that matter when running a production-grade vector database.
Why FAISS is unsuitable for production
FAISS is an in‑memory index library that is fast for demos but lacks persistence, CRUD operations, metadata filtering, distributed scaling, thread safety, and production monitoring. In a production workload with millions of documents, daily inserts, metadata filters, and hundreds of concurrent requests, these limitations become fatal.
Data persistence: ❌ In‑memory, lost on restart; ✅ Milvus stores vectors on disk.
CRUD: ❌ No delete/update; ✅ Milvus supports full CRUD.
Metadata filter: ❌ Vector‑only queries; ✅ Milvus supports scalar + vector mixed queries.
Distributed scaling: ❌ Single‑node only; ✅ Milvus provides native distributed deployment.
Concurrent access: ❌ Not thread‑safe; ✅ Milvus allows multi‑client concurrency.
Production monitoring: ❌ None; ✅ Integrated with Prometheus + Grafana.
1️⃣ Quick Milvus deployment with Docker Compose
Use a docker-compose.yml that starts etcd (metadata store), MinIO (object storage), and Milvus in standalone mode. Each service mounts a persistent volume so data survives container recreation.
# docker-compose.yml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.0
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - etcd
      - minio

networks:
  default:
    name: milvus

Start the stack with docker-compose up -d, wait ~30 seconds, then verify with curl http://localhost:9091/healthz. The endpoint should return {"status":"ok"}. If MinIO starts slowly, Milvus may report a connection error; wait until MinIO is healthy before restarting.
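Rather than sleeping a fixed 30 seconds in scripts, you can poll the health endpoint until it answers. Below is a minimal readiness-polling sketch, assuming Node 18+ (which ships a global fetch); the retry count and interval are arbitrary choices, not values from the Milvus docs.

// wait-for-milvus.mjs — poll the standalone health endpoint until it responds
const HEALTH_URL = "http://localhost:9091/healthz";

async function waitForMilvus(retries = 30, intervalMs = 3000) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetch(HEALTH_URL);
      if (res.ok) {
        console.log(`Milvus healthy after ${attempt} attempt(s)`);
        return;
      }
    } catch {
      // Connection refused while containers are still starting — keep polling
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`Milvus not healthy after ${retries} attempts`);
}

waitForMilvus().catch((e) => { console.error(e); process.exit(1); });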
2️⃣ Create and load a Milvus collection
A collection in Milvus is analogous to a relational table but can store vector fields. The example defines fields for id (Int64 primary key), content, source, category, created_at (timestamp), and embedding (FloatVector with dim = 1536, matching OpenAI text‑embedding‑3‑small).
import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";

const client = new MilvusClient({ address: "localhost:19530" });
const COLLECTION_NAME = "knowledge_base";
const DIMENSION = 1536; // OpenAI text-embedding-3-small

async function initCollection() {
  const exists = await client.hasCollection({ collection_name: COLLECTION_NAME });
  if (exists.value) {
    console.log("Collection already exists, skipping creation");
    return;
  }

  await client.createCollection({
    collection_name: COLLECTION_NAME,
    fields: [
      { name: "id", data_type: DataType.Int64, is_primary_key: true, autoID: true },
      { name: "content", data_type: DataType.VarChar, max_length: 65535 },
      { name: "source", data_type: DataType.VarChar, max_length: 512 },
      { name: "category", data_type: DataType.VarChar, max_length: 128 },
      { name: "created_at", data_type: DataType.Int64 },
      { name: "embedding", data_type: DataType.FloatVector, dim: DIMENSION }
    ]
  });

  await client.createIndex({
    collection_name: COLLECTION_NAME,
    field_name: "embedding",
    index_type: "HNSW",
    metric_type: "COSINE",
    params: { M: 16, efConstruction: 256 }
  });

  await client.loadCollection({ collection_name: COLLECTION_NAME });
  console.log(`Collection "${COLLECTION_NAME}" created and loaded`);
}

initCollection().catch(console.error);

Calling loadCollection is mandatory; otherwise queries fail with the error "collection not loaded".
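If you are unsure whether a collection is already in memory, the Node SDK exposes getLoadState. Here is a small guard sketch; the "LoadStateLoaded" string reflects my reading of the SDK's LoadState values, so verify it against your SDK version.

// Load the collection only if it is not already resident in memory
async function ensureLoaded(collectionName) {
  const res = await client.getLoadState({ collection_name: collectionName });
  if (res.state !== "LoadStateLoaded") {
    console.log(`"${collectionName}" not loaded (state: ${res.state}), loading...`);
    await client.loadCollection({ collection_name: collectionName });
  }
}

await ensureLoaded(COLLECTION_NAME);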
3️⃣ LangChain.js integration
LangChain provides a thin wrapper around Milvus. After initializing the Milvus vector store, you can add documents and perform similarity searches with optional scalar filters.
import { Milvus } from "@langchain/community/vectorstores/milvus";
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small", apiKey: process.env.OPENAI_API_KEY });

// Connect to an existing collection
const vectorStore = await Milvus.fromExistingCollection(embeddings, {
  collectionName: "knowledge_base",
  url: "http://localhost:19530",
  textField: "content",
  vectorField: "embedding"
});

// Add documents (batch insertion recommended); docs is a Document[] prepared elsewhere
await vectorStore.addDocuments(docs);
console.log(`Inserted ${docs.length} documents`);

// Basic similarity search
const results = await vectorStore.similaritySearch("How to use a vector DB for semantic search?", 5);
results.forEach((doc, i) => {
  console.log(`[${i + 1}] ${doc.pageContent}`);
  console.log(`    source: ${doc.metadata.source}`);
});

// Metadata-filtered search
const filtered = await vectorStore.similaritySearch(
  "Vector DB selection advice",
  5,
  'category == "database"'
);

// Search with scores
const withScores = await vectorStore.similaritySearchWithScore("Open source LLM tools", 5);
withScores.forEach(([doc, score]) => {
  console.log(`Score: ${score.toFixed(4)} | ${doc.pageContent.slice(0, 50)}`);
});

4️⃣ Full RAG pipeline
The end‑to‑end Retrieval‑Augmented Generation flow consists of document loading → splitting → vector store ingestion → retrieval → LLM answer generation.
import { Milvus } from "@langchain/community/vectorstores/milvus";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { ChatPromptTemplate } from "@langchain/core/prompts";

const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small", apiKey: process.env.OPENAI_API_KEY });
const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

async function loadAndSplitDocuments() {
  const loader = new DirectoryLoader("./docs", {
    ".txt": (path) => new TextLoader(path),
    ".md": (path) => new TextLoader(path)
  });
  const rawDocs = await loader.load();
  console.log(`Loaded ${rawDocs.length} raw documents`);

  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 });
  const splitDocs = await splitter.splitDocuments(rawDocs);
  console.log(`Split into ${splitDocs.length} chunks`);
  return splitDocs;
}

async function buildVectorStore(docs) {
  console.log("Building vector store...");
  const vectorStore = await Milvus.fromDocuments(docs, embeddings, {
    collectionName: "rag_knowledge_base",
    url: "http://localhost:19530",
    batchSize: 100
  });
  console.log(`Vector store ready with ${docs.length} documents`);
  return vectorStore;
}

async function buildRAGChain(vectorStore) {
  const retriever = vectorStore.asRetriever({ k: 5 });
  const prompt = ChatPromptTemplate.fromTemplate(`
You are a knowledge-base Q&A assistant. Answer the question using the retrieved context.
If the context does not contain relevant information, reply "I couldn't find relevant information in the knowledge base".

Context:
{context}

Question: {input}

Answer:`);

  const docChain = await createStuffDocumentsChain({ llm, prompt });
  const retrievalChain = await createRetrievalChain({ combineDocsChain: docChain, retriever });
  return retrievalChain;
}

async function main() {
  const docs = await loadAndSplitDocuments();
  const vectorStore = await buildVectorStore(docs);
  const chain = await buildRAGChain(vectorStore);

  const questions = [
    "What scenarios suit LangChain.js?",
    "What are the differences between Milvus and FAISS?",
    "How to improve RAG retrieval quality?"
  ];

  for (const q of questions) {
    console.log(`\n❓ ${q}`);
    const result = await chain.invoke({ input: q });
    console.log(`💡 ${result.answer}`);
  }
}

main().catch(console.error);

5️⃣ Performance tuning
HNSW index quality depends on two core parameters set at index creation: M (max neighbor count) and efConstruction (construction‑time search queue). Larger values improve recall but increase memory usage and build time.
await client.createIndex({
  collection_name: "knowledge_base",
  field_name: "embedding",
  index_type: "HNSW",
  metric_type: "COSINE",
  params: { M: 16, efConstruction: 256 }
});

At query time, ef controls the search queue size; it should be at least 2-4 × k (the number of results) to maintain high recall.
const retriever = vectorStore.asRetriever({
  k: 10,
  searchParams: { ef: 64 }
});

Batch insertion dramatically speeds up ingestion: insert in groups of 100-500 documents instead of one by one.
async function batchInsert(vectorStore, docs, batchSize = 500) {
  const total = docs.length;
  let inserted = 0;
  for (let i = 0; i < total; i += batchSize) {
    const batch = docs.slice(i, i + batchSize);
    try {
      await vectorStore.addDocuments(batch);
      inserted += batch.length;
      console.log(`Progress: ${inserted}/${total} (${Math.round(inserted / total * 100)}%)`);
      if (i + batchSize < total) await new Promise(r => setTimeout(r, 100));
    } catch (e) {
      console.error(`Batch ${i}-${i + batchSize} failed:`, e);
    }
  }
  console.log(`Insertion complete: ${inserted}/${total}`);
}

Performance reference for a single-node standalone deployment (a measurement sketch follows the list):
1 M vectors, M = 8, efConstruction = 128, ef = 32 → p99 latency ≈ 15 ms, recall ≈ 95 %.
1 M vectors, M = 16, efConstruction = 256, ef = 64 → p99 latency ≈ 25 ms, recall ≈ 98 %.
5 M vectors, M = 16, efConstruction = 256, ef = 64 → p99 latency ≈ 50 ms, recall ≈ 97 %.
5 M vectors, M = 32, efConstruction = 256, ef = 128 → p99 latency ≈ 80 ms, recall ≈ 99 %.
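To sanity-check these numbers on your own hardware, you can time repeated searches and take the 99th percentile. A rough sketch follows, reusing the vectorStore from section 3; the query list is a placeholder you would replace with representative traffic. Note that similaritySearch includes the embedding API round trip, so this measures end-to-end latency, not just the Milvus search.

async function measureP99(vectorStore, queries, k = 5) {
  const latencies = [];
  for (const q of queries) {
    const start = performance.now(); // global in Node 16+
    await vectorStore.similaritySearch(q, k);
    latencies.push(performance.now() - start);
  }
  latencies.sort((a, b) => a - b);
  // p99 = the value below which 99% of samples fall
  const idx = Math.min(latencies.length - 1, Math.floor(latencies.length * 0.99));
  console.log(`samples=${latencies.length} p99=${latencies[idx].toFixed(1)} ms`);
}

// Example: await measureP99(vectorStore, sampleQueries, 5);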
6️⃣ Common pitfalls and how to avoid them
Collection not loaded: After creating a collection, always call loadCollection before any search or similaritySearch. Skipping this step yields the error "collection not loaded".
Dimension mismatch: The dim defined in the schema must match the embedding model's output dimension. Changing the model requires dropping and recreating the collection with the new dimension; a quick runtime check is sketched after this list.
Filter syntax errors: Milvus uses a SQL-like expression language with quoted string literals and == for equality. Correct examples: category == "tech", or created_at > 1704067200000 (an epoch-millisecond value, computed in JS as new Date("2024-01-01").getTime()).
Docker data loss: Without volume mounts for etcd, MinIO, and Milvus, container recreation deletes all persisted vectors. Always bind host directories as shown in the compose file.
Batch insertion failures: Inserting one huge batch can cause the whole operation to fail on a single bad record. Use smaller batches with try-catch and retry logic.
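For the dimension-mismatch pitfall, a cheap runtime guard is to embed a probe string and compare its length to the schema constant. This sketch reuses the embeddings object from section 3 and DIMENSION from section 2; embedQuery is the standard LangChain embeddings call and returns a number[].

// Fail fast if the embedding model's output does not match the collection schema
const probe = await embeddings.embedQuery("dimension probe");
if (probe.length !== DIMENSION) {
  throw new Error(
    `Embedding dim ${probe.length} does not match collection dim ${DIMENSION}; ` +
    `drop and recreate the collection before inserting.`
  );
}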
7️⃣ Checklists
Deployment checklist: docker-compose includes etcd, MinIO, and Milvus; volumes are mounted; the health endpoint curl http://localhost:9091/healthz returns {"status":"ok"}; port 19530 is reachable.
Collection initialization checklist: vector dimension matches the embedding model; HNSW index created; loadCollection called; VarChar fields have appropriate max_length. A scripted version of this checklist is sketched after this list.
LangChain integration checklist: textField and vectorField map to schema field names; metadata fields are defined in the collection; batch insertion uses error handling.
Performance checklist: choose M based on data volume (8-16 for small, 16-32 for large); set the query-time ef to at least 2-4 × k; use batch sizes of 100-500 for bulk inserts.
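The collection-initialization checklist can be scripted with the Node SDK's describeCollection, describeIndex, and getLoadState calls. The sketch below reuses the client, COLLECTION_NAME, and DIMENSION from section 2; the exact response shapes (type_params as key/value pairs, the index_descriptions array) reflect my reading of the SDK and should be verified against your version.

async function checkCollection(name, expectedDim) {
  // 1. Vector dimension matches the embedding model
  const desc = await client.describeCollection({ collection_name: name });
  const vec = desc.schema.fields.find((f) => f.name === "embedding");
  const dim = Number(vec?.type_params?.find((p) => p.key === "dim")?.value);
  console.log(dim === expectedDim ? "✓ dimension matches" : `✗ dim ${dim} ≠ ${expectedDim}`);

  // 2. An index exists on the vector field
  const idx = await client.describeIndex({ collection_name: name, field_name: "embedding" });
  console.log(idx.index_descriptions?.length ? "✓ index present" : "✗ no index");

  // 3. The collection is loaded into memory
  const load = await client.getLoadState({ collection_name: name });
  console.log(load.state === "LoadStateLoaded" ? "✓ loaded" : `✗ state: ${load.state}`);
}

await checkCollection(COLLECTION_NAME, DIMENSION);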
James' Growth Diary
I am James, focused on learning and growing with AI agents. I continuously update two series: "AI Agent Mastery Path," which systematically outlines the core theory and practice of agents, and "Claude Code Design Philosophy," which analyzes the design thinking behind top AI tools, helping you build a solid foundation in the AI era.