Build a Production-Ready Milvus Vector Database for Semantic Search
This article walks through deploying Milvus with Docker Compose, creating a persistent collection, tuning HNSW indexes, and integrating LangChain.js for semantic retrieval, along with the performance tips and common pitfalls that matter when running a production-grade vector database.
Why FAISS is unsuitable for production
FAISS is an in‑memory index library that is fast for demos but lacks persistence, CRUD operations, metadata filtering, distributed scaling, thread safety, and production monitoring. In a production workload with millions of documents, daily inserts, metadata filters, and hundreds of concurrent requests, these limitations become fatal.
Data persistence: ❌ In‑memory, lost on restart; ✅ Milvus stores vectors on disk.
CRUD: ❌ No delete/update; ✅ Milvus supports full CRUD.
Metadata filter: ❌ Vector‑only queries; ✅ Milvus supports scalar + vector mixed queries.
Distributed scaling: ❌ Single‑node only; ✅ Milvus provides native distributed deployment.
Concurrent access: ❌ Not thread‑safe; ✅ Milvus allows multi‑client concurrency.
Production monitoring: ❌ None; ✅ Integrated with Prometheus + Grafana.
1️⃣ Quick Milvus deployment with Docker Compose
Use a docker-compose.yml that starts etcd (metadata store), MinIO (object storage), and Milvus in standalone mode. Each service mounts a persistent volume so data survives container recreation.
# docker-compose.yml
version: '3.5'

services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
    command: minio server /minio_data --console-address ":9001"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.4.0
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - etcd
      - minio

networks:
  default:
    name: milvus

Start the stack with docker-compose up -d, wait ~30 seconds, then verify with curl http://localhost:9091/healthz. The endpoint should return {"status":"ok"}. If MinIO starts slowly, Milvus may report a connection error; wait until MinIO is healthy before restarting.
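Rather than sleeping a fixed 30 seconds in scripts, you can poll the health endpoint until it answers. Below is a minimal readiness-polling sketch, assuming Node 18+ (which ships a global fetch); the retry count and interval are arbitrary choices, not values from the Milvus docs.

// wait-for-milvus.mjs — poll the standalone health endpoint until it responds
const HEALTH_URL = "http://localhost:9091/healthz";

async function waitForMilvus(retries = 30, intervalMs = 3000) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetch(HEALTH_URL);
      if (res.ok) {
        console.log(`Milvus healthy after ${attempt} attempt(s)`);
        return;
      }
    } catch {
      // Connection refused while containers are still starting — keep polling
    }
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`Milvus not healthy after ${retries} attempts`);
}

waitForMilvus().catch((e) => { console.error(e); process.exit(1); });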
2️⃣ Create and load a Milvus collection
A collection in Milvus is analogous to a relational table but can store vector fields. The example defines fields for id (Int64 primary key), content, source, category, created_at (timestamp), and embedding (FloatVector with dim = 1536, matching OpenAI text‑embedding‑3‑small).
import { MilvusClient, DataType } from "@zilliz/milvus2-sdk-node";

const client = new MilvusClient({ address: "localhost:19530" });
const COLLECTION_NAME = "knowledge_base";
const DIMENSION = 1536; // OpenAI text-embedding-3-small

async function initCollection() {
  const exists = await client.hasCollection({ collection_name: COLLECTION_NAME });
  if (exists.value) {
    console.log("Collection already exists, skipping creation");
    return;
  }

  await client.createCollection({
    collection_name: COLLECTION_NAME,
    fields: [
      { name: "id", data_type: DataType.Int64, is_primary_key: true, autoID: true },
      { name: "content", data_type: DataType.VarChar, max_length: 65535 },
      { name: "source", data_type: DataType.VarChar, max_length: 512 },
      { name: "category", data_type: DataType.VarChar, max_length: 128 },
      { name: "created_at", data_type: DataType.Int64 },
      { name: "embedding", data_type: DataType.FloatVector, dim: DIMENSION }
    ]
  });

  await client.createIndex({
    collection_name: COLLECTION_NAME,
    field_name: "embedding",
    index_type: "HNSW",
    metric_type: "COSINE",
    params: { M: 16, efConstruction: 256 }
  });

  await client.loadCollection({ collection_name: COLLECTION_NAME });
  console.log(`Collection "${COLLECTION_NAME}" created and loaded`);
}

initCollection().catch(console.error);

Calling loadCollection is mandatory; otherwise queries fail with the error "collection not loaded".
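If you are unsure whether a collection is already in memory, the Node SDK exposes getLoadState. Here is a small guard sketch; the "LoadStateLoaded" string reflects my reading of the SDK's LoadState values, so verify it against your SDK version.

// Load the collection only if it is not already resident in memory
async function ensureLoaded(collectionName) {
  const res = await client.getLoadState({ collection_name: collectionName });
  if (res.state !== "LoadStateLoaded") {
    console.log(`"${collectionName}" not loaded (state: ${res.state}), loading...`);
    await client.loadCollection({ collection_name: collectionName });
  }
}

await ensureLoaded(COLLECTION_NAME);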
3️⃣ LangChain.js integration
LangChain provides a thin wrapper around Milvus. After initializing the Milvus vector store, you can add documents and perform similarity searches with optional scalar filters.
import { Milvus } from "@langchain/community/vectorstores/milvus";
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small", apiKey: process.env.OPENAI_API_KEY });

// Connect to an existing collection
const vectorStore = await Milvus.fromExistingCollection(embeddings, {
  collectionName: "knowledge_base",
  url: "http://localhost:19530",
  textField: "content",
  vectorField: "embedding"
});

// Add documents (batch insertion recommended); docs is a Document[] prepared elsewhere
await vectorStore.addDocuments(docs);
console.log(`Inserted ${docs.length} documents`);

// Basic similarity search
const results = await vectorStore.similaritySearch("How to use a vector DB for semantic search?", 5);
results.forEach((doc, i) => {
  console.log(`[${i + 1}] ${doc.pageContent}`);
  console.log(`    source: ${doc.metadata.source}`);
});

// Metadata-filtered search
const filtered = await vectorStore.similaritySearch(
  "Vector DB selection advice",
  5,
  'category == "database"'
);

// Search with scores
const withScores = await vectorStore.similaritySearchWithScore("Open source LLM tools", 5);
withScores.forEach(([doc, score]) => {
  console.log(`Score: ${score.toFixed(4)} | ${doc.pageContent.slice(0, 50)}`);
});

4️⃣ Full RAG pipeline
The end‑to‑end Retrieval‑Augmented Generation flow consists of document loading → splitting → vector store ingestion → retrieval → LLM answer generation.
import { Milvus } from "@langchain/community/vectorstores/milvus";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { ChatPromptTemplate } from "@langchain/core/prompts";

const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small", apiKey: process.env.OPENAI_API_KEY });
const llm = new ChatOpenAI({ model: "gpt-4o-mini", temperature: 0 });

async function loadAndSplitDocuments() {
  const loader = new DirectoryLoader("./docs", {
    ".txt": (path) => new TextLoader(path),
    ".md": (path) => new TextLoader(path)
  });
  const rawDocs = await loader.load();
  console.log(`Loaded ${rawDocs.length} raw documents`);

  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 200 });
  const splitDocs = await splitter.splitDocuments(rawDocs);
  console.log(`Split into ${splitDocs.length} chunks`);
  return splitDocs;
}

async function buildVectorStore(docs) {
  console.log("Building vector store...");
  const vectorStore = await Milvus.fromDocuments(docs, embeddings, {
    collectionName: "rag_knowledge_base",
    url: "http://localhost:19530",
    batchSize: 100
  });
  console.log(`Vector store ready with ${docs.length} documents`);
  return vectorStore;
}

async function buildRAGChain(vectorStore) {
  const retriever = vectorStore.asRetriever({ k: 5 });
  const prompt = ChatPromptTemplate.fromTemplate(`
You are a knowledge-base Q&A assistant. Answer the question using the retrieved context.
If the context does not contain relevant information, reply "I couldn't find relevant information in the knowledge base".

Context:
{context}

Question: {input}

Answer:`);

  const docChain = await createStuffDocumentsChain({ llm, prompt });
  const retrievalChain = await createRetrievalChain({ combineDocsChain: docChain, retriever });
  return retrievalChain;
}

async function main() {
  const docs = await loadAndSplitDocuments();
  const vectorStore = await buildVectorStore(docs);
  const chain = await buildRAGChain(vectorStore);

  const questions = [
    "What scenarios suit LangChain.js?",
    "What are the differences between Milvus and FAISS?",
    "How to improve RAG retrieval quality?"
  ];

  for (const q of questions) {
    console.log(`\n❓ ${q}`);
    const result = await chain.invoke({ input: q });
    console.log(`💡 ${result.answer}`);
  }
}

main().catch(console.error);

5️⃣ Performance tuning
HNSW index quality depends on two core parameters set at index creation: M (max neighbor count) and efConstruction (construction‑time search queue). Larger values improve recall but increase memory usage and build time.
await client.createIndex({
  collection_name: "knowledge_base",
  field_name: "embedding",
  index_type: "HNSW",
  metric_type: "COSINE",
  params: { M: 16, efConstruction: 256 }
});

At query time, ef controls the search queue size; it should be at least 2-4 × k (the number of results) to maintain high recall.
const retriever = vectorStore.asRetriever({
  k: 10,
  searchParams: { ef: 64 }
});

Batch insertion dramatically speeds up ingestion: insert in groups of 100-500 documents instead of one by one.
async function batchInsert(vectorStore, docs, batchSize = 500) {
  const total = docs.length;
  let inserted = 0;
  for (let i = 0; i < total; i += batchSize) {
    const batch = docs.slice(i, i + batchSize);
    try {
      await vectorStore.addDocuments(batch);
      inserted += batch.length;
      console.log(`Progress: ${inserted}/${total} (${Math.round(inserted / total * 100)}%)`);
      if (i + batchSize < total) await new Promise(r => setTimeout(r, 100));
    } catch (e) {
      console.error(`Batch ${i}-${i + batchSize} failed:`, e);
    }
  }
  console.log(`Insertion complete: ${inserted}/${total}`);
}

Performance reference for a single-node standalone deployment (a measurement sketch follows the list):
1 M vectors, M = 8, efConstruction = 128, ef = 32 → p99 latency ≈ 15 ms, recall ≈ 95 %.
1 M vectors, M = 16, efConstruction = 256, ef = 64 → p99 latency ≈ 25 ms, recall ≈ 98 %.
5 M vectors, M = 16, efConstruction = 256, ef = 64 → p99 latency ≈ 50 ms, recall ≈ 97 %.
5 M vectors, M = 32, efConstruction = 256, ef = 128 → p99 latency ≈ 80 ms, recall ≈ 99 %.
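To sanity-check these numbers on your own hardware, you can time repeated searches and take the 99th percentile. A rough sketch follows, reusing the vectorStore from section 3; the query list is a placeholder you would replace with representative traffic. Note that similaritySearch includes the embedding API round trip, so this measures end-to-end latency, not just the Milvus search.

async function measureP99(vectorStore, queries, k = 5) {
  const latencies = [];
  for (const q of queries) {
    const start = performance.now(); // global in Node 16+
    await vectorStore.similaritySearch(q, k);
    latencies.push(performance.now() - start);
  }
  latencies.sort((a, b) => a - b);
  // p99 = the value below which 99% of samples fall
  const idx = Math.min(latencies.length - 1, Math.floor(latencies.length * 0.99));
  console.log(`samples=${latencies.length} p99=${latencies[idx].toFixed(1)} ms`);
}

// Example: await measureP99(vectorStore, sampleQueries, 5);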
6️⃣ Common pitfalls and how to avoid them
Collection not loaded: After creating a collection, always call loadCollection before any search or similaritySearch. Skipping this step yields the error "collection not loaded".
Dimension mismatch: The dim defined in the schema must match the embedding model's output dimension. Changing the model requires dropping and recreating the collection with the new dimension; a quick runtime check is sketched after this list.
Filter syntax errors: Milvus uses a SQL-like expression language with quoted string literals and == for equality. Correct examples: category == "tech", or created_at > 1704067200000 (an epoch-millisecond value, computed in JS as new Date("2024-01-01").getTime()).
Docker data loss: Without volume mounts for etcd, MinIO, and Milvus, container recreation deletes all persisted vectors. Always bind host directories as shown in the compose file.
Batch insertion failures: Inserting one huge batch can cause the whole operation to fail on a single bad record. Use smaller batches with try-catch and retry logic.
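For the dimension-mismatch pitfall, a cheap runtime guard is to embed a probe string and compare its length to the schema constant. This sketch reuses the embeddings object from section 3 and DIMENSION from section 2; embedQuery is the standard LangChain embeddings call and returns a number[].

// Fail fast if the embedding model's output does not match the collection schema
const probe = await embeddings.embedQuery("dimension probe");
if (probe.length !== DIMENSION) {
  throw new Error(
    `Embedding dim ${probe.length} does not match collection dim ${DIMENSION}; ` +
    `drop and recreate the collection before inserting.`
  );
}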
7️⃣ Checklists
Deployment checklist: docker-compose includes etcd, MinIO, and Milvus; volumes are mounted; the health endpoint curl http://localhost:9091/healthz returns {"status":"ok"}; port 19530 is reachable.
Collection initialization checklist: vector dimension matches the embedding model; HNSW index created; loadCollection called; VarChar fields have appropriate max_length. A scripted version of this checklist is sketched after this list.
LangChain integration checklist: textField and vectorField map to schema field names; metadata fields are defined in the collection; batch insertion uses error handling.
Performance checklist: choose M based on data volume (8-16 for small, 16-32 for large); set the query-time ef to at least 2-4 × k; use batch sizes of 100-500 for bulk inserts.
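The collection-initialization checklist can be scripted with the Node SDK's describeCollection, describeIndex, and getLoadState calls. The sketch below reuses the client, COLLECTION_NAME, and DIMENSION from section 2; the exact response shapes (type_params as key/value pairs, the index_descriptions array) reflect my reading of the SDK and should be verified against your version.

async function checkCollection(name, expectedDim) {
  // 1. Vector dimension matches the embedding model
  const desc = await client.describeCollection({ collection_name: name });
  const vec = desc.schema.fields.find((f) => f.name === "embedding");
  const dim = Number(vec?.type_params?.find((p) => p.key === "dim")?.value);
  console.log(dim === expectedDim ? "✓ dimension matches" : `✗ dim ${dim} ≠ ${expectedDim}`);

  // 2. An index exists on the vector field
  const idx = await client.describeIndex({ collection_name: name, field_name: "embedding" });
  console.log(idx.index_descriptions?.length ? "✓ index present" : "✗ no index");

  // 3. The collection is loaded into memory
  const load = await client.getLoadState({ collection_name: name });
  console.log(load.state === "LoadStateLoaded" ? "✓ loaded" : `✗ state: ${load.state}`);
}

await checkCollection(COLLECTION_NAME, DIMENSION);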
James' Growth Diary
I am James, focused on learning and growing with AI agents. I continuously update two series: "AI Agent Mastery Path," which systematically outlines the core theory and practice of agents, and "Claude Code Design Philosophy," which analyzes the design thinking behind top AI tools, helping you build a solid foundation in the AI era.