Why Is Milvus, the 43K‑Star Vector Database, So Powerful?
This article analyzes Milvus—its open‑source origins, three deployment modes, four‑layer architecture, eight‑plus indexing algorithms, real‑world case studies, and a detailed comparison with competitors—highlighting its strengths, weaknesses, common pitfalls, and when it’s the right choice for large‑scale AI workloads.
What Is Milvus
Milvus is an open‑source vector database developed by Zilliz and donated to the LF AI & Data Foundation under the Apache 2.0 license. The name comes from Milvus, the Latin genus of kites (birds of prey), implying speed and sharp vision.
It offers three deployment modes that are crucial for selection:
Milvus Lite – a Python library; install with pip install pymilvus (which bundles Milvus Lite) and run prototypes in Jupyter notebooks.
Milvus Standalone – single‑node Docker deployment suitable for small‑to‑medium production without Kubernetes.
Milvus Distributed – Kubernetes‑based cluster deployment that scales from millions to billions of vectors.
All modes share a compatible API, allowing seamless migration.
Architecture Design: Storage‑Compute Separation
Milvus follows three core design principles: separating storage from compute, separating control plane from data plane, and enabling cloud‑native elastic scaling. Its four‑layer architecture consists of:
Access Layer – stateless proxies that validate requests and aggregate results; typically fronted by Nginx or a K8s Ingress for load balancing.
Coordinator Layer – the system’s brain, handling cluster topology, load balancing, timestamp generation, and data definition (DDL) management.
Worker Nodes – execute commands from the coordinators; query nodes and data nodes are deployed separately so read and write workloads do not interfere.
Storage Layer – split into three components: metadata stored in etcd, message storage using Pulsar or Kafka (Woodpecker replaces them in version 2.6), and object storage via MinIO, S3, or Azure Blob.
Storage‑compute separation lets you scale compute and storage independently, but it also increases operational complexity because of the many components involved.
Index Algorithms: Over 8 Types Covering All Scenarios
Milvus provides a rich set of index types, which distinguishes it from most competitors. Key algorithms include:
| Index | Principle | Typical Use Case | Key Feature |
|-------|-----------|------------------|-------------|
| FLAT | Brute‑force search | Small datasets, exact search | 100% recall, slow |
| IVF_FLAT | Inverted file + brute force | Medium datasets | Balanced speed/accuracy |
| IVF_SQ8 | Inverted file + scalar quantization | Large, memory‑constrained | 70–75% memory saving |
| IVF_PQ | Inverted file + product quantization | Very large datasets | High compression, lossy |
| HNSW | Hierarchical Navigable Small World graph | High‑performance search | Fast, high memory use |
| HNSW_SQ/PQ/PRQ | HNSW + quantization | Large + high‑performance | Memory–speed trade‑off |
| DiskANN | SSD‑based graph index | Billions of vectors | Low memory, ~5 ms latency @ 95% recall |
| GPU Index | GPU acceleration | Ultra‑high‑throughput scenarios | Requires GPU hardware |

Notably, DiskANN can index billions of vectors on SSD with 95% recall at roughly 5 ms latency, making it attractive for budget‑constrained teams handling massive data.
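To see where the SQ8 memory saving comes from, here is a plain‑Python illustration of scalar quantization (a concept sketch, not Milvus internals): each float32 component is mapped to a single uint8 code, shrinking each vector by roughly three quarters, at the cost of a bounded reconstruction error.

```python
# Concept sketch of SQ8-style scalar quantization: float32 -> uint8 per dimension.
import random
from array import array

DIM = 128
vec = [random.uniform(-1.0, 1.0) for _ in range(DIM)]

raw = array("f", vec).tobytes()  # float32 storage: 4 bytes per dimension

# Quantize: map [min, max] linearly onto the 256 uint8 levels.
lo, hi = min(vec), max(vec)
scale = (hi - lo) / 255.0
codes = bytes(round((x - lo) / scale) for x in vec)  # 1 byte per dimension

# Dequantize and measure the worst-case error (at most one quantization step).
decoded = [lo + c * scale for c in codes]
max_err = max(abs(a - b) for a, b in zip(vec, decoded))

print(len(raw), len(codes))  # 512 vs 128 bytes -> 75% saving per vector
print(max_err <= scale)      # True: error bounded by the quantization step
```

The extra metadata (per-segment min/scale) is why real systems report 70–75% rather than a flat 75%.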
Index selection guidelines:
Data < 1 M vectors → use HNSW.
1 M – 100 M vectors → use IVF_SQ8.
> 100 M vectors → consider DiskANN (also factor in dimension, QPS, and hardware).
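The rule of thumb above can be written as a small helper; the thresholds are this article’s, and a real choice should also weigh dimension, QPS targets, and hardware:

```python
# Encode the article's index-selection rule of thumb.
def suggest_index(num_vectors: int) -> str:
    if num_vectors < 1_000_000:
        return "HNSW"      # small corpus: in-memory graph, fast and accurate
    if num_vectors <= 100_000_000:
        return "IVF_SQ8"   # medium corpus: quantization keeps RAM in check
    return "DISKANN"       # huge corpus: SSD-resident graph index

print(suggest_index(500_000))        # HNSW
print(suggest_index(50_000_000))     # IVF_SQ8
print(suggest_index(2_000_000_000))  # DISKANN
```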
Milvus also supports hybrid search (vector + scalar + full‑text BM25) and added multi‑language full‑text search in version 2.6.
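Conceptually, hybrid search narrows candidates with a scalar predicate, then ranks the survivors by vector similarity. A plain‑Python sketch of that idea (not the Milvus API; the item fields are invented):

```python
# Concept sketch of hybrid search: scalar pre-filter, then vector ranking.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

items = [
    {"id": 1, "price": 20, "vec": [0.9, 0.1]},
    {"id": 2, "price": 80, "vec": [0.8, 0.2]},
    {"id": 3, "price": 30, "vec": [0.1, 0.9]},
]
query = [1.0, 0.0]

# Scalar filter (like a `price < 50` predicate), then similarity ranking.
candidates = [it for it in items if it["price"] < 50]
ranked = sorted(candidates, key=lambda it: cosine(query, it["vec"]), reverse=True)
print([it["id"] for it in ranked])  # [1, 3]
```

In a real engine the filter is pushed into the index traversal rather than applied as a separate pass, which is what makes filtered recall hard to get right.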
Who Is Using Milvus?
Real‑world adopters illustrate its production readiness:
Reddit evaluated Qdrant, Milvus, Vespa, and Weaviate on 340 M vectors in Kubernetes. They chose Milvus not for benchmark supremacy but because it fit their engineering culture, scaled cleanly with replication, and was easier to operate.
Vipshop migrated a personalized recommendation system from Elasticsearch to Milvus, achieving a ten‑fold query speed increase and sub‑30 ms latency for millions of vectors.
Other users include Salesforce, Roblox, Palo Alto Networks, Otter.ai, Shell, and AT&T.
Comparison with Competitors
| Dimension | Milvus | Qdrant | Pinecone (hosted) | Weaviate | Chroma |
|-----------|--------|--------|-------------------|----------|--------|
| Type | Open‑source distributed | Open‑source | Hosted cloud service | Open‑source | Lightweight open‑source |
| Language | Go + C++ | Rust | Closed source | Go | Python |
| Index types | 8+ | HNSW‑centric | Proprietary | HNSW | HNSW |
| Scale ceiling | 100 B vectors | 10 B | 10 B | 10 B | 1 M |
| p50 latency | 6 ms | 4 ms | 8 ms | 12 ms | 15 ms |
| GPU support | Yes | No | No | No | No |

Key takeaways:
Latency: Qdrant leads with 4 ms p50; Milvus is slower at 6 ms.
Index richness: Milvus’s eight‑plus algorithms form a strong moat; most rivals only offer HNSW.
Scale: Only Milvus claims support for hundred‑billion‑vector workloads.
GPU acceleration is exclusive to Milvus.
However, for sub‑million vectors, PostgreSQL + pgvector may suffice, avoiding the overhead of a dedicated vector DB.
Real‑User Feedback
"It's a vector database that doesn't randomly shit the bed when you scale past your laptop. I've been running it for 8 months and it hasn't woken me up with production alerts."
Positive remarks also note good accuracy, containerization, and fast scaling.
A performance test on 1 M vectors showed latency below 3 ms and QPS more than ten times that of Elasticsearch, corroborating Vipshop’s claim.
Negative feedback highlights:
High memory consumption, especially with HNSW; several GitHub issues (e.g., #32695, #40270) report excessive memory and CPU usage.
etcd can become a bottleneck; a Milvus 2.4.17 standalone instance experienced > 7 s etcd latency when disk utilization exceeded 95 %.
Operational complexity due to multiple components (etcd, MinIO/S3, Pulsar/Kafka, many nodes) is often underestimated.
Documentation quality varies, especially for advanced configuration and performance tuning.
Common Pitfalls
Mis‑configuring indexes (e.g., using FLAT with high‑dimensional embeddings and no partitioning) leads to slow retrieval.
Insufficient disk I/O for etcd; 99th‑percentile latency should stay below 10 ms, requiring high‑performance SSDs and dedicated storage.
Improper shard settings: single tables should have ≤ 8 shards, default topic partitions are 256, and replica count should stay ≤ 10.
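Those shard and replica limits can be checked mechanically before deployment; a hedged sketch (the function name is ours, and the limits are the ones quoted above):

```python
# Sanity-check collection settings against the limits quoted in this article:
# at most 8 shards per collection, at most 10 replicas.
def check_collection_settings(shards: int, replicas: int) -> list:
    problems = []
    if shards > 8:
        problems.append(f"{shards} shards exceeds the recommended max of 8")
    if replicas > 10:
        problems.append(f"{replicas} replicas exceeds the recommended max of 10")
    return problems

print(check_collection_settings(shards=4, replicas=2))    # [] -> fine
print(check_collection_settings(shards=16, replicas=12))  # two warnings
```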
Selection Guidance
Choosing Milvus depends on workload size and team expertise.
Ideal scenarios:
Data volume exceeds ten million vectors and requires distributed scaling.
Need to switch among multiple index types (in‑memory, disk‑based, GPU‑accelerated).
Professional Kubernetes operations team is available.
Hybrid search (vector + scalar + full‑text) is required.
Less suitable scenarios:
Data volume below one million; lighter solutions like Chroma or pgvector are simpler.
No dedicated ops team; prefer Milvus Lite or Standalone rather than Distributed.
Pure keyword search; Elasticsearch is more mature.
Strong transactional consistency needs; Milvus is not a relational database.
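The guidance above reduces to a rough decision helper; the thresholds paraphrase this article, not official sizing advice:

```python
# Rough deployment chooser based on the article's selection guidance.
def suggest_deployment(num_vectors: int, has_k8s_team: bool) -> str:
    if num_vectors < 1_000_000:
        return "pgvector or Chroma (a dedicated vector DB may be overkill)"
    if num_vectors >= 10_000_000 and has_k8s_team:
        return "Milvus Distributed"
    return "Milvus Standalone"

print(suggest_deployment(100_000, False))
print(suggest_deployment(50_000_000, True))   # Milvus Distributed
print(suggest_deployment(50_000_000, False))  # Milvus Standalone
```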
Market Outlook
The vector‑database market was valued at $22 B in 2024 and is projected to reach $188.6 B by 2035, a 23% CAGR. As the most feature‑complete open‑source offering, Milvus is well positioned to capture a share of this growth.
Resources
Official documentation: https://milvus.io/docs
GitHub repository: https://github.com/milvus-io/milvus
Capacity‑planning tool: https://milvus.io/tools/sizing
Open‑source benchmark suite VectorDBBench: https://github.com/zilliztech/VectorDBBench
Managed service (Zilliz Cloud): https://zilliz.com
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Shuge Unlimited
Formerly "Ops with Skill", now officially upgraded. Fully dedicated to AI, we share both the why (fundamental insights) and the how (practical implementation). From technical operations to breakthrough thinking, we help you understand AI's transformation and master the core abilities needed to shape the future. ShugeX: boundless exploration, skillful execution.
