
From 64 to a Million Tenants: Choosing the Right Milvus Multi‑Tenant Layer and Avoiding the 65,536 Ceiling

The article dissects Milvus's four‑layer multi‑tenant architecture—Database, Collection, Partition, and Partition Key—detailing each layer's default tenant limits, isolation strength versus scalability trade‑offs, hidden constraints like the 65,536 capacity ceiling, the Partition Key isolation switch, and practical guidance for selecting the appropriate layer in SaaS and regulated scenarios.

Shuge Unlimited

1. Four‑layer isolation spectrum

The official docs list Database, Collection, Partition, and Partition Key as four separate strategies, but the source code shows they form a spectrum: isolation strength decreases as scalability increases. Default limits from configs/milvus.yaml and paramtable:

Database: 64 (maxDatabaseNum, since v2.3.0); physical isolation plus quotas.

Collection: 65,536 (maxCollectionNum, present since early releases).

Partition: 1,024 per collection (maxPartitionNum, since v2.0.0); physical isolation with a shared schema.

Partition Key: up to millions of tenants (introduced in v2.2.9); a hybrid of physical and logical isolation.

Moving along the spectrum generally trades one level of isolation for one or two orders of magnitude more tenant capacity. Database gives strong physical isolation; Partition Key reaches million-scale tenancy by having all tenants share one collection and hash-routing their rows to 16 physical partitions.

Each layer breaks the previous bottleneck

One collection per tenant: strong isolation but collection explosion.

One partition per tenant: 1,024 limit insufficient for SaaS.

Scalar tenant_id filter: no partition pruning, full‑table scan.

Partition Key (introduced in v2.2.9, PR #24995) was added to break both physical and logical bottlenecks.

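Guidance: consumer-facing (B2C) products with millions of tenants should use Partition Key; business-facing (B2B) products with tens of thousands of tenants should use the Collection level; highly regulated workloads should use the Database level.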
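The guidance above can be written down as a small decision helper. This is a minimal sketch, not part of Milvus: the `Workload` type, the `regulated` flag, and the function name are hypothetical, and the thresholds are simply the default limits quoted earlier. It also ignores the maxGeneralCapacity ceiling discussed in section 3.

```python
from dataclasses import dataclass

# Default limits quoted in this article (assumed unchanged in your deployment).
MAX_DATABASES = 64
MAX_COLLECTIONS = 65_536


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a tenancy requirement."""
    tenant_count: int
    regulated: bool = False  # tenants need the strongest available isolation


def choose_layer(workload: Workload) -> str:
    """Return the tenancy layer suggested by the article's guidance."""
    if workload.regulated and workload.tenant_count <= MAX_DATABASES:
        return "database"
    if workload.tenant_count <= MAX_COLLECTIONS:
        return "collection"
    # Beyond the collection limit only the shared-collection model scales.
    return "partition_key"


print(choose_layer(Workload(tenant_count=10, regulated=True)))
print(choose_layer(Workload(tenant_count=20_000)))
print(choose_layer(Workload(tenant_count=1_000_000)))
```

A regulated workload with more than 64 tenants falls through to the Collection level here, which is a simplification; in practice that case probably needs raising maxDatabaseNum or mixing layers.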

2. Partition Key’s two‑sided nature

Originally a performance optimization, Partition Key now also enforces security via isolation.

Not one partition per tenant – data is hashed to 16 buckets

The docs state that in Partition Key mode data is automatically routed to 16 physical partitions. The routing code in pkg/util/typeutil/hash.go uses murmur3 for Int64 keys and crc32 over the first 100 bytes for VarChar (string) keys; Int64 and VarChar are the only supported key types.

Because the hash algorithm differs, key‑type choice affects distribution uniformity.
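To make the VarChar path concrete, here is a rough sketch of crc32-based routing. It assumes the hash is reduced to a partition index with a plain modulo and that the 100-byte cut is applied to the UTF-8 encoding; neither detail is confirmed by this article, and the function name is illustrative rather than a Milvus API. The Int64 path is omitted because murmur3 is not in the Python standard library.

```python
import zlib

NUM_PARTITIONS = 16      # physical partitions cited in the article
CRC_PREFIX_BYTES = 100   # only this many leading bytes are hashed


def route_varchar_key(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a VarChar partition key to a partition index (illustrative only)."""
    prefix = key.encode("utf-8")[:CRC_PREFIX_BYTES]
    # Assumption: the hash is reduced with a simple modulo.
    return zlib.crc32(prefix) % num_partitions


# Keys that differ only after the first 100 bytes hash identically.
shared_prefix = "tenant-" + "x" * 100
a = route_varchar_key(shared_prefix + "-alpha")
b = route_varchar_key(shared_prefix + "-beta")
print(a == b)
```

One practical consequence under these assumptions: long string tenant IDs with a common 100-byte prefix all land in the same partition, so short, high-entropy keys distribute better.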

Cross‑tenant leakage risk

If a query omits the partition key filter, the engine scans all 16 partitions, which exposes data across tenants. Version 2.4 introduced the partitionkey.isolation switch (config constant in pkg/common/common.go:322). When it is enabled, a query must contain an equality filter on the partition key; otherwise execution is rejected. The validation logic lives in internal/util/exprutil/expr_checker.go::ValidatePartitionKeyIsolation and applies strict rules:

tenant_id == 42 && status > 0 is allowed.

tenant_id == 42 || status > 0 is rejected.

A filter with no tenant_id condition is rejected.

The switch defaults to off; users unaware of it may run with no isolation.
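The three rules can be reproduced with a toy checker. This is not the ValidatePartitionKeyIsolation implementation; it is a simplified model that borrows Python's `ast` module to parse the filter after rewriting `&&` and `||`, and it only covers the cases listed above. The function names are hypothetical.

```python
import ast


def satisfies_isolation(filter_expr: str, partition_key: str) -> bool:
    """Toy model: does the filter pin the partition key to a single value?"""
    # Rewrite Milvus-style boolean operators so Python's parser accepts them.
    # (Naive: would also rewrite operators that appear inside string literals.)
    py_expr = filter_expr.replace("&&", " and ").replace("||", " or ")
    try:
        tree = ast.parse(py_expr, mode="eval")
    except SyntaxError:
        return False  # empty or unparsable filters are rejected
    return _pins_key(tree.body, partition_key)


def _pins_key(node: ast.expr, key: str) -> bool:
    if isinstance(node, ast.BoolOp):
        if isinstance(node.op, ast.And):
            # One conjunct pinning the key is enough to bound the whole filter.
            return any(_pins_key(value, key) for value in node.values)
        return False  # an OR could match rows from other tenants
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        left, op, right = node.left, node.ops[0], node.comparators[0]
        return (
            isinstance(op, ast.Eq)
            and isinstance(left, ast.Name)
            and left.id == key
            and isinstance(right, ast.Constant)
        )
    return False


print(satisfies_isolation("tenant_id == 42 && status > 0", "tenant_id"))
print(satisfies_isolation("tenant_id == 42 || status > 0", "tenant_id"))
print(satisfies_isolation("status > 0", "tenant_id"))
```

The same idea is useful as an application-side guard: rejecting unsafe filters before they reach the database keeps tenants isolated even on clusters where the switch is still off.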

Materialized view: physical acceleration for isolation

On the search path, internal/proxy/task_search.go::setQueryInfoIfMvEnable ties the isolation check to materialized view creation. Issue #29892 states the view's goal as “Improve Filtered Search on Partition Key”. Because isolation forces the partition key filter to be present, the view can rely on it to prune partitions.

3. The hidden 65,536 capacity ceiling

Milvus enforces a cluster-wide budget, rootCoord.maxGeneralCapacity, of at most 65,536 (lower bound 512). The budget is consumed by the sum of partitionCount × shardCount over every collection in the cluster. The parameter is defined in component_param.go, and the limit is validated whenever a collection or partition is created.

This constraint is not documented in the public multi-tenancy guide. For Partition-level tenancy, a collection with 1,024 partitions and 2 shards consumes 1,024 × 2 = 2,048 units of the pool, so only 65,536 / 2,048 = 32 such collections fit before the ceiling is hit.
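The arithmetic is easy to script when planning a deployment. The sketch below assumes every collection has the same shape and that nothing else consumes the budget; the function names are illustrative.

```python
MAX_GENERAL_CAPACITY = 65_536  # ceiling quoted in this article


def capacity_used(partitions: int, shards: int) -> int:
    """Budget consumed by one collection: partitionCount x shardCount."""
    return partitions * shards


def max_identical_collections(
    partitions: int, shards: int, ceiling: int = MAX_GENERAL_CAPACITY
) -> int:
    """How many collections of this shape fit under the ceiling."""
    return ceiling // capacity_used(partitions, shards)


# Partition-per-tenant layout from the text: 1,024 partitions, 2 shards.
print(capacity_used(1024, 2))              # 2048
print(max_identical_collections(1024, 2))  # 32

# Partition Key layout: 16 physical partitions, 1 shard.
print(max_identical_collections(16, 1))    # 4096
```

With the partition-per-tenant layout, 32 collections of 1,024 partitions cap the cluster at 32,768 tenants, well below what the per-collection and per-cluster defaults suggest when read separately.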

4. Database level is a quota and permission container

Database isolation comes from a set of properties (replica number, resource groups, disk quota, max collections, DDL deny switches) defined in pkg/common/common.go. These properties can be set per‑database, overriding global quotas (see create_collection_task.go logic).

Thus Database level can enforce strong isolation for a few high‑privilege tenants.
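As an illustration of what such a property set might look like, the helper below builds a per-database property map. The property keys are the ones I recall from the Milvus database documentation and should be verified against your version; the helper itself, its parameters, and the resource group name are hypothetical.

```python
from typing import Dict, Union

PropertyValue = Union[int, str, bool]


def tenant_database_properties(
    max_collections: int,
    disk_quota_mb: int,
    resource_group: str,
    replica_number: int = 1,
    deny_writes: bool = False,
) -> Dict[str, PropertyValue]:
    """Build a per-database property map (keys assumed from the Milvus docs)."""
    if max_collections <= 0 or disk_quota_mb <= 0 or replica_number <= 0:
        raise ValueError("limits must be positive")
    return {
        "database.replica.number": replica_number,
        "database.resource_groups": resource_group,
        "database.diskQuota.mb": disk_quota_mb,
        "database.max.collections": max_collections,
        "database.force.deny.writing": deny_writes,
    }


# "rg_regulated" is a made-up resource group name for this example.
props = tenant_database_properties(
    max_collections=10, disk_quota_mb=10_240, resource_group="rg_regulated"
)
print(props)
```

A map like this would typically be supplied when the database is created or altered, for example through the `properties` argument of pymilvus's `create_database`, assuming your client version exposes it.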

5. Evolution truth: underlying pipeline changed, model stayed

From Milvus 2.6 to 3.0 the four-strategy model stayed the same; the changes are in the underlying pipeline. DDL now goes through the WAL (Issue #33285), which improves idempotency and channel limits, and new constraints were added, such as one stream node per pchannel.

Version timeline (2.0, 2.2.9, 2.3.0, 2.3.4/2.3.5, 2.4, 2.5, 2.6, 3.0) shows when each feature was added.

Pitfall: usePartitionKeyAsClusteringKey

Enabling dataCoord.usePartitionKeyAsClusteringKey (default false) makes the partition key also a clustering key, but Issue #32329 reports that major compaction hangs when true. Hence keep it false for now.

Conclusion

Milvus multi-tenant design is a spectrum of isolation versus scalability: Database (64), Collection (65,536), Partition (1,024 per collection), Partition Key (millions). Partition Key evolved from a performance feature into a security mechanism tied to the isolation switch and materialized views. Database isolation relies on quota and RBAC properties. Two hidden constraints, maxGeneralCapacity = 65,536 and the partitionkey.isolation switch, must be checked when planning capacity and security.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
