
Why Deleting 1 Million Vectors in Milvus Doesn't Shrink Disk Space: A Deep Dive into 11 CompactionTypes

When disk usage in Milvus stays flat after a million vectors are deleted, the cause is not a bug. Behind the single compact() API sits a compaction system with eleven enum values, six independent policies, and seven special handling paths, which together target different kinds of data waste and reclaim space safely and incrementally.

Shuge Unlimited

CompactionType enum

In pkg/proto/data_coord.proto the CompactionType enum defines eleven values. IDs 0 and 1 are absent because the early UndefinedCompaction was removed while keeping the enum stable for backward‑compatible metadata.

enum CompactionType {
  MergeCompaction = 2;
  MixCompaction = 3;
  SingleCompaction = 4;
  MinorCompaction = 5;
  MajorCompaction = 6;
  Level0DeleteCompaction = 7;
  ClusteringCompaction = 8;
  SortCompaction = 9;
  PartitionKeySortCompaction = 10;
  ClusteringPartitionKeySortCompaction = 11;
  BumpSchemaVersionCompaction = 12;
}

Only five types are actively used:

Level0DeleteCompaction – L0 delete reclamation
MixCompaction – small-segment merge + delete cleanup
SortCompaction – intra-segment sorting rewrite
ClusteringCompaction – redistribution based on a clustering key
BumpSchemaVersionCompaction – segment rewrite when a collection schema changes
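The split between defined and actively used values can be sketched in Go. The constants mirror the enum numbers listed above, but the type CompactionType and the helper isActive are hypothetical stand-ins for illustration, not Milvus's generated datapb code.

```go
package main

import "fmt"

// CompactionType mirrors the numeric values of the proto enum shown above.
// It is a local stand-in, not the generated datapb type.
type CompactionType int32

const (
	MergeCompaction                      CompactionType = 2
	MixCompaction                        CompactionType = 3
	SingleCompaction                     CompactionType = 4
	MinorCompaction                      CompactionType = 5
	MajorCompaction                      CompactionType = 6
	Level0DeleteCompaction               CompactionType = 7
	ClusteringCompaction                 CompactionType = 8
	SortCompaction                       CompactionType = 9
	PartitionKeySortCompaction           CompactionType = 10
	ClusteringPartitionKeySortCompaction CompactionType = 11
	BumpSchemaVersionCompaction          CompactionType = 12
)

// isActive reports whether the article lists the type as actively used.
// (Hypothetical helper; Milvus has no function of this name that we know of.)
func isActive(t CompactionType) bool {
	switch t {
	case Level0DeleteCompaction, MixCompaction, SortCompaction,
		ClusteringCompaction, BumpSchemaVersionCompaction:
		return true
	}
	return false
}

// countTypes walks the contiguous ID range 2..12 and counts both groups.
func countTypes() (defined, active int) {
	for t := MergeCompaction; t <= BumpSchemaVersionCompaction; t++ {
		defined++
		if isActive(t) {
			active++
		}
	}
	return defined, active
}

func main() {
	defined, active := countTypes()
	fmt.Println("defined:", defined, "active:", active)
}
```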

Six independent policies

The new trigger implementation (compaction_trigger_v2.go) registers six policies, each exposing Enable(), Trigger() and Name(). The mapping from policy to emitted CompactionType is:

l0CompactionPolicy – ticker L0Ticker, interval L0CompactionTriggerInterval, outputs Level0DeleteCompaction
singleCompactionPolicy – ticker SingleTicker, interval MixCompactionTriggerInterval, outputs MixCompaction and SortCompaction
clusteringCompactionPolicy – ticker ClusteringTicker, interval ClusteringCompactionTriggerInterval, outputs ClusteringCompaction
storageVersionUpgradePolicy – ticker StorageVersionTicker, interval MixCompactionTriggerInterval, outputs MixCompaction (reuses the Mix executor)
bumpSchemaVersionPolicy – ticker BumpSchemaVersionTicker, interval BumpSchemaVersionCompactionTriggerInterval, outputs BumpSchemaVersionCompaction
forceMergeCompactionPolicy – manual only, outputs MixCompaction

Each policy independently decides whether its associated “data disease” (e.g., accumulated delete logs, fragmented small segments, stale schema) needs treatment.
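The policy contract can be pictured with a minimal Go sketch. Only the three method names come from the text above. Their signatures, the l0Policy type, its threshold field, and the runOnce loop are assumptions made for illustration and will differ from the real trigger code.

```go
package main

import "fmt"

// compactionPolicy is a hypothetical reduction of the three methods the
// article names; the real signatures in Milvus may differ.
type compactionPolicy interface {
	Enable() bool
	Trigger() []string // names of the compaction types to emit (simplified)
	Name() string
}

// l0Policy is a toy policy that fires once enough delete logs accumulate.
type l0Policy struct {
	deltaLogs int
	threshold int
}

func (p l0Policy) Enable() bool { return true }
func (p l0Policy) Name() string { return "l0CompactionPolicy" }
func (p l0Policy) Trigger() []string {
	if p.deltaLogs >= p.threshold {
		return []string{"Level0DeleteCompaction"}
	}
	return nil
}

// runOnce asks every enabled policy whether it wants to emit work and
// collects the answers keyed by policy name.
func runOnce(policies []compactionPolicy) map[string][]string {
	out := make(map[string][]string)
	for _, p := range policies {
		if !p.Enable() {
			continue
		}
		if types := p.Trigger(); len(types) > 0 {
			out[p.Name()] = types
		}
	}
	return out
}

func main() {
	policies := []compactionPolicy{l0Policy{deltaLogs: 40, threshold: 30}}
	fmt.Println(runOnce(policies))
}
```

Keeping each policy behind the same small interface is what lets a new "disease" be added as a new policy without touching the others.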

L0 Delete – dedicated delete‑reclamation pipeline

Milvus stores delete logs in L0 segments (no vector data). Writes are never blocked by deletions, but L0 must be compacted regularly; otherwise bloom‑filter bloat, query latency spikes, and write stalls occur.

Priority

The LevelPrioritizer orders tasks as L0Delete(1) < Mix/BumpSchema(10) < Clustering(100) < others(1000), where a lower value is scheduled earlier. The configuration key Params.DataCoordCfg.CompactionTaskPrioritizer selects among the level, mix, and default prioritizers. The shipped setting is level, so L0 compactions are always processed first.
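The ordering can be illustrated with a small Go sketch. The weights are the ones quoted above. The task struct, the string-typed kind, and the tie-break on plan ID are assumptions for illustration, not the Milvus implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// task is a hypothetical, simplified compaction task.
type task struct {
	planID int64
	kind   string
}

// levelPriority returns the weights quoted in the article; lower runs first.
func levelPriority(kind string) int {
	switch kind {
	case "Level0DeleteCompaction":
		return 1
	case "MixCompaction", "BumpSchemaVersionCompaction":
		return 10
	case "ClusteringCompaction":
		return 100
	default:
		return 1000
	}
}

// orderByLevel sorts by weight, breaking ties by plan ID (an assumption).
func orderByLevel(tasks []task) {
	sort.SliceStable(tasks, func(i, j int) bool {
		pi, pj := levelPriority(tasks[i].kind), levelPriority(tasks[j].kind)
		if pi != pj {
			return pi < pj
		}
		return tasks[i].planID < tasks[j].planID
	})
}

func main() {
	tasks := []task{
		{1, "ClusteringCompaction"},
		{2, "MixCompaction"},
		{3, "Level0DeleteCompaction"},
	}
	orderByLevel(tasks)
	for _, t := range tasks {
		fmt.Println(t.planID, t.kind)
	}
}
```

Even though the L0 task was submitted last, it is scheduled first.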

Fast finish

If an L0 compaction plan finds no target L1/L2 segments, the fast‑finish path (PR #47154, code in compaction_task_l0.go:100) marks the L0 segments as dropped without invoking a DataNode, saving an unnecessary I/O round. The release notes for Milvus 2.6.16 record this improvement.
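The fast-finish decision reduces to a single branch, sketched below in Go. All type and function names here (l0Plan, process, the dispatch callback) are hypothetical; the sketch only shows the idea of dropping L0 segments locally when there is nothing to apply the deletes to.

```go
package main

import "fmt"

type segmentState string

const (
	stateFlushed segmentState = "flushed"
	stateDropped segmentState = "dropped"
)

// l0Plan is a hypothetical, simplified L0 compaction plan.
type l0Plan struct {
	l0Segments     map[int64]segmentState
	targetSegments []int64 // L1/L2 segments the deletes would apply to
}

// process returns true if the plan had to be dispatched to a worker.
// With no targets it marks the L0 segments dropped and skips the dispatch.
func process(p *l0Plan, dispatch func(*l0Plan)) bool {
	if len(p.targetSegments) == 0 {
		for id := range p.l0Segments {
			p.l0Segments[id] = stateDropped
		}
		return false
	}
	dispatch(p)
	return true
}

func main() {
	plan := &l0Plan{l0Segments: map[int64]segmentState{7: stateFlushed}}
	dispatched := process(plan, func(*l0Plan) { fmt.Println("sent to DataNode") })
	fmt.Println("dispatched:", dispatched, "state:", plan.l0Segments[7])
}
```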

Self‑heal switch

A bug where imported segments wrote wrong timestamps caused “zombie” L0 segments that never matched any L1/L2 target, leading to silent delete loss. Enabling the LevelZeroCompactionForceSelectAll switch (PR #48907) forces all L1/L2 segments to be considered, bypassing the position check and repairing the loss. The switch returns a max‑uint64 timestamp in resolveLatestDeletePos (see compaction_l0_view.go:20).
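A rough Go sketch of why returning a max-uint64 position selects every segment follows. The assumption, not confirmed by the text above, is that the position check compares each segment's start position against the latest delete position. The names latestDeletePos, segment, and selectTargets are hypothetical simplifications.

```go
package main

import (
	"fmt"
	"math"
)

// latestDeletePos is a simplified stand-in for the position resolution the
// article describes: with the force switch on, the bound becomes MaxUint64.
func latestDeletePos(forceSelectAll bool, latest uint64) uint64 {
	if forceSelectAll {
		return math.MaxUint64
	}
	return latest
}

// segment is a hypothetical L1/L2 segment with a start timestamp.
type segment struct {
	id       int64
	startPos uint64
}

// selectTargets keeps segments whose start position is within the bound.
// The comparison itself is an assumption made for this sketch.
func selectTargets(segs []segment, bound uint64) []int64 {
	var out []int64
	for _, s := range segs {
		if s.startPos <= bound {
			out = append(out, s.id)
		}
	}
	return out
}

func main() {
	// Segment 2 imitates an imported segment with a wrong (too large) timestamp.
	segs := []segment{{1, 100}, {2, 900}}
	fmt.Println(selectTargets(segs, latestDeletePos(false, 500)))
	fmt.Println(selectTargets(segs, latestDeletePos(true, 500)))
}
```

With the switch off, the mis-stamped segment is never matched; with it on, both segments become targets.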

Version compatibility

During upgrades, unfinished L0 tasks are never marked failed because L0 deletions must not be lost. The code in compaction_task_meta.go explicitly skips L0 tasks when handling pre‑allocated segment‑ID compatibility.

Snapshot exemption

Snapshot protection prevents compaction of L1/L2 segments that are referenced. L0 segments are exempt – they can be reclaimed even when a collection has active snapshots, because L0 holds only delete logs.

Scheduling, queues and mutual exclusion

Compaction tasks flow through three queues (queueTasks → executingTasks → cleaningTasks) and a state machine (pipelining → executing → meta_saved → completed → cleaned). Each CompactionType has a hard-coded maximum execution duration:

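// Per-type execution timeout; keys are the generated datapb enum constants.
var maxCompactionTaskExecutionDuration = map[datapb.CompactionType]time.Duration{
	datapb.CompactionType_MixCompaction:               30 * time.Minute,
	datapb.CompactionType_Level0DeleteCompaction:      30 * time.Minute,
	datapb.CompactionType_ClusteringCompaction:        60 * time.Minute,
	datapb.CompactionType_SortCompaction:              20 * time.Minute,
	datapb.CompactionType_BumpSchemaVersionCompaction: 30 * time.Minute,
}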
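The state sequence and the timeout lookup can be combined in a short, self-contained Go sketch. The state names and durations come from the text above. The state type, the next and timedOut helpers, and the string keys are illustrative assumptions; the real code uses datapb types and a richer state machine.

```go
package main

import (
	"fmt"
	"time"
)

// state models the linear task lifecycle described in the article.
type state int

const (
	pipelining state = iota
	executing
	metaSaved
	completed
	cleaned
)

var stateNames = [...]string{"pipelining", "executing", "meta_saved", "completed", "cleaned"}

func (s state) String() string { return stateNames[s] }

// next advances one step along the happy path; cleaned is terminal.
func next(s state) state {
	if s < cleaned {
		return s + 1
	}
	return cleaned
}

// maxDuration mirrors the per-type limits above, keyed by name for brevity.
var maxDuration = map[string]time.Duration{
	"MixCompaction":               30 * time.Minute,
	"Level0DeleteCompaction":      30 * time.Minute,
	"ClusteringCompaction":        60 * time.Minute,
	"SortCompaction":              20 * time.Minute,
	"BumpSchemaVersionCompaction": 30 * time.Minute,
}

// timedOut reports whether a task of the given kind has run too long.
// Kinds without an entry never time out in this sketch.
func timedOut(kind string, elapsed time.Duration) bool {
	limit, ok := maxDuration[kind]
	return ok && elapsed > limit
}

func main() {
	for s := pipelining; s != cleaned; s = next(s) {
		fmt.Println(s, "->", next(s))
	}
	fmt.Println(timedOut("SortCompaction", 21*time.Minute))
	fmt.Println(timedOut("MixCompaction", 21*time.Minute))
}
```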

Mutual‑exclusion sets ensure that L0 never runs concurrently with Mix/Sort/BumpSchema/Clustering on the same channel, and that Mix/Sort/BumpSchema are mutually exclusive with Clustering on the same partitionID+channel label. Conflicting tasks are placed in an excluded list and re‑enqueued after the current cycle.
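The two exclusion rules can be written as a small predicate. This Go sketch encodes only what the paragraph above states. The task struct, the string-typed kinds, and the conflicts function are hypothetical, and the sketch deliberately says nothing about pairs the text does not cover (for example, Mix against Sort).

```go
package main

import "fmt"

// task is a hypothetical, simplified compaction task.
type task struct {
	kind        string
	channel     string
	partitionID int64
}

const (
	l0         = "Level0DeleteCompaction"
	clustering = "ClusteringCompaction"
)

// excludedByL0 lists the kinds that must not run alongside L0 on a channel.
var excludedByL0 = map[string]bool{
	"MixCompaction": true, "SortCompaction": true,
	"BumpSchemaVersionCompaction": true, clustering: true,
}

// mixLike lists the kinds that exclude Clustering on the same partition+channel.
var mixLike = map[string]bool{
	"MixCompaction": true, "SortCompaction": true, "BumpSchemaVersionCompaction": true,
}

// conflicts reports whether a and b may not run at the same time.
func conflicts(a, b task) bool {
	if a.channel != b.channel {
		return false
	}
	// Rule 1: L0 vs Mix/Sort/BumpSchema/Clustering on the same channel.
	if (a.kind == l0 && excludedByL0[b.kind]) || (b.kind == l0 && excludedByL0[a.kind]) {
		return true
	}
	// Rule 2: Mix/Sort/BumpSchema vs Clustering on the same partition+channel.
	if a.partitionID != b.partitionID {
		return false
	}
	return (mixLike[a.kind] && b.kind == clustering) || (a.kind == clustering && mixLike[b.kind])
}

func main() {
	running := task{l0, "ch-1", 10}
	candidate := task{"MixCompaction", "ch-1", 20}
	fmt.Println(conflicts(running, candidate))
}
```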

[Figure: Scheduler and mutual-exclusion rules]

User‑visible compaction entry points

ForceMerge

Introduced in PR #45556 (Milvus 2.6), ForceMerge is a variant of MixCompaction that adds topology-aware target-size calculation. It caps the target size at maxSafeSize, derived from the smallest QueryNode/DataNode memory limit, so it never creates segments that cannot be loaded.

When a collection has very few segments, ForceMerge may increase the segment count to match the per-shard parallelism, trading more segments for better load parallelism.
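The two behaviours described here, capping by the smallest memory limit and raising the segment count to match parallelism, can be sketched as follows. This is a loose illustration under stated assumptions: the functions smallestLimit and planForceMerge, their parameters, and the even-split sizing are hypothetical and are not taken from the ForceMerge source.

```go
package main

import "fmt"

// smallestLimit returns the minimum of the per-node memory limits.
// Treating this minimum as maxSafeSize is an assumption of the sketch.
// limits must be non-empty.
func smallestLimit(limits []int64) int64 {
	m := limits[0]
	for _, l := range limits[1:] {
		if l < m {
			m = l
		}
	}
	return m
}

// planForceMerge returns how many output segments to produce and the size of
// each, given the total data size, the requested target size, the safe cap,
// and the desired per-shard parallelism.
func planForceMerge(totalSize, requestedSize, maxSafeSize int64, parallelism int) (int, int64) {
	size := requestedSize
	if size > maxSafeSize {
		size = maxSafeSize // never build a segment that no node can load
	}
	count := int((totalSize + size - 1) / size) // ceiling division
	if count < parallelism {
		count = parallelism // more, smaller segments so loaders work in parallel
	}
	perSegment := (totalSize + int64(count) - 1) / int64(count)
	return count, perSegment
}

func main() {
	safe := smallestLimit([]int64{8 << 30, 4 << 30, 16 << 30})
	count, size := planForceMerge(6<<30, 8<<30, safe, 4)
	fmt.Println("segments:", count, "bytes each:", size)
}
```

In the example, 6 GiB of data with a 4 GiB cap would fit in two segments, but a parallelism of 4 raises the count to four segments of 1.5 GiB each.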

Clustering

Clustering (Beta in Milvus 2.5) redistributes entities based on a scalar clustering key, generating PartitionStats that allow query-time pruning. A benchmark on a 20 M LAION dataset shows QPS rising from 17.75 (no filter) to 431.41 when filtering on key==1000, roughly a 24× improvement (431.41 / 17.75 ≈ 24.3).

A temporary limitation (marked in the code with the comment "// todo: remove this check after support partial clustering compaction") requires all L2 segments of a partition+channel to participate; otherwise the compaction is skipped.
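The all-or-nothing rule amounts to a coverage check, sketched below in Go. The segment struct, the string-typed level, and canRunClustering are hypothetical names; only the rule itself (every L2 segment of the partition+channel must be selected) comes from the text.

```go
package main

import "fmt"

// segment is a hypothetical segment within one partition+channel.
type segment struct {
	id    int64
	level string // "L0", "L1" or "L2"
}

// canRunClustering reports whether every L2 segment of the partition+channel
// is part of the selection. If any L2 segment is missing, the compaction is
// skipped, mirroring the temporary limitation described above.
func canRunClustering(all []segment, selected map[int64]bool) bool {
	for _, s := range all {
		if s.level == "L2" && !selected[s.id] {
			return false
		}
	}
	return true
}

func main() {
	all := []segment{{1, "L1"}, {2, "L2"}, {3, "L2"}}
	fmt.Println(canRunClustering(all, map[int64]bool{1: true, 2: true}))
	fmt.Println(canRunClustering(all, map[int64]bool{1: true, 2: true, 3: true}))
}
```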

Evolution timeline and bug‑fix history

Key architectural milestones (PR numbers are retained for traceability):

2024-Q2 – #37190: Split L0 and Mix trigger intervals (L0 independent)
2024-Q3 – #39217: Introduce active-collections mechanism for L0 policy
2025-Q1 – #42562: Separate Sort stats task into SortCompaction
2025-Q2 – #45556: Add ForceMerge (target-size compaction)
2025-Q2 – #46990: Add StorageVersionUpgrade policy (reuses Mix executor)
2025-Q3 – #48808: Add BumpSchemaVersionCompaction

Important L0-related fixes:

2024-Q4 – #40960: Delete-data loss due to duplicate binlogID
2025-Q2 – #46436: Boundary fix for latestDeletePos
2025-Q2 – #47154: Fast-finish when no matching L1/L2 segments
2025-Q4 – #48907: Self-heal for import-position bug (zombie L0)
2026-Q2 – #47214 / #49122: Increase default deltalog max count from 30 to 1000

[Figure: Compaction evolution timeline]

Take‑away

Deleting a large number of vectors does not immediately shrink disk usage because Milvus separates the write path (fast L0 appends) from delete reclamation (periodic L0 compaction). The seven special-treatment mechanisms – priority ordering, independent trigger intervals, mutual-exclusion rules, fast-finish, self-heal switch, upgrade-compatible handling, and snapshot exemption – ensure that deletions are eventually reclaimed without jeopardising write availability or snapshot consistency.

For developers, the source reveals six independent policies, each tuned to a specific “data disease”, alongside the seven L0-specific mechanisms. Knowing which path is responsible for a given symptom (e.g., disk not shrinking, query latency rising) is essential when evaluating Milvus or troubleshooting production clusters.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
