
Four States a Milvus Delete Passes Through – Uncovering the Most Complex Operation

A Milvus delete command looks simple, but deletion is the system's most intricate operation. The primary key is first stored in an L0 delta segment; compaction then merges the delete mark into L1/L2 segments. Along the way the data passes through four distinct segment states and touches multiple seal strategies, rollback fields, and version‑specific behaviours.

Shuge Unlimited

1. Three‑Level Segment Hierarchy

Milvus defines SegmentLevel in pkg/proto/data_coord.proto:24‑29 with four enum values: Legacy = 0 (zero value for old segments, treated as L1), L0 = 1 (channel‑level delta for deletes), L1 = 2 (normal data), and L2 = 3 (extra distribution info). The Legacy entry is a fossil of the original design.
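As a rough illustration of the enum just described, the sketch below mirrors the four values in Go. The Go identifiers and the `normalize` helper are hypothetical names chosen for this example, not the generated protobuf symbols; only the numeric values and the "Legacy is treated as L1" rule come from the text.

```go
package main

import "fmt"

// SegmentLevel mirrors the four enum values listed in the text.
// The Go names are illustrative, not the generated protobuf identifiers.
type SegmentLevel int32

const (
	Legacy SegmentLevel = 0 // zero value carried by old segments
	L0     SegmentLevel = 1 // channel-level delta (deletes)
	L1     SegmentLevel = 2 // normal insert data
	L2     SegmentLevel = 3 // carries extra distribution info
)

// normalize is a hypothetical helper that maps Legacy to L1,
// following the "treated as L1" rule.
func normalize(l SegmentLevel) SegmentLevel {
	if l == Legacy {
		return L1
	}
	return l
}

func main() {
	var old SegmentLevel // zero value, as an old segment without a level would decode
	fmt.Println(normalize(old) == L1) // true
	fmt.Println(normalize(L0) == L0)  // true
}
```

Because Legacy is the zero value, any segment written before levels existed decodes as Legacy without a migration, which is presumably why the entry survives.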

The storage semantics of each level are:

L0: delta data (deletes/updates) at channel scope, created by SaveBinlogPaths in DataNode/StreamingNode, initial state Flushed (see meta.go:979).

L1: regular insert binlogs at partition scope, allocated by SegmentManager, initial state Growing (segment_manager.go:440).

L2: segment with extra data‑distribution info produced by clustering compaction, created in meta.go:2371, initial state Flushed + IsInvisible.

[Figure: comparison of the three segment levels]

2. L1 Segment Lifecycle

L1 segments follow a full state machine defined in 20211109-milvus_flush_collections.md:126‑133. Although the proto lists SegmentStateNone, NotExist, Growing, Sealed, Flushed, Flushing, the actual code uses five live states: Growing, Sealed, Flushed, Dropped, and Importing (see internal/datacoord/*.go).

The health check function isSegmentHealthy (meta.go:2659‑2664) treats Growing, Sealed, and Flushed as healthy; the others are considered dead.

The complete state transition diagram is:

[New] → Growing
        │
        │ seal (6 strategies, see below)
        ▼
      Sealed
        │
        │ DataNode writes binlog
        ▼
      Flushed
        │
        │ consumed by compaction, original segment marked Dropped
        ▼
      Dropped
        │
        │ GC confirms target segment indexed/loaded
        ▼
      [Physical deletion of binlog files]

Each transition point is anchored in the source code, e.g., openNewSegmentWithGivenSegmentID (segment_manager.go:438) creates a Growing L1 segment, SaveBinlogPaths (services.go:669) moves Sealed to Flushed, and recycleDroppedSegments (garbage_collector.go:767) performs physical deletion.

The core RPC that changes state is SaveBinlogPaths (services.go:627‑669), which simultaneously handles Flushed and Dropped updates.
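The lifecycle in this section can be read as a small transition table plus a health predicate. The sketch below encodes only the arrows drawn in the diagram and the health rule stated earlier; the type and function names are illustrative and are not taken from the Milvus code base.

```go
package main

import "fmt"

// SegmentState names the L1 states that appear in the diagram.
type SegmentState string

const (
	Growing SegmentState = "Growing"
	Sealed  SegmentState = "Sealed"
	Flushed SegmentState = "Flushed"
	Dropped SegmentState = "Dropped"
)

// next encodes only the arrows drawn in the diagram.
var next = map[SegmentState]SegmentState{
	Growing: Sealed,
	Sealed:  Flushed,
	Flushed: Dropped,
}

// canTransition reports whether the diagram has an arrow from -> to.
func canTransition(from, to SegmentState) bool {
	n, ok := next[from]
	return ok && n == to
}

// isHealthy follows the rule in the text: Growing, Sealed and Flushed are healthy.
func isHealthy(s SegmentState) bool {
	return s == Growing || s == Sealed || s == Flushed
}

func main() {
	fmt.Println(canTransition(Growing, Sealed))  // true
	fmt.Println(canTransition(Growing, Flushed)) // false: not an arrow in the diagram
	fmt.Println(isHealthy(Dropped))              // false
}
```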

2.3 Six Seal Strategies

sealL1SegmentByCapacity: seal by binary size.

sealL1SegmentByLifetime: seal when lifetime expires.

sealL1SegmentByBinlogFileNumber: seal when binlog file count exceeds limit.

sealL1SegmentByIdleTime: seal after idle timeout.

sealByBlockingL0: forced seal triggered by L0 backlog (see Section 3).

sealByTotalGrowingSegmentsSize: seal the largest growing segment when total size exceeds threshold.

The first four are conventional size/time‑based seals; the fifth ties directly to L0’s delete buffer, and the sixth is a fallback.

3. L0 – The Delete Buffer

L0 is the most subtle layer. Traditional B+‑Tree databases mark a tombstone in place; Milvus instead partitions data by segment, so a delete must know which segment holds the primary key. Milvus solves this by routing all delete messages to a channel‑level buffer – the L0 segment – as indicated by the proto comment “for current channel”.

L0 is created directly in the Flushed state (no Growing phase) because it does not receive real‑time writes; DataNode flushes the delete buffer to deltalogs and registers the segment via SaveBinlogPaths(segLevel=L0) (services.go:629‑630, meta.go:993).

[Figure: timeline of the L0‑blocking seal]

When L0 accumulates beyond BlockingL0SizeInMB or BlockingL0EntryNum, the strategy sealByBlockingL0 (segment_allocation_policy.go:225‑320) forces any growing segment whose timestamp range overlaps the overloaded L0 to be sealed. The source comment visualises this with a timeline: if L0a and L0b exceed limits, growing segments G1‑G3 are sealed because they block L0 compaction.

This trade‑off sacrifices some write throughput to guarantee that delete marks can converge.
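The overlap rule can be sketched as follows. This is a deliberately simplified model, not the Milvus algorithm: it assumes each segment is summarised by a closed timestamp range, that "overloaded" means the summed L0 size exceeds a single limit, and that every growing segment overlapping any L0 range is sealed. All type and function names are hypothetical.

```go
package main

import "fmt"

// tsRange is a closed timestamp interval [Start, End].
type tsRange struct{ Start, End uint64 }

// overlaps reports whether two closed intervals share at least one timestamp.
func overlaps(a, b tsRange) bool {
	return a.Start <= b.End && b.Start <= a.End
}

type l0Segment struct {
	Range  tsRange
	SizeMB float64
}

type growing struct {
	ID    int64
	Range tsRange
}

// blockedGrowing returns the IDs of growing segments to seal.
// Simplification: the backlog is "overloaded" when the summed L0 size exceeds limitMB.
func blockedGrowing(l0s []l0Segment, gs []growing, limitMB float64) []int64 {
	var total float64
	for _, s := range l0s {
		total += s.SizeMB
	}
	if total <= limitMB {
		return nil
	}
	var ids []int64
	for _, g := range gs {
		for _, s := range l0s {
			if overlaps(g.Range, s.Range) {
				ids = append(ids, g.ID)
				break
			}
		}
	}
	return ids
}

func main() {
	l0s := []l0Segment{
		{Range: tsRange{10, 20}, SizeMB: 40},
		{Range: tsRange{25, 40}, SizeMB: 40},
	}
	gs := []growing{
		{ID: 1, Range: tsRange{5, 15}},
		{ID: 2, Range: tsRange{30, 50}},
		{ID: 3, Range: tsRange{41, 60}},
	}
	fmt.Println(blockedGrowing(l0s, gs, 64))  // [1 2]
	fmt.Println(blockedGrowing(l0s, gs, 100)) // []
}
```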

L0DeleteCompaction (compaction_task_l0.go:305‑340) selects target segments by filtering out L0 itself and requiring effectiveTs < triggerPos. In other words, only L1/L2 segments whose timestamps are earlier than the L0 trigger position receive the delete deltalogs. After merging, the L0 segment becomes Dropped and is eventually reclaimed by GC.
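The target selection described here amounts to a two-condition filter. A minimal sketch, assuming each segment exposes a level and a single effective timestamp; the struct and the `selectTargets` function are illustrative names, and only the two conditions come from the text.

```go
package main

import "fmt"

// segment carries just the fields the filter looks at.
type segment struct {
	ID          int64
	Level       string // "L0", "L1" or "L2"
	EffectiveTs uint64
}

// selectTargets keeps non-L0 segments whose effective timestamp
// is strictly earlier than the L0 trigger position.
func selectTargets(segs []segment, triggerPos uint64) []int64 {
	var targets []int64
	for _, s := range segs {
		if s.Level == "L0" {
			continue // L0 never receives its own deletes
		}
		if s.EffectiveTs < triggerPos {
			targets = append(targets, s.ID)
		}
	}
	return targets
}

func main() {
	segs := []segment{
		{ID: 1, Level: "L1", EffectiveTs: 90},
		{ID: 2, Level: "L2", EffectiveTs: 95},
		{ID: 3, Level: "L1", EffectiveTs: 120}, // newer than the trigger position
		{ID: 4, Level: "L0", EffectiveTs: 80},  // the delta segment itself
	}
	fmt.Println(selectTargets(segs, 100)) // [1 2]
}
```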

The full delete journey is:

user delete → DML channel → DataNode memory → L0 deltalogs → L0DeleteCompaction → L1/L2 deltalogs → query filtering

4. L2 – Extra Distribution Layer

L2 segments carry “extra data distribution info” produced by Clustering Compaction. In v2.4 the workflow was L1 → L2 → L1 (L2 acted as a temporary marker). In v2.5+ the intermediate L2 marker was removed; failed clustering tasks mark the result segment as Dropped instead. The current code still creates L2 segments for successful clustering (meta.go:2371), but they are now permanent identity tags rather than transient states.

The proto also defines a LastLevel field (data_coord.proto:421‑424) used for transactional rollback. When a compaction updates a segment’s level, the previous level is stored in LastLevel via UpdateSegmentLevelOperator. If the compaction fails, RevertSegmentLevelOperator restores the original level, ensuring consistency in a distributed environment.
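The save-then-restore pattern behind LastLevel can be sketched in a few lines. The functions below borrow the idea of the two operators but are hypothetical: their names, signatures, and the in-memory struct are assumptions, and the real operators work on DataCoord's segment metadata.

```go
package main

import "fmt"

type SegmentLevel int32

const (
	L1 SegmentLevel = 2
	L2 SegmentLevel = 3
)

// segmentInfo keeps the current level and the level before the last update.
type segmentInfo struct {
	ID        int64
	Level     SegmentLevel
	LastLevel SegmentLevel
}

// updateLevel remembers the current level before overwriting it.
func updateLevel(s *segmentInfo, level SegmentLevel) {
	s.LastLevel = s.Level
	s.Level = level
}

// revertLevel restores the level saved by the last updateLevel call.
func revertLevel(s *segmentInfo) {
	s.Level = s.LastLevel
}

func main() {
	seg := &segmentInfo{ID: 7, Level: L1}

	updateLevel(seg, L2)                  // compaction starts: L1 -> L2
	fmt.Println(seg.Level, seg.LastLevel) // 3 2

	revertLevel(seg)       // compaction failed: roll back
	fmt.Println(seg.Level) // 2
}
```

Storing the previous level on the segment itself means a rollback needs no separate log: whichever node handles the failure can restore the level from the metadata alone.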

5. Design Trade‑offs

All compaction types operate on Segment objects:

L0DeleteCompaction: input L0, applies delta to L1/L2.

MixCompaction: merges multiple L1 segments.

ClusteringCompaction: transforms L1 into L2.

SortCompaction: sorts a single segment by primary key.

The generic compaction filter (compaction_util.go:108‑113) processes only L1 Flushed segments, leaving L0 and L2 to their dedicated policies. This separation allows each lifecycle to follow its own merge strategy without interference.

During query loading (handler.go:360‑371), L0 segments are classified separately and their deltalogs are explicitly loaded (L0SegmentIDs) so that queries can filter out deleted primary keys. Consequently, a large L0 backlog increases query latency until L0 compaction merges the deletes into L1/L2, after which the extra filtering cost disappears.
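To see why a backlog costs query time, consider a toy model in which the loaded L0 deltalogs are reduced to a set of deleted primary keys and every candidate row is checked against it. This ignores delete timestamps and everything else a real QueryNode does; the names are hypothetical.

```go
package main

import "fmt"

// filterDeleted drops every primary key that appears in the delete set.
// In this toy model the set stands in for the loaded L0 deltalogs.
func filterDeleted(pks []int64, deleted map[int64]struct{}) []int64 {
	var visible []int64
	for _, pk := range pks {
		if _, gone := deleted[pk]; !gone {
			visible = append(visible, pk)
		}
	}
	return visible
}

func main() {
	deleted := map[int64]struct{}{2: {}, 4: {}}
	fmt.Println(filterDeleted([]int64{1, 2, 3, 4, 5}, deleted)) // [1 3 5]
}
```

Every candidate row pays for a lookup, and the delete set itself has to be loaded and held in memory, so the overhead grows with the L0 backlog until compaction folds the deletes into the target segments.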

GC does not delete a Dropped segment immediately. recycleDroppedSegments (garbage_collector.go:767‑820) ensures two safety conditions: the target segment has built its index, and no QueryNode has loaded it. This “safe distance” prevents premature deletion that could cause query failures, trading extra storage for reliability.
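The two safety conditions can be expressed as a single predicate. The sketch below makes an interpretive assumption: "target segment" is the segment produced by the compaction that consumed the dropped one, and the "loaded" check applies to the dropped segment itself. The struct, the maps, and `canRecycle` are all hypothetical.

```go
package main

import "fmt"

// droppedSegment links a dropped segment to the compaction result that replaced it.
type droppedSegment struct {
	ID       int64
	TargetID int64 // assumed: segment produced by the compaction
}

// canRecycle applies both conditions: the target is indexed,
// and no QueryNode still has the dropped segment loaded.
func canRecycle(seg droppedSegment, indexed, loaded map[int64]bool) bool {
	return indexed[seg.TargetID] && !loaded[seg.ID]
}

func main() {
	seg := droppedSegment{ID: 10, TargetID: 20}
	indexed := map[int64]bool{20: true}
	loaded := map[int64]bool{10: true}

	fmt.Println(canRecycle(seg, indexed, loaded)) // false: still loaded

	delete(loaded, 10) // the QueryNode releases the old segment
	fmt.Println(canRecycle(seg, indexed, loaded)) // true
}
```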

A common misconception is that the copy_segment code path relates to load balancing. Its comments (copy_segment_meta.go:34‑51) show that it supports snapshot restore instead.

6. Ongoing Evolution

Milvus continues to refactor segment allocation APIs. Fields like SegmentIDRequest.Level are marked deprecated and replaced by the newer AllocSegment RPC. Likewise, segment sizing moved from the row‑count‑based max_row_num field to binary‑size control. Each deprecation comment in the source is a footprint of real‑world engineering iteration.

Future articles will explore the segment loading pipeline that moves segments from object storage into QueryNode memory.

Thanks for reading! If you found this useful, feel free to share.
