How Lance Implements MVCC Transactions with Optimistic Concurrency and Automatic Conflict Resolution
Lance uses Multi-Version Concurrency Control to provide ACID guarantees, creating immutable table versions on each commit and employing atomic storage primitives, rebase logic, and retry mechanisms to handle concurrent writes, conflict detection, and resolution across multiple transaction types.
Overview
Lance implements Multi-Version Concurrency Control (MVCC) to provide ACID guarantees for concurrent reads and writes. Each commit creates a new immutable table version using atomic storage operations. All versions form a serializable history that enables time‑travel, schema evolution, and other version‑aware features.
Commit Protocol
Storage primitives
rename‑if‑not‑exists : atomic rename that succeeds only when the target path does not already exist.
put‑if‑not‑exists : atomic write that succeeds only when the target object does not already exist (conditional PUT).
Implementation handlers
RenameCommitHandler – writes to a temporary path then calls rename_if_not_exists(tmp_path → final_path); on conflict returns CommitError::CommitConflict.
ConditionalPutCommitHandler – uses PutOptions::Create (put‑if‑not‑exists); on existing path returns CommitError::CommitConflict.
CommitLock – acquires an external lock (e.g., DynamoDB), writes the manifest, releases the lock; if the manifest path already exists it releases the lock and returns CommitError::CommitConflict.
Only the writer that successfully creates the manifest for a given version proceeds; all others receive CommitConflict and must retry.
Manifest naming schemes
V1: _versions/{version}.manifest (e.g., 1.manifest, 2.manifest).
V2: _versions/{u64::MAX - version:020}.manifest – reverse lexical order, e.g., 18446744073709551614.manifest for version 1.
Transaction files
Each transaction records a read_version. A successful commit creates a new version read_version + 1 and writes a transaction file _transactions/{read_version}-{uuid}.txn. The file stores a serialized protobuf describing the attempt and is used for conflict detection and rebase.
Commit algorithm
The commit process attempts to write a new manifest atomically using the storage primitives. On conflict it loads newer transaction files, runs a rebase check, and retries with updated state. The number of retries is configurable (default CommitConfig::num_retries = 20).
Terminology
Rebasable : the transaction can be transformed to apply on the latest state and automatically retried.
Retryable : the transaction cannot be rebased but can be re‑executed after the application reads the latest data.
Incompatible : a fundamental conflict; retry would violate the original semantics, so the commit fails with a non‑retryable error.
Code walk‑through
The function commit_transaction (rust/lance/src/io/commit.rs) contains a retry loop:
Load transactions committed after read_version ( load_and_sort_new_transactions).
Use TransactionRebase to check whether the current transaction can be replayed on the newest manifest.
If no conflict, rebase the transaction, produce a new target version, and attempt to write the manifest.
If a conflict is detected, return an error such as RetryableCommitConflict for the caller to decide on retry.
On CommitError::CommitConflict, back off and retry.
Conflict resolution
TransactionRebase(rust/lance/src/io/commit/conflict_resolver.rs) tracks:
Fragment tracking – maps fragments present in the read version and marks those needing rewrite.
Modification detection – records fragment IDs that were added, updated, or deleted.
Affected rows – stores row identifiers for delete/update operations to enable fine‑grained conflict checks.
Fragment reuse indices – accumulates metadata for reused fragments during concurrent rewrites.
The rebase algorithm compares fragment modifications, merges deletion vectors, updates reuse indices, and either produces a rebased transaction or returns a conflict error ( RetryableCommitConflict or Incompatible).
Conflict scenarios
Rebasable example: Two writers delete different rows in the same fragment. Writer A commits first; Writer B detects the new manifest, merges deletion vectors, rebases, and successfully commits a later version.
Retryable example: Writer A compacts fragments while Writer B updates a row in a fragment that disappears after the compaction. Writer B cannot rebase and receives a retryable conflict error; it must reread the latest version and retry the update.
Incompatible example: Writer A restores the table to an earlier version; Writer B attempts to delete rows that were added after that version. The delete operation is no longer meaningful, leading to a non‑retryable incompatibility error.
Transaction types
The authoritative definition resides in protos/transaction.proto (https://github.com/lance-format/lance/blob/main/protos/transaction.proto). Each transaction contains a read_version, a unique uuid, and an operation field selecting one of the following types:
Append : adds new fragments without modifying existing data; fragment IDs are assigned during manifest construction.
Delete : marks rows as deleted via deletion vectors; the predicate field stores the delete condition for conflict detection.
Overwrite : replaces the entire table with new data, schema, or configuration.
CreateIndex : adds, replaces, or drops secondary indices (vector, scalar, or full‑text).
Rewrite : reorganizes data without changing semantics (compaction, sorting). Requires a prior ReserveFragments to allocate new fragment IDs.
Merge : adds new columns; all fragments must be updated with the new schema.
Project : removes columns; only metadata changes, no data files are modified.
Restore : reverts the table to a previous version.
ReserveFragments : pre‑allocates fragment IDs for subsequent Rewrite operations.
Clone : creates a shallow or deep copy of a table; shallow copies reference original files via base_paths, deep copies use storage‑native copy operations.
Update : modifies row values without adding or deleting rows; supports REWRITE_ROWS (rewrite affected rows) and REWRITE_COLUMNS (rewrite affected columns) modes.
UpdateConfig : changes table configuration, metadata, or schema without touching data.
DataReplacement : replaces data in specific column regions with new files.
UpdateMemWalState : updates the state of the in‑memory write‑ahead log index.
UpdateBases : adds new base paths so the table can reference data stored elsewhere.
Big Data Technology Tribe
Focused on computer science and cutting‑edge tech, we distill complex knowledge into clear, actionable insights. We track tech evolution, share industry trends and deep analysis, helping you keep learning, boost your technical edge, and ride the digital wave forward.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
