Understanding Two-Phase Commit and Its Implementation in Lucene

This article explains the two-phase commit protocol for distributed transactions, details its generic workflow, and describes how Apache Lucene implements the protocol through its TwoPhaseCommit interface, including preparation, flushing, commit, segment handling, deletion policies, and rollback mechanisms with illustrative code examples.

政采云技术

Two-phase commit (2PC) is the core protocol for distributed transactions. A Transaction Manager (TM) coordinates one or more Resource Managers (RMs), collects their prepare status, and decides whether to commit or roll back the transaction.

The generic 2PC process includes:

Application submits a request to the TM, initiating a distributed transaction.

In the first phase, the TM asks all RMs to prepare for commit.

Each RM returns a success or failure message (timeout counts as failure).

In the second phase: If all RMs prepared successfully, the TM instructs them to commit. If any RM failed, the TM instructs all RMs to roll back.
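The steps above can be sketched as a small coordinator. This is a minimal illustration under stated assumptions, not a production protocol: `ResourceManager` and `TransactionManager` are hypothetical names invented for the example, votes are gathered with a plain thread pool, and a timeout or exception during prepare is treated as a failure vote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical resource manager; not part of any real library. */
interface ResourceManager {
    boolean prepare() throws Exception; // phase 1 vote: true means "ready to commit"
    void commit();                      // phase 2, success path
    void rollback();                    // phase 2, failure path
}

/** Hypothetical transaction manager that drives the two phases. */
final class TransactionManager {
    /** Returns true if the transaction committed, false if it was rolled back. */
    static boolean run(List<? extends ResourceManager> rms, long timeoutMillis) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Phase 1: ask every RM to prepare, in parallel.
            List<Future<Boolean>> votes = new ArrayList<>();
            for (ResourceManager rm : rms) {
                Callable<Boolean> task = rm::prepare;
                votes.add(pool.submit(task));
            }
            boolean allPrepared = true;
            for (Future<Boolean> vote : votes) {
                try {
                    if (!vote.get(timeoutMillis, TimeUnit.MILLISECONDS)) {
                        allPrepared = false;
                    }
                } catch (Exception e) {
                    // A timeout, an interruption, or an exception counts as a "no" vote.
                    allPrepared = false;
                }
            }
            // Phase 2: commit everywhere or roll back everywhere.
            for (ResourceManager rm : rms) {
                if (allPrepared) {
                    rm.commit();
                } else {
                    rm.rollback();
                }
            }
            return allPrepared;
        } finally {
            pool.shutdownNow(); // interrupts any prepare call that is still running
        }
    }
}
```

The sketch leaves out what makes real 2PC hard: the coordinator does not log its decision, so a crash between the two phases would leave participants undecided.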

Lucene’s Two-Phase Commit

Lucene defines a TwoPhaseCommit interface that underlies its document-write commit process. The interface provides three key methods: prepareCommit(), commit(), and rollback(). The sections below walk through how IndexWriter implements them.

First‑phase commit: prepareCommit

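During the first phase, Lucene does as much of the commit work as it can without making the commit visible. It persists segment data to disk but keeps the new commit point inactive by writing it under a temporary name (pending_segments_N); the rename to the final name is deferred to the second phase. If the first phase fails, a rollback discards all of that work.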

Pre‑commit validation

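Lucene synchronizes on commitLock so that only one thread can execute prepareCommit at a time. It then verifies that the writer is still open and has not hit an unrecoverable error (tragedy), and checks whether pendingCommit is already set, which would mean an earlier prepareCommit has not yet been followed by commit.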

synchronized(commitLock) {
    ensureOpen(false);
    if (infoStream.isEnabled("IW")) {
        infoStream.message("IW", "prepareCommit: flush");
        infoStream.message("IW", "  index before flush " + segString());
    }
    if (tragedy != null) {
        throw new IllegalStateException("this writer hit an unrecoverable error; cannot commit", tragedy);
    }
    if (pendingCommit != null) {
        throw new IllegalStateException("prepareCommit was already called with no corresponding call to commit");
    }
    // subsequent first‑phase commit logic
}
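To see how these guards interact with the three interface methods, here is a toy writer. It is a sketch only: `ToyWriter` is a hypothetical class that keeps documents in in-memory lists, and it borrows just the `commitLock` / `pendingCommit` pattern from the excerpt above. It assumes that calling commit() without a prior prepareCommit() runs both phases.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy writer, not Lucene's IndexWriter: illustrates the prepare/commit/rollback guards only. */
final class ToyWriter {
    private final Object commitLock = new Object();
    private final List<String> buffered = new ArrayList<>();   // documents not yet committed
    private final List<String> committed = new ArrayList<>();  // documents visible to readers
    private List<String> pendingCommit;                        // non-null between the two phases

    void add(String doc) {
        synchronized (commitLock) {
            buffered.add(doc);
        }
    }

    /** Phase 1: snapshot what will be committed; refuse to run twice in a row. */
    void prepareCommit() {
        synchronized (commitLock) {
            if (pendingCommit != null) {
                throw new IllegalStateException(
                    "prepareCommit was already called with no corresponding call to commit");
            }
            pendingCommit = new ArrayList<>(buffered);
        }
    }

    /** Phase 2: publish the snapshot; runs phase 1 first if the caller skipped it. */
    void commit() {
        synchronized (commitLock) {
            if (pendingCommit == null) {
                prepareCommit(); // the monitor is reentrant, so this nested call is safe
            }
            committed.addAll(pendingCommit);
            // The snapshot is a prefix of the buffer; documents added later stay buffered.
            buffered.subList(0, pendingCommit.size()).clear();
            pendingCommit = null;
        }
    }

    /** Discards the pending snapshot and everything still buffered. */
    void rollback() {
        synchronized (commitLock) {
            pendingCommit = null;
            buffered.clear();
        }
    }

    List<String> committed() {
        synchronized (commitLock) {
            return new ArrayList<>(committed);
        }
    }
}
```

A document added between prepareCommit() and commit() is not part of that commit; it is picked up by the next one.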

Flushing to create pending segments

Before committing, Lucene performs one more flush so that changes still buffered in memory are written out as segments and included in the commit.

Assembling commit information and updating file reference counts

The user-provided commit metadata is copied into segmentInfos, which is then cloned into a snapshot (toCommit). The reference counts of all files in that snapshot are incremented so that they cannot be deleted while the commit is pending.

// assemble user commit info
if (commitUserData != null) {
    Map<String, String> userData = new HashMap<>();
    for (Map.Entry<String, String> ent : commitUserData) {
        userData.put(ent.getKey(), ent.getValue());
    }
    segmentInfos.setUserData(userData, false);
}
// clone snapshot for commit
toCommit = segmentInfos.clone();
pendingCommitChangeCount = changeCount.get();
filesToCommit = toCommit.files(false);
// increment reference counts for new files
deleter.incRef(filesToCommit);
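The idea behind the `incRef` call can be illustrated with a small reference counter. This is a simplified stand-in under stated assumptions, not Lucene's actual deleter: `FileRefCounter` is a hypothetical class, and "deleting" a file only records its name in a set instead of touching the file system.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy file reference counter; a simplified stand-in, not Lucene's deleter. */
final class FileRefCounter {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> deleted = new HashSet<>();

    /** Called when a commit (or reader) starts using these files. */
    void incRef(Collection<String> files) {
        for (String file : files) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    /** Called when a commit (or reader) no longer needs these files. */
    void decRef(Collection<String> files) {
        for (String file : files) {
            Integer count = refCounts.get(file);
            if (count == null) {
                throw new IllegalStateException("unknown file: " + file);
            }
            if (count == 1) {
                refCounts.remove(file);
                deleted.add(file); // a real implementation would delete the file here
            } else {
                refCounts.put(file, count - 1);
            }
        }
    }

    int refCount(String file) {
        return refCounts.getOrDefault(file, 0);
    }

    boolean isDeleted(String file) {
        return deleted.contains(file);
    }
}
```

Because a file shared by two commits has a count of two, releasing the older commit does not remove files that the newer commit still needs.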

Persisting segments to disk

Lucene writes the commit point to disk as a pending file named pending_segments_N. This is not yet the final segments file, so readers do not treat it as a commit.

private void write(Directory directory) throws IOException {
    long nextGeneration = getNextPendingGeneration();
    String segmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.PENDING_SEGMENTS, "", nextGeneration);
    // I/O stream and file creation logic
}
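For orientation, the naming scheme can be sketched as follows. This is an approximation rather than Lucene's code: `SegmentsFileNames` is a hypothetical helper, it only handles generations of 1 and above, and it assumes the generation suffix N is written in base 36 (`Character.MAX_RADIX`).

```java
/** Hypothetical helper approximating how a generation maps to a file name. */
final class SegmentsFileNames {
    static final String SEGMENTS = "segments";
    static final String PENDING_SEGMENTS = "pending_segments";

    /** Builds base + "_" + generation in base 36, e.g. generation 36 gives the suffix "10". */
    static String fileNameFromGeneration(String base, long generation) {
        if (generation < 1) {
            throw new IllegalArgumentException("this sketch only handles generation >= 1");
        }
        return base + "_" + Long.toString(generation, Character.MAX_RADIX);
    }
}
```

The pending and final names for the same generation differ only in their prefix, which is what lets the second phase finish with a single rename.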

Second‑phase commit

Once the pending_segments_N file has been safely written, the second phase renames it to segments_N and syncs the directory metadata, finalizing the commit.

final String finishCommit(Directory dir) throws IOException {
    // ...
    final String src = IndexFileNames.fileNameFromGeneration(IndexFileNames.PENDING_SEGMENTS, "", generation);
    dest = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, "", generation);
    dir.rename(src, dest);
    dir.syncMetaData();
    // ...
    return dest;
}
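The same "write under a pending name, then rename" pattern can be tried outside Lucene with plain java.nio. The sketch below is illustrative only: `PendingFileCommit` is a hypothetical class, it uses `Files.move` with `ATOMIC_MOVE` in place of Lucene's Directory API, writes the generation in decimal for simplicity, and skips the fsync calls a durable implementation would need.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Toy two-phase file commit using java.nio; not Lucene's Directory API. */
final class PendingFileCommit {
    /** Phase 1: write the commit point under a pending name that readers ignore. */
    static Path prepare(Path dir, long generation, String contents) {
        try {
            Path pending = dir.resolve("pending_segments_" + generation);
            Files.write(pending, contents.getBytes(StandardCharsets.UTF_8));
            // A durable implementation would also fsync the file at this point.
            return pending;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Phase 2: publish the commit by renaming the pending file to its final name. */
    static Path finish(Path pending) {
        try {
            String finalName = pending.getFileName().toString().replaceFirst("^pending_", "");
            Path dest = pending.resolveSibling(finalName);
            // ATOMIC_MOVE asks the file system for an all-or-nothing rename.
            return Files.move(pending, dest, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rollback: discard the pending file if the commit is abandoned. */
    static void rollback(Path pending) {
        try {
            Files.deleteIfExists(pending);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Until `finish` runs, a reader that only looks for segments_N files sees the previous commit, so a crash during the first phase leaves the old state intact.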

Segment deletion policies

Lucene defines the IndexDeletionPolicy interface with four implementations:

KeepOnlyLastCommitDeletionPolicy (default) – retains only the most recent commit.

NoDeletionPolicy – keeps every commit, allowing full rollback.

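SnapshotDeletionPolicy – wraps another policy and lets callers take an in-memory snapshot of a commit, whose files are then protected from deletion until the snapshot is released.

PersistentSnapshotDeletionPolicy – like SnapshotDeletionPolicy, but persists the snapshot information to disk so that it survives a restart.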

/** Deletes all commits except the most recent one. */
@Override
public void onCommit(List<? extends IndexCommit> commits) {
    int size = commits.size();
    for (int i = 0; i < size - 1; i++) {
        commits.get(i).delete();
    }
}
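The same loop generalizes to a policy that keeps the newest N commits. The following is a sketch, not one of Lucene's built-in policies: `ToyCommit` is a hypothetical stand-in for a commit handle, and the code assumes, as the loop above does, that the list is ordered from oldest to newest.

```java
import java.util.List;

/** Hypothetical commit handle used only for this example. */
interface ToyCommit {
    void delete();
}

/** Keeps the newest N commits and deletes the rest; assumes the list is ordered oldest first. */
final class KeepLastNPolicy {
    private final int keep;

    KeepLastNPolicy(int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be >= 1");
        }
        this.keep = keep;
    }

    void onCommit(List<? extends ToyCommit> commits) {
        int size = commits.size();
        // Everything before index (size - keep) is older than the N commits we retain.
        for (int i = 0; i < size - keep; i++) {
            commits.get(i).delete();
        }
    }
}
```

With keep set to 1 this behaves like the loop above. Larger values trade disk space for the ability to reopen an older commit.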

Rollback on failure

If any step in the two-phase commit fails, Lucene rolls back by aborting pending merges, deleting the temporary pending_segments_N file, and restoring the previous SegmentInfos state from a backup.

List<SegmentCommitInfo> createBackupSegmentInfos() {
    final List<SegmentCommitInfo> list = new ArrayList<>(size());
    // SegmentInfos is iterable over its SegmentCommitInfo entries
    for (final SegmentCommitInfo info : this) {
        assert info.info.getCodec() != null;
        list.add(info.clone());
    }
    return list;
}
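The backup-and-restore idea can be shown in miniature. This is a sketch under simplifying assumptions: `ToySegmentList` is a hypothetical class that tracks segment names as strings, so copying the list is enough, whereas the excerpt above clones each entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy segment list with backup and restore; a sketch, not Lucene's SegmentInfos. */
final class ToySegmentList {
    private List<String> segments = new ArrayList<>();

    void add(String segment) {
        segments.add(segment);
    }

    /** Copies the current list so that later changes cannot affect the backup. */
    List<String> createBackup() {
        return new ArrayList<>(segments);
    }

    /** Replaces the current state with a copy of a previously taken backup. */
    void rollbackTo(List<String> backup) {
        segments = new ArrayList<>(backup);
    }

    List<String> segments() {
        return new ArrayList<>(segments);
    }
}
```

Taking the backup before the commit starts means a failure at any later step can restore the in-memory view to exactly what it was.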

In summary, Lucene uses a two‑phase commit to ensure that document writes become durable only after both phases succeed, providing a reliable way to handle failures without a separate transaction log. Higher‑level systems such as Solr or Elasticsearch add their own transaction logs for additional safety.

References

Lucene source code: https://github.com/apache/lucene-solr/tree/branch_7_2

Chris’s Cabin: https://www.amazingkoala.com.cn/

Alibaba Cloud documentation on distributed transactions: https://help.aliyun.com/document_detail/132896.html

Written by

政采云技术

ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.
