Two‑Phase Commit in Lucene: Mechanism, Implementation, and Rollback

This article explains the two‑phase commit protocol, describes how Lucene implements it through a dedicated interface, details the preparation, commit, segment handling, deletion policies, and rollback procedures, and provides code snippets illustrating the core logic.

政采云技术

What Is Two‑Phase Commit

The two‑phase commit protocol (2PC) is a classic protocol for coordinating distributed transactions. A Transaction Manager (TM) coordinates one or more Resource Managers (RMs); each RM reports its status (prepared or failed) to the TM, which then decides whether to commit or roll back the transaction.

The concrete workflow is:

The application submits a request to the TM, initiating a distributed transaction.

In phase one, the TM asks all RMs to prepare for commit.

Each RM returns a success‑or‑failure message (a timeout counts as failure).

In phase two, if all RMs prepared successfully, the TM instructs them to commit. If any RM failed, the TM instructs all RMs to roll back.
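The workflow above can be sketched in plain Java. This is a minimal, single‑process illustration under stated assumptions, not a production coordinator: the Participant interface and the TwoPhaseCoordinator class are hypothetical names, and a thrown exception stands in for an RM reporting failure or timing out.

```java
import java.util.List;

/** Hypothetical resource manager (RM) contract, for illustration only. */
interface Participant {
    void prepare() throws Exception; // phase one: throw to report failure
    void commit();                   // phase two, success path
    void rollback();                 // phase two, failure path
}

/** Hypothetical transaction manager (TM) that drives the two phases. */
final class TwoPhaseCoordinator {
    /** Returns true if every participant committed, false if all were rolled back. */
    static boolean execute(List<Participant> participants) {
        // Phase one: ask every RM to prepare.
        for (Participant p : participants) {
            try {
                p.prepare();
            } catch (Exception e) {
                // Any failure aborts the transaction: roll back every RM.
                for (Participant q : participants) {
                    q.rollback();
                }
                return false;
            }
        }
        // Phase two: every RM prepared successfully, so commit them all.
        for (Participant p : participants) {
            p.commit();
        }
        return true;
    }
}
```

The sketch rolls back every participant, including ones that were never asked to prepare, which matches the rule that the TM instructs all RMs to roll back; a participant's rollback therefore has to be safe to call in any state.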

Lucene’s Two‑Phase Commit

Lucene defines a TwoPhaseCommit interface with three methods: prepareCommit, commit, and rollback. IndexWriter, which handles document writes, implements this interface.

Phase‑One Commit: prepareCommit

During the first phase, Lucene does as much of the commit work as possible without making the commit visible. If the phase fails, a rollback discards all work done in this stage. Lucene persists the new commit to disk under a temporary file name, so it is not yet visible to readers.

synchronized(commitLock) {
    ensureOpen(false);
    if (infoStream.isEnabled("IW")) {
        infoStream.message("IW", "prepareCommit: flush");
        infoStream.message("IW", "  index before flush " + segString());
    }
    if (tragedy != null) {
        throw new IllegalStateException("this writer hit an unrecoverable error; cannot commit", tragedy);
    }
    if (pendingCommit != null) {
        throw new IllegalStateException("prepareCommit was already called with no corresponding call to commit");
    }
    // subsequent phase‑one commit logic
}

Pre‑Commit Validation

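Lucene takes a lock (commitLock in the snippet above) to prevent concurrent prepareCommit calls on the same writer. It also refuses to proceed if the writer has already hit an unrecoverable error (tragedy is non‑null). Finally, it checks whether an earlier prepareCommit is still awaiting its commit by testing whether pendingCommit, the snapshot taken by that earlier call, is non‑null.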
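The guard can be isolated in a few lines. The following is a minimal sketch, assuming a hypothetical GuardedWriter class that keeps only the lock and the pending state; it is not Lucene's IndexWriter, and the placeholder Object stands in for the real commit snapshot.

```java
/** Hypothetical writer that models only the phase-one guard. */
final class GuardedWriter {
    private final Object commitLock = new Object();
    // Non-null between prepareCommit() and the matching commit() or rollback().
    private Object pendingCommit;

    void prepareCommit() {
        synchronized (commitLock) {
            if (pendingCommit != null) {
                throw new IllegalStateException(
                    "prepareCommit was already called with no corresponding call to commit");
            }
            pendingCommit = new Object(); // stand-in for the cloned commit snapshot
        }
    }

    void commit() {
        synchronized (commitLock) {
            if (pendingCommit == null) {
                // Java monitors are reentrant, so this nested call cannot deadlock.
                prepareCommit();
            }
            pendingCommit = null; // phase two finished: a new prepare is allowed
        }
    }

    void rollback() {
        synchronized (commitLock) {
            pendingCommit = null; // discard the pending snapshot
        }
    }
}
```

Calling prepareCommit twice without an intervening commit or rollback throws IllegalStateException, while commit on its own runs both phases back to back.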

Flushing to Create Segments for Commit

Before committing, Lucene performs an additional flush to include as many documents as possible, maximizing durability.

Assembling Commit Information and Updating File Counts

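1. User‑provided commit metadata is attached to the current segmentInfos.

2. A copy (clone) of segmentInfos is taken; this snapshot determines which segments and files will be persisted.

3. Reference counts for the files in the snapshot are incremented so that they cannot be deleted while the commit is pending.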

// assemble user commit data
if (commitUserData != null) {
    Map<String, String> userData = new HashMap<>();
    for (Map.Entry<String, String> ent : commitUserData) {
        userData.put(ent.getKey(), ent.getValue());
    }
    segmentInfos.setUserData(userData, false);
}
// clone snapshot for commit
toCommit = segmentInfos.clone();
pendingCommitChangeCount = changeCount.get();
filesToCommit = toCommit.files(false);
// increase reference count for each file
deleter.incRef(filesToCommit);
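The reference‑counting step at the end of the snippet above can be illustrated on its own. This is a simplified sketch, assuming a hypothetical FileRefCounter class; it is not Lucene's deleter, and it records "deleted" file names in a set instead of touching a directory.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified file reference counter. */
final class FileRefCounter {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> deleted = new TreeSet<>();

    /** Adds one reference to every file in the collection. */
    void incRef(Collection<String> files) {
        for (String f : files) {
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    /** Drops one reference per file; a file with no references left is deleted. */
    void decRef(Collection<String> files) {
        for (String f : files) {
            Integer count = refCounts.get(f);
            if (count == null) {
                throw new IllegalStateException("file is not referenced: " + f);
            }
            if (count == 1) {
                refCounts.remove(f);
                deleted.add(f); // a real implementation would delete the file here
            } else {
                refCounts.put(f, count - 1);
            }
        }
    }

    int refCount(String file) {
        return refCounts.getOrDefault(file, 0);
    }

    Set<String> deletedFiles() {
        return deleted;
    }
}
```

Because a pending commit holds its own reference to each file, a file shared with an older commit survives until the last commit that uses it releases it.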

Persisting Segments to Disk

Lucene writes the commit point, which lists the segments that belong to the commit, to a file named pending_segments_N. This is an intermediate file; in the second phase it is renamed to segments_N, becoming the final on‑disk representation.

private void write(Directory directory) throws IOException {
    long nextGeneration = getNextPendingGeneration();
    String segmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.PENDING_SEGMENTS, "", nextGeneration);
    // I/O stream and file creation logic
}

Phase‑Two Commit: commit

Once the pending_segments_N file and the segment files it references are safely written, the second phase renames it to segments_N and syncs the directory metadata, finalizing the commit.

final String finishCommit(Directory dir) throws IOException {
    // ...
    final String src = IndexFileNames.fileNameFromGeneration(IndexFileNames.PENDING_SEGMENTS, "", generation);
    String dest = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, "", generation);
    dir.rename(src, dest);
    dir.syncMetaData();
    // ...
    return dest;
}
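The same write‑then‑rename pattern can be tried outside Lucene with java.nio.file. The following is a sketch under stated assumptions: the PendingFileCommit class and its method names are hypothetical, and it works on a plain directory rather than a Lucene Directory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper showing the pending-file commit pattern. */
final class PendingFileCommit {
    /** Phase one: write and fsync the content under a temporary "pending_" name. */
    static Path prepare(Path dir, String name, String content) throws IOException {
        Path pending = dir.resolve("pending_" + name);
        try (FileChannel ch = FileChannel.open(pending,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush file data and metadata to the storage device
        }
        return pending;
    }

    /** Phase two: a single rename publishes the file under its final name. */
    static Path commit(Path dir, String name) throws IOException {
        return Files.move(dir.resolve("pending_" + name), dir.resolve(name),
                StandardCopyOption.ATOMIC_MOVE);
    }

    /** Rollback: discard the pending file if the commit is abandoned. */
    static void rollback(Path dir, String name) throws IOException {
        Files.deleteIfExists(dir.resolve("pending_" + name));
    }
}
```

A reader that only looks for files without the pending_ prefix never observes a half‑written commit: the file either does not exist yet or is complete.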

Segment Deletion Policies

Lucene provides several IndexDeletionPolicy implementations:

KeepOnlyLastCommitDeletionPolicy (default): keeps only the most recent commit and deletes all earlier ones.

NoDeletionPolicy: never deletes a commit, so every earlier commit point stays available to roll back to.

SnapshotDeletionPolicy: wraps another policy and lets the application snapshot a commit so that its files are not deleted while the snapshot is held; the snapshot bookkeeping is kept in memory.

PersistentSnapshotDeletionPolicy: like SnapshotDeletionPolicy, but persists the snapshot information to disk so that it survives a restart.

/** Deletes all commits except the most recent one. */
@Override
public void onCommit(List<? extends IndexCommit> commits) {
    int size = commits.size();
    for (int i = 0; i < size - 1; i++) {
        commits.get(i).delete();
    }
}

Rollback on Failure

If any step in the two‑phase commit fails, Lucene rolls back by aborting pending merges, deleting the temporary pending_segments_N file, and restoring the previous segmentInfos snapshot. The snapshot to restore is a backup list of cloned per‑segment entries:

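List<SegmentCommitInfo> createBackupSegmentInfos() {
    final List<SegmentCommitInfo> list = new ArrayList<>(size());
    // iterate over the segments held by this SegmentInfos instance
    for (final SegmentCommitInfo info : this) {
        assert info.info.getCodec() != null;
        // clone each entry so later changes do not affect the backup
        list.add(info.clone());
    }
    return list;
}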
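The backup‑and‑restore idea behind this method can be shown with a small in‑memory model. This is a minimal sketch, assuming a hypothetical SegmentList class that tracks segment names as strings; it does not reproduce Lucene's SegmentInfos.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory segment list with snapshot-based rollback. */
final class SegmentList {
    private List<String> segments = new ArrayList<>();

    void add(String segmentName) {
        segments.add(segmentName);
    }

    /** Copies the current state so that later changes cannot affect the backup. */
    List<String> createBackup() {
        return new ArrayList<>(segments);
    }

    /** Restores a previously taken backup, discarding all changes made since. */
    void rollbackTo(List<String> backup) {
        segments = new ArrayList<>(backup);
    }

    /** Returns a defensive copy of the current segment names. */
    List<String> names() {
        return new ArrayList<>(segments);
    }
}
```

Taking the backup before a risky operation and calling rollbackTo in the failure path returns the list to exactly the state it had when the backup was taken.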

Conclusion

After a Lucene commit completes, the committed documents are persisted on disk. Because Lucene has no transaction log, it relies on two‑phase commit to roll back safely on failure. The first phase does the heavy work of writing all segment files safely; the second phase only renames a file, which keeps that step simple and unlikely to fail. Higher‑level systems such as Solr and Elasticsearch add their own transaction logs to further improve reliability.

References

Lucene source code (https://github.com/apache/lucene-solr/tree/branch_7_2)

Chris’s Cabin (https://www.amazingkoala.com.cn/)

What is two‑phase commit in distributed transactions (https://help.aliyun.com/document_detail/132896.html)
