Two‑Phase Commit in Lucene: Mechanism, Implementation, and Rollback
This article explains the two‑phase commit protocol, describes how Lucene implements it through a dedicated interface, details the preparation, commit, segment handling, deletion policies, and rollback procedures, and provides code snippets illustrating the core logic.
What Is Two‑Phase Commit
The two‑phase commit protocol (2PC) is a standard way to make a distributed transaction atomic. A Transaction Manager (TM) coordinates one or more Resource Managers (RMs); each RM reports its status (prepared or failed) to the TM, which then decides whether the whole transaction commits or rolls back.
The concrete workflow is:
1. The application submits a request to the TM, initiating a distributed transaction.
2. In phase one, the TM asks all RMs to prepare for commit.
3. Each RM returns a success‑or‑failure message (a timeout counts as failure).
4. In phase two, if all RMs prepared successfully, the TM instructs them to commit. If any RM failed, the TM instructs all RMs to roll back.
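The workflow above can be sketched as a small in‑memory model. This is a toy illustration, not code from Lucene or from any transaction framework: the names TwoPhaseCommitSketch, ResourceManager, runTransaction, and RecordingManager are all hypothetical, and timeouts are modeled simply as an exception thrown from prepare.

```java
import java.util.List;

/** Toy model of the two-phase commit workflow; every name here is hypothetical. */
class TwoPhaseCommitSketch {

    /** A participant that votes in phase one and obeys the decision in phase two. */
    interface ResourceManager {
        boolean prepare();   // phase one: false (or an exception) is a "no" vote
        void commit();       // phase two, when every participant voted "yes"
        void rollback();     // phase two, when any participant voted "no"
    }

    /** Runs both phases; returns true only if the transaction was committed. */
    static boolean runTransaction(List<? extends ResourceManager> participants) {
        boolean allPrepared = true;
        for (ResourceManager rm : participants) {
            try {
                if (!rm.prepare()) {
                    allPrepared = false;
                    break;
                }
            } catch (RuntimeException e) {
                allPrepared = false;   // a failure to answer counts as a "no" vote
                break;
            }
        }
        // Phase two: the same decision is sent to every participant.
        for (ResourceManager rm : participants) {
            if (allPrepared) {
                rm.commit();
            } else {
                rm.rollback();
            }
        }
        return allPrepared;
    }

    /** In-memory participant that records the last instruction it received. */
    static final class RecordingManager implements ResourceManager {
        private final boolean vote;
        String state = "idle";

        RecordingManager(boolean vote) { this.vote = vote; }

        @Override public boolean prepare() { state = vote ? "prepared" : "failed"; return vote; }
        @Override public void commit() { state = "committed"; }
        @Override public void rollback() { state = "rolled back"; }
    }
}
```

With two participants that both vote yes, runTransaction returns true and both end up committed; if either votes no, both are rolled back.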
Lucene’s Two‑Phase Commit
Lucene defines a TwoPhaseCommit interface with three methods: prepareCommit, commit, and rollback. IndexWriter, which drives the document‑write process, implements this interface.
Phase‑One Commit: prepareCommit
During the first phase, Lucene performs as much of the update work as possible while stopping short of a real commit. If the phase fails, a rollback discards all work done in this stage. Lucene persists the new segment data to disk, but records the commit point under a temporary name (pending_segments_N), so the new commit is not yet visible.
synchronized(commitLock) {
  ensureOpen(false);
  if (infoStream.isEnabled("IW")) {
    infoStream.message("IW", "prepareCommit: flush");
    infoStream.message("IW", " index before flush " + segString());
  }
  if (tragedy != null) {
    throw new IllegalStateException("this writer hit an unrecoverable error; cannot commit", tragedy);
  }
  if (pendingCommit != null) {
    throw new IllegalStateException("prepareCommit was already called with no corresponding call to commit");
  }
  // subsequent phase‑one commit logic
}
Pre‑Commit Validation
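Lucene takes commitLock so that only one thread at a time can run prepareCommit on a given writer. It then verifies that the writer has not hit an unrecoverable error (tragedy is null) and that no earlier prepareCommit is still waiting for its commit, which it detects by checking whether pendingCommit is non‑null.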
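The guard logic can be isolated in a few lines. The class below is a hypothetical stand‑in, not Lucene's IndexWriter: it borrows the field names commitLock, pendingCommit, and tragedy from the snippet above, while CommitGuardSketch, finishCommit, and hasPendingCommit are names made up for this sketch, and the pending commit is reduced to a plain string.

```java
/** Toy guard mirroring the pre-commit checks; not Lucene's implementation. */
class CommitGuardSketch {
    private final Object commitLock = new Object();
    private String pendingCommit;   // non-null while a prepared commit awaits phase two
    private Throwable tragedy;      // non-null after an unrecoverable error

    /** Phase one: refuses to start if the writer is broken or a commit is already pending. */
    void prepareCommit(String snapshotName) {
        synchronized (commitLock) {
            if (tragedy != null) {
                throw new IllegalStateException("unrecoverable error; cannot commit", tragedy);
            }
            if (pendingCommit != null) {
                throw new IllegalStateException(
                    "prepareCommit was already called with no corresponding call to commit");
            }
            pendingCommit = snapshotName;
        }
    }

    /** Phase two (simplified): clears the pending marker so a new prepare may start. */
    void finishCommit() {
        synchronized (commitLock) {
            pendingCommit = null;
        }
    }

    /** Records an unrecoverable error; every later prepareCommit will be rejected. */
    void onTragedy(Throwable cause) {
        synchronized (commitLock) {
            tragedy = cause;
        }
    }

    boolean hasPendingCommit() {
        synchronized (commitLock) {
            return pendingCommit != null;
        }
    }
}
```

Calling prepareCommit twice in a row throws IllegalStateException on the second call; after finishCommit, a new prepareCommit is accepted again.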
Flushing to Create Segments for Commit
Before committing, Lucene performs an additional flush to include as many documents as possible, maximizing durability.
Assembling Commit Information and Updating File Counts
1. User‑provided commit metadata is attached to the segmentInfos snapshot.
2. A copy of the snapshot is taken to determine which segments will be persisted.
3. Reference counts for the files in that copy are incremented.
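Step 3 relies on reference counting: a file may be removed only when nothing refers to it any longer. The sketch below shows that idea with a plain map. It is a simplified model under stated assumptions, not Lucene's file deleter; FileRefCountSketch and its methods are hypothetical names, and "deleting" a file just records its name in a list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy reference counter for index files; hypothetical, not Lucene's deleter. */
class FileRefCountSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final List<String> deleted = new ArrayList<>();

    /** Adds one reference to every file in the collection. */
    void incRef(Collection<String> files) {
        for (String file : files) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    /** Drops one reference per file; a file whose count reaches zero is "deleted". */
    void decRef(Collection<String> files) {
        for (String file : files) {
            Integer count = refCounts.get(file);
            if (count == null) {
                throw new IllegalStateException("file is not referenced: " + file);
            }
            if (count == 1) {
                refCounts.remove(file);
                deleted.add(file);   // a real implementation would delete from disk here
            } else {
                refCounts.put(file, count - 1);
            }
        }
    }

    int refCount(String file) {
        return refCounts.getOrDefault(file, 0);
    }

    List<String> deletedFiles() {
        return deleted;
    }
}
```

If two commits share a file, releasing the first commit leaves that file in place because the second commit still holds a reference to it.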
// assemble user commit data
if (commitUserData != null) {
  Map<String,String> userData = new HashMap<>();
  for (Map.Entry<String,String> ent : commitUserData) {
    userData.put(ent.getKey(), ent.getValue());
  }
  segmentInfos.setUserData(userData, false);
}
// clone snapshot for commit
toCommit = segmentInfos.clone();
pendingCommitChangeCount = changeCount.get();
filesToCommit = toCommit.files(false);
// increase reference count for each file
deleter.incRef(filesToCommit);
Persisting Segments to Disk
Lucene writes the pending commit to a file named pending_segments_N, where N is the commit generation. This is an intermediate file; in the second phase it is renamed to segments_N, which becomes the final on‑disk representation of the commit.
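The N suffix is derived from the commit generation. The helper below is a simplified re‑implementation written for illustration, not the actual IndexFileNames code: it assumes the generation is rendered in base 36 (Character.MAX_RADIX) and leaves out the file‑extension argument that the real method takes. SegmentsFileNameSketch is a hypothetical name.

```java
/** Simplified naming helper; an illustrative assumption, not Lucene's IndexFileNames. */
class SegmentsFileNameSketch {
    static final String PENDING_SEGMENTS = "pending_segments";
    static final String SEGMENTS = "segments";

    /** Builds base_N, where N is the generation written in base 36. */
    static String fileNameFromGeneration(String base, long generation) {
        if (generation < 0) {
            throw new IllegalArgumentException("generation must be >= 0: " + generation);
        }
        if (generation == 0) {
            return base;   // generation 0 carries no suffix in this sketch
        }
        return base + "_" + Long.toString(generation, Character.MAX_RADIX);
    }
}
```

Under that assumption, generation 35 yields pending_segments_z and generation 36 yields pending_segments_10, so the pending file and the final file for one commit differ only in their prefix.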
private void write(Directory directory) throws IOException {
  long nextGeneration = getNextPendingGeneration();
  String segmentFileName = IndexFileNames.fileNameFromGeneration(IndexFileNames.PENDING_SEGMENTS, "", nextGeneration);
  // I/O stream and file creation logic
}
Phase‑Two Commit: commit
Once the pending_segments_N file has been safely written, the second phase renames it to segments_N and syncs the directory metadata, finalizing the commit.
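The write‑then‑rename pattern is not specific to Lucene and can be tried with the JDK alone. The sketch below uses java.nio.file rather than Lucene's Directory abstraction; RenameCommitSketch and its three methods are hypothetical names, the generation is written in decimal for simplicity, and the fsync calls that a durable implementation needs are omitted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Toy write-then-rename commit using plain files; not Lucene's Directory API. */
class RenameCommitSketch {

    /** Phase one: write the commit data under a name that readers ignore. */
    static Path prepare(Path dir, long generation, String contents) {
        Path pending = dir.resolve("pending_segments_" + generation);
        try {
            Files.write(pending, contents.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pending;
    }

    /** Phase two: publish the commit with a single atomic rename. */
    static Path finish(Path dir, long generation) {
        Path src = dir.resolve("pending_segments_" + generation);
        Path dest = dir.resolve("segments_" + generation);
        try {
            Files.move(src, dest, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dest;
    }

    /** Rollback: discard the pending file; the published commits stay untouched. */
    static void rollback(Path dir, long generation) {
        try {
            Files.deleteIfExists(dir.resolve("pending_segments_" + generation));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because readers only look for segments_N, a crash between prepare and finish leaves at worst a stray pending file, never a half‑visible commit.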
final String finishCommit(Directory dir) throws IOException {
  // ...
  final String src = IndexFileNames.fileNameFromGeneration(IndexFileNames.PENDING_SEGMENTS, "", generation);
  String dest = IndexFileNames.fileNameFromGeneration(IndexFileNames.SEGMENTS, "", generation);
  dir.rename(src, dest);
  dir.syncMetaData();
  // ...
  return dest;
}
Segment Deletion Policies
Lucene provides several IndexDeletionPolicy implementations:
KeepOnlyLastCommitDeletionPolicy (default): keeps only the most recent commit.
NoDeletionPolicy: retains all commits and their segment files, allowing rollback to any previous commit.
SnapshotDeletionPolicy: wraps another policy and lets callers take a snapshot of a commit; the files of a snapshotted commit are protected from deletion until the snapshot is released. Snapshots are tracked in memory only.
PersistentSnapshotDeletionPolicy: like SnapshotDeletionPolicy, but persists the snapshot information to disk so that it survives a restart.
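To make the difference between the first two policies concrete, the toy model below replays a series of commits against a policy and reports which generations survive. It does not use Lucene's IndexDeletionPolicy or IndexCommit types; DeletionPolicySketch, Commit, Policy, KEEP_ONLY_LAST, KEEP_ALL, and survivingGenerations are all hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy deletion-policy model; a sketch of the idea, not Lucene's API. */
class DeletionPolicySketch {

    /** A commit point that a policy can mark for deletion. */
    static final class Commit {
        final long generation;
        boolean deleted;

        Commit(long generation) { this.generation = generation; }

        void delete() { deleted = true; }
    }

    /** Called after every commit with all live commits, oldest first. */
    interface Policy {
        void onCommit(List<Commit> commits);
    }

    /** Marks every commit except the newest one for deletion. */
    static final Policy KEEP_ONLY_LAST = commits -> {
        for (int i = 0; i < commits.size() - 1; i++) {
            commits.get(i).delete();
        }
    };

    /** Never deletes anything. */
    static final Policy KEEP_ALL = commits -> { };

    /** Replays commitCount commits and returns the generations that survive. */
    static List<Long> survivingGenerations(Policy policy, int commitCount) {
        List<Commit> commits = new ArrayList<>();
        for (long gen = 1; gen <= commitCount; gen++) {
            commits.add(new Commit(gen));
            policy.onCommit(commits);
            commits.removeIf(c -> c.deleted);   // drop what the policy marked
        }
        List<Long> generations = new ArrayList<>();
        for (Commit c : commits) {
            generations.add(c.generation);
        }
        return generations;
    }
}
```

After three commits, KEEP_ONLY_LAST leaves only generation 3, whereas KEEP_ALL leaves generations 1, 2, and 3 available as rollback targets.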
/** Deletes all commits except the most recent one. */
@Override
public void onCommit(List<? extends IndexCommit> commits) {
  // commits are ordered oldest first, so the last entry is the newest commit
  int size = commits.size();
  for (int i = 0; i < size - 1; i++) {
    commits.get(i).delete();
  }
}
Rollback on Failure
If any step in the two‑phase commit fails, Lucene rolls back by terminating pending merges, deleting the temporary pending_segments_N file, and restoring the previous segmentInfos snapshot.
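Restoring a snapshot only works if a copy of the last committed state was kept aside. The sketch below shows that pattern with a list of segment names. It is a minimal model, not Lucene's rollback code: RollbackSketch and its methods are hypothetical, and merges and file deletion are left out entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy snapshot-and-restore model of rollback; hypothetical, not Lucene's code. */
class RollbackSketch {
    private List<String> segments = new ArrayList<>();        // live, uncommitted view
    private List<String> lastCommitted = new ArrayList<>();   // backup of the last commit

    /** Adds a segment to the live view only. */
    void addSegment(String name) {
        segments.add(name);
    }

    /** A successful commit replaces the backup with a copy of the live view. */
    void commit() {
        lastCommitted = new ArrayList<>(segments);
    }

    /** A rollback throws away the live view and restores a copy of the backup. */
    void rollback() {
        segments = new ArrayList<>(lastCommitted);
    }

    List<String> segments() {
        return Collections.unmodifiableList(segments);
    }
}
```

Copying in both directions keeps the backup independent of the live list, so changes made after a rollback cannot leak into the saved state.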
List<SegmentCommitInfo> createBackupSegmentInfos() {
  final List<SegmentCommitInfo> list = new ArrayList<>(size());
  // iterate over the segments held by this SegmentInfos and clone each one
  for (final SegmentCommitInfo info : this) {
    assert info.info.getCodec() != null;
    list.add(info.clone());
  }
  return list;
}
Conclusion
Once a Lucene commit completes, the documents it covers are persisted to disk. Because Lucene lacks a transaction log, it relies on the two‑phase commit to ensure safe rollbacks on failure. The first phase does the heavy work of writing the segment files and the pending commit file; the second phase merely renames one file, which keeps the code simple and leaves little room for failure. Higher‑level systems such as Solr and Elasticsearch add their own transaction logs to further improve reliability.
References
Lucene source code (https://github.com/apache/lucene-solr/tree/branch_7_2)
Chris’s Cabin (https://www.amazingkoala.com.cn/)
What is two‑phase commit in distributed transactions (https://help.aliyun.com/document_detail/132896.html)
ZCY Technology (政采云技术)