How to Accurately Determine MergeBase for IDE‑Based Code Review
This article explains how IDE‑based code review tools can compute the correct merge‑base using Git commit data, handle complex branch scenarios, and implement a version‑skip algorithm that reduces review workload while ensuring accurate diff information.
Preface
DEF implemented O2 CodeReview on the pure‑frontend KAITIAN version in FY22 S1. Because the IDE always shows code tied to a specific commit, the diff is produced by the IDE itself rather than by a traditional CR tool. This is flexible, but it means we must decide for ourselves which two versions to compare.
Automatically skipping already‑reviewed code is crucial for large feature branches; reviewers see only incremental changes based on the last reviewed version. Traditional diff‑based tools use subtraction algorithms, but IDE‑based CR must implement its own logic.
Which Two Versions Should I Compare?
Three‑Way Merge
Before describing our algorithm, we review Git’s three‑way merge strategy. When a branch (e.g., fix1) diverges from main, Git finds the common ancestor (mergeBase) to decide how to apply changes.
In CR we also use mergeBase as the base version; if there’s no conflict with main, the observed changes are exactly those that will be merged.
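The role of the common ancestor can be illustrated with a minimal TypeScript sketch of the three‑way decision for a single unit of content (a line, a hunk, or a whole file). The names MergeResult and threeWayMerge are illustrative assumptions; Git's real merge works on text hunks and is considerably more involved.

```typescript
// Outcome of merging one unit of content three ways.
type MergeResult<T> =
  | { kind: "resolved"; value: T }
  | { kind: "conflict"; ours: T; theirs: T };

// Hypothetical helper: decide the merged value from the base and both sides.
function threeWayMerge<T>(base: T, ours: T, theirs: T): MergeResult<T> {
  if (ours === theirs) return { kind: "resolved", value: ours }; // both sides agree
  if (ours === base) return { kind: "resolved", value: theirs }; // only theirs changed
  if (theirs === base) return { kind: "resolved", value: ours }; // only ours changed
  return { kind: "conflict", ours, theirs }; // both changed, differently
}
```

Without the base there would be no way to tell which side actually changed, which is why the base version matters for review as well.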
Obtaining mergeBase
Traditional CR tools compute a diff with git diff --merge-base and store it. IDE‑based tools can run git merge-base directly, but in a pure‑frontend scenario this requires backend support, adding implementation cost.
Aone’s platform provides a branch‑compare API that returns a list of commits with parent_ids. From this list we can reconstruct a commit chain and infer the mergeBase.
<code>[
  {
    "author_email": "[email protected]",
    "author_name": "灰灰",
    "committer_email": "[email protected]",
    "committer_name": "灰灰",
    "created_at": "2020-09-17T18:13:52+08:00",
    "id": "4810d0faf6602dac68e447235f7a0e1da31d721e",
    "message": "权限申请\n",
    "parent_ids": ["05cbd07eae346f6d246b5430b268d6963c8e4c25"],
    "short_id": "4810d0fa",
    "title": "权限申请"
  },
  {
    "author_email": "[email protected]",
    "author_name": "灰灰",
    "committer_email": "[email protected]",
    "committer_name": "灰灰",
    "created_at": "2020-09-21T16:33:32+08:00",
    "id": "c33cbf35cea4516659fd40364a1736cc5b4acd09",
    "message": "增加日志查看\n",
    "parent_ids": ["4810d0faf6602dac68e447235f7a0e1da31d721e"],
    "short_id": "c33cbf35",
    "title": "增加日志查看"
  }
]
</code>The parent of the earliest commit (its parent_ids entry) is the mergeBase. When a merge commit has two parents, the one that does not belong to the current branch becomes the mergeBase.
If multiple common ancestors exist, we select the one on the shortest path, that is, the first one encountered when tracing back from the branch head, as the mergeBase.
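The back‑trace described above can be sketched in TypeScript as follows. This is a minimal sketch, not O2 CodeReview's actual code: it assumes the compare API returns only commits that belong to the current branch, and that the caller knows the id of the branch head. BranchCommit and inferMergeBase are hypothetical names.

```typescript
// Minimal shape of one entry in the branch-compare response (only the fields used here).
interface BranchCommit {
  id: string;
  parent_ids: string[];
}

// Hypothetical helper: walk back from the branch head, breadth first, and
// return the first commit reached that is NOT part of the branch-only list.
function inferMergeBase(commits: BranchCommit[], headId: string): string | undefined {
  const byId = new Map<string, BranchCommit>();
  for (const c of commits) byId.set(c.id, c);
  if (!byId.has(headId)) return undefined; // the head must be part of the list

  const queue: string[] = [headId];
  const seen = new Set<string>(queue);
  while (queue.length > 0) {
    const id = queue.shift() as string;
    const commit = byId.get(id);
    if (commit === undefined) return id; // we left the branch: treat this commit as the mergeBase
    for (const parent of commit.parent_ids) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  return undefined; // the walk ended at a root commit without leaving the list
}
```

Because the walk is breadth first, the commit returned is the one reachable from the head in the fewest steps. A merge commit whose parents are both in the list is simply walked through, since neither parent lies outside the branch.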
mergeBase definition (from the git merge-base documentation): One common ancestor is better than another common ancestor if the latter is an ancestor of the former. A common ancestor that does not have any better common ancestor is a best common ancestor, i.e. a merge base. There can be more than one merge base for a pair of commits.
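The definition translates almost literally into code when the full commit graph is available. The following TypeScript sketch assumes the graph is given as a map from commit id to parent ids; ParentMap, ancestorsOf, and mergeBases are illustrative names, and the quadratic filtering is meant for clarity, not for large histories.

```typescript
// Commit id -> ids of its parents (assumed to describe the whole relevant history).
type ParentMap = Record<string, string[]>;

// All commits reachable from `start`, including `start` itself.
function ancestorsOf(graph: ParentMap, start: string): Set<string> {
  const seen = new Set<string>([start]);
  const stack: string[] = [start];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    for (const parent of graph[id] ?? []) {
      if (!seen.has(parent)) {
        seen.add(parent);
        stack.push(parent);
      }
    }
  }
  return seen;
}

// Best common ancestors: common ancestors that are not an ancestor of another common ancestor.
function mergeBases(graph: ParentMap, a: string, b: string): string[] {
  const ancestorsOfB = ancestorsOf(graph, b);
  const common = Array.from(ancestorsOf(graph, a)).filter((id) => ancestorsOfB.has(id));
  return common.filter(
    (candidate) =>
      !common.some((other) => other !== candidate && ancestorsOf(graph, other).has(candidate)),
  );
}
```

In a criss‑cross history, where each branch has merged the other once, this returns two merge bases, which is the situation the last sentence of the definition refers to.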
Special cases also need handling: a merge node whose parents both belong to the current branch must be ignored when selecting the mergeBase.
When the first commit of a new branch is itself a merge node (created via git merge --no-ff), the order of the parent_ids array determines which parent is the mainline. The second parent (index 1) corresponds to the merged‑in branch and should be used as the mergeBase.
<code>error: commit xxx is a merge but no -m option was given
</code>Git’s -m option selects the parent number for revert operations; similarly, we treat parent_ids[1] (parent number 2 in Git’s 1‑based numbering) as the mergeBase in such scenarios.
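A minimal TypeScript sketch of this rule follows. It assumes the first commit of the branch has already been identified; CommitNode, baseForFirstCommit, and parentNumberForIndex are hypothetical helpers introduced only for illustration.

```typescript
interface CommitNode {
  id: string;
  parent_ids: string[];
}

// Hypothetical helper: choose the base version from the first commit of a branch.
function baseForFirstCommit(first: CommitNode): string | undefined {
  // Ordinary commit: its only parent is the base (undefined for a root commit).
  if (first.parent_ids.length < 2) return first.parent_ids[0];
  // Merge commit: index 1 is the merged-in side, which the article uses as the mergeBase.
  return first.parent_ids[1];
}

// Git numbers parents from 1, so array index i corresponds to `-m <i + 1>`.
function parentNumberForIndex(index: number): number {
  return index + 1;
}
```

The off‑by‑one between the array index and Git's parent number is easy to miss, so it is worth keeping the conversion in one place.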
git revert -m parent-number (--mainline parent-number): Usually you cannot revert a merge because you do not know which side of the merge should be considered the mainline. This option specifies the parent number (starting from 1) of the mainline and allows revert to reverse the change relative to the specified parent.
Edge cases like squash merges can cause the true mergeBase to appear earlier in the commit graph, leading to incorrect selection.
Initially, O2 CodeReview’s custom mergeBase algorithm covered most scenarios; later we switched to Git’s native git merge-base for full reliability.
Which Code Should Be Skipped?
Two Scenarios
Traditional CR tools skip code by subtracting diffs: revision ~ head = base ~ head ⊖ base ~ revision. In IDE‑based CR we consume full file contents, so we must rethink the skip logic.
Simple case: after reviewing up to commit 1, the user makes two more commits. Comparing commit 1 with the latest commit suffices.
If the base changes (e.g., a merge from main occurs after revision 2), using the old base would re‑introduce merged changes, increasing review load.
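To make the second scenario concrete, here is a small TypeScript sketch that compares two full snapshots path by path, which is roughly what comparing the last reviewed revision with the head amounts to. The Snapshot type, the changedPaths helper, and the file names are illustrative assumptions, not part of O2 CodeReview.

```typescript
// A snapshot maps each file path to its full content at one commit.
type Snapshot = Record<string, string>;

// Paths whose content differs between the two snapshots (including added or removed files).
function changedPaths(left: Snapshot, right: Snapshot): string[] {
  const paths = new Set<string>([...Object.keys(left), ...Object.keys(right)]);
  return Array.from(paths).filter((p) => left[p] !== right[p]).sort();
}

// Hypothetical data: the last reviewed revision, and the head after a merge from main.
const reviewedRevision: Snapshot = { "feature.ts": "v1", "shared.ts": "main-old" };
const headAfterMerge: Snapshot = { "feature.ts": "v2", "shared.ts": "main-new" };

// shared.ts changed only on main, yet the naive comparison lists it next to feature.ts.
const naiveDiff = changedPaths(reviewedRevision, headAfterMerge);
```

The reviewer only needs to look at feature.ts here; everything else in the naive result is noise brought in by the merge.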
Algorithm Implementation
We define the skip algorithm as taking the intersection of base ~ head and base ~ revision, then applying the base ~ revision changes to the new base, ensuring that only unseen modifications remain.
By intersecting the two diffs we eliminate changes introduced by the new base while preserving new content, effectively skipping already‑reviewed code.
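One possible reading of this algorithm, simplified to whole‑file granularity, is sketched below in TypeScript. It is an illustration under stated assumptions rather than the production implementation: each version is assumed to be available as a map from path to full content, changes are intersected per file instead of per hunk, and Tree, diffPaths, and unreviewedPaths are hypothetical names.

```typescript
type Tree = Map<string, string>; // path -> full file content at one version

// Paths whose content differs between two trees (including added or removed files).
function diffPaths(left: Tree, right: Tree): string[] {
  const paths = new Set<string>([...Array.from(left.keys()), ...Array.from(right.keys())]);
  return Array.from(paths).filter((p) => left.get(p) !== right.get(p)).sort();
}

// Hypothetical helper: which paths still need review after the base moved?
function unreviewedPaths(oldBase: Tree, revision: Tree, newBase: Tree, head: Tree): string[] {
  // Intersection: files changed in the reviewed diff that are also part of the current diff.
  const inHeadDiff = new Set<string>(diffPaths(newBase, head));
  const alreadyReviewed = diffPaths(oldBase, revision).filter((p) => inHeadDiff.has(p));

  // Replay the reviewed state of those files on top of the new base.
  const left: Tree = new Map(newBase);
  for (const p of alreadyReviewed) {
    const content = revision.get(p);
    if (content === undefined) left.delete(p); // the reviewed change deleted the file
    else left.set(p, content);
  }

  // Whatever still differs from head has not been seen by the reviewer.
  return diffPaths(left, head);
}
```

With this shape, a file changed only by the merge from main matches the new base and drops out, a file reviewed earlier and untouched since matches the replayed revision and drops out, and only files modified after the last review remain. A file touched both by reviewed commits and by the merge is shown again in full, which is the price of working per file instead of per hunk.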
CR Staging
Previously, DEF’s CR acted as a release gate, causing last‑minute submissions and heavy review loads. The new staged CR process integrates the version‑skip algorithm into daily iterations, enabling phased code review.
Integrating CR into Development Workflow
Next, DEF will embed CR capabilities into IDE plugins (both web and local), leveraging OS features for better navigation and feeding real‑time change information back to the release system, thereby improving review quality and developer productivity.
Conclusion
While native git merge-base will eventually replace custom logic, understanding Git’s commit chain and mergeBase remains essential for advanced version‑skip features. Future work includes adding intelligent review assistance powered by machine learning.
Taobao Frontend Technology