
How to Accurately Determine MergeBase for IDE‑Based Code Review

This article explains how IDE‑based code review tools can compute the correct merge‑base using Git commit data, handle complex branch scenarios, and implement a version‑skip algorithm that reduces review workload while ensuring accurate diff information.

Taobao Frontend Technology

Preface

In FY22 S1, DEF implemented O2 CodeReview on the pure-frontend version of KAITIAN. Because the IDE displays code as of a specific commit, diffs are rendered by the IDE itself rather than by a traditional code review (CR) tool. This gives more flexibility, but it means we have to decide for ourselves which two versions to compare.

Automatically skipping already-reviewed code is crucial for large feature branches: reviewers should see only the changes made since the version they last reviewed. Traditional diff-based tools achieve this by subtracting diffs, but IDE-based CR has to implement its own logic.

Which Two Versions Should I Compare?

Three‑Way Merge

Before describing our algorithm, let us review Git's three-way merge strategy. When a branch such as fix1 diverges from main, Git finds their common ancestor (the mergeBase) and compares each branch tip against it to decide how to combine the changes.
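To make the role of the mergeBase concrete, here is a minimal TypeScript sketch of the per-region decision a three-way merge makes. It assumes the file has already been split into regions aligned across the three versions (Git performs that alignment with a diff algorithm, omitted here); `mergeRegion` and `MergeResult` are hypothetical names, not Git internals.

```typescript
// Outcome of merging one aligned region of a file (hypothetical model).
type MergeResult =
  | { kind: "clean"; text: string }
  | { kind: "conflict"; ours: string; theirs: string };

// Three-way rule for one region: each side is judged by comparing it with the common ancestor.
function mergeRegion(base: string, ours: string, theirs: string): MergeResult {
  if (ours === theirs) return { kind: "clean", text: ours }; // both sides agree
  if (ours === base) return { kind: "clean", text: theirs }; // only "theirs" changed the region
  if (theirs === base) return { kind: "clean", text: ours }; // only "ours" changed the region
  return { kind: "conflict", ours, theirs }; // both sides changed it, differently
}
```

Without the base, the function could not tell "ours changed it" from "theirs changed it"; that is exactly why the choice of mergeBase decides which changes show up.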

In CR we likewise use the mergeBase as the base version: as long as the branch does not conflict with main, the changes shown between the mergeBase and the branch head are exactly the changes that will be merged.

Obtaining mergeBase

Traditional CR tools compute a diff with git diff --merge-base and store it. IDE-based tools could run git merge-base directly, but in a pure-frontend setup this requires backend support, which adds implementation cost.

Aone's platform provides a branch-compare API that returns a list of commits, each with its parent_ids. From this list we can reconstruct the commit chain and infer the mergeBase.

<code>[
  {
    "author_email": "[email protected]",
    "author_name": "灰灰",
    "committer_email": "[email protected]",
    "committer_name": "灰灰",
    "created_at": "2020-09-17T18:13:52+08:00",
    "id": "4810d0faf6602dac68e447235f7a0e1da31d721e",
    "message": "Apply for permissions\n",
    "parent_ids": ["05cbd07eae346f6d246b5430b268d6963c8e4c25"],
    "short_id": "4810d0fa",
    "title": "Apply for permissions"
  },
  {
    "author_email": "[email protected]",
    "author_name": "灰灰",
    "committer_email": "[email protected]",
    "committer_name": "灰灰",
    "created_at": "2020-09-21T16:33:32+08:00",
    "id": "c33cbf35cea4516659fd40364a1736cc5b4acd09",
    "message": "Add log viewing\n",
    "parent_ids": ["4810d0faf6602dac68e447235f7a0e1da31d721e"],
    "short_id": "c33cbf35",
    "title": "Add log viewing"
  }
]
</code>

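Reconstructing the commit chain from such a response can be sketched as follows. This is a minimal TypeScript sketch, assuming records shaped like the sample above and using only id and parent_ids; `CompareCommit` and `firstParentChain` are hypothetical names, not part of the Aone API.

```typescript
// Subset of a branch-compare record used here (field names taken from the sample response).
interface CompareCommit {
  id: string;
  parent_ids: string[];
}

// Walk from the head back along first parents while the commits are still in the list.
// Returns the chain newest-first and stops at the first parent the list does not contain.
function firstParentChain(commits: CompareCommit[], headId: string): CompareCommit[] {
  const byId = new Map(commits.map((c) => [c.id, c] as const));
  const chain: CompareCommit[] = [];
  let current = byId.get(headId);
  while (current !== undefined) {
    chain.push(current);
    const firstParent = current.parent_ids[0];
    current = firstParent === undefined ? undefined : byId.get(firstParent);
  }
  return chain;
}

const sample: CompareCommit[] = [
  { id: "4810d0faf6602dac68e447235f7a0e1da31d721e", parent_ids: ["05cbd07eae346f6d246b5430b268d6963c8e4c25"] },
  { id: "c33cbf35cea4516659fd40364a1736cc5b4acd09", parent_ids: ["4810d0faf6602dac68e447235f7a0e1da31d721e"] },
];
const chain = firstParentChain(sample, "c33cbf35cea4516659fd40364a1736cc5b4acd09");
```

Applied to the sample, the chain is c33cbf35 followed by 4810d0fa.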
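The parent of the earliest commit in the list is the mergeBase; in the sample above, that is 05cbd07eae346f6d246b5430b268d6963c8e4c25, the parent of commit 4810d0fa. When a merge commit has two parents, the parent that does not belong to the current branch is taken as the mergeBase.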
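These two rules can be expressed together, sketched in TypeScript under the same assumptions (records with id and parent_ids; `mergeBaseCandidates` is a hypothetical helper): every parent that a commit in the list references but that is not itself in the list is a mergeBase candidate. As a side effect, a merge commit whose parents both belong to the branch contributes no candidate.

```typescript
interface CompareCommit {
  id: string;
  parent_ids: string[];
}

// Collect parents that lie outside the branch's commit list, in list order, without duplicates.
function mergeBaseCandidates(commits: CompareCommit[]): string[] {
  const inBranch = new Set(commits.map((c) => c.id));
  const candidates: string[] = [];
  for (const commit of commits) {
    for (const parent of commit.parent_ids) {
      // A parent inside the branch can never be the mergeBase, so it is skipped.
      if (!inBranch.has(parent) && candidates.indexOf(parent) === -1) {
        candidates.push(parent);
      }
    }
  }
  return candidates;
}
```

On a linear branch this yields a single candidate, the earliest commit's parent; after a merge from main it also yields the merged-in parent, so a rule for choosing between candidates is still needed.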

If several common ancestors are found, we choose the one on the shortest path from the head, i.e. the first one encountered while tracing back through the commit chain, as the mergeBase.
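The shortest-path rule amounts to a breadth-first back-trace from the head. A sketch assuming the same record shape; `nearestMergeBase` is a hypothetical name, and breaking ties between equally distant candidates by parent_ids order is this sketch's choice, not necessarily O2 CodeReview's.

```typescript
interface CompareCommit {
  id: string;
  parent_ids: string[];
}

// Breadth-first search from the head: the first commit reached that is not in the
// branch's list is the candidate with the fewest edges between it and the head.
function nearestMergeBase(commits: CompareCommit[], headId: string): string | undefined {
  const byId = new Map(commits.map((c) => [c.id, c] as const));
  const visited = new Set<string>([headId]);
  const queue: string[] = [headId];
  while (queue.length > 0) {
    const id = queue.shift()!;
    const commit = byId.get(id);
    if (commit === undefined) return id; // left the branch: nearest candidate found
    for (const parent of commit.parent_ids) {
      if (!visited.has(parent)) {
        visited.add(parent);
        queue.push(parent);
      }
    }
  }
  return undefined;
}
```

For a branch a, b followed by a merge m of main1, the search reaches main1 (one edge from m) before base0 (three edges), so main1 is chosen.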

mergeBase definition (from the git merge-base documentation): one common ancestor is better than another common ancestor if the latter is an ancestor of the former. A common ancestor that has no better common ancestor is a best common ancestor, i.e. a merge base. A pair of commits can have more than one merge base.
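The definition translates directly into code. This sketch assumes the whole commit graph is available as an in-memory parent map, which the branch-compare API alone does not guarantee; `ParentMap`, `ancestors` and `mergeBases` are hypothetical names. It is quadratic and only illustrates the definition, not how Git computes merge bases.

```typescript
type ParentMap = Map<string, string[]>; // commit id -> parent ids (hypothetical in-memory graph)

// All ancestors of a commit, including the commit itself (iterative DFS).
function ancestors(graph: ParentMap, start: string): Set<string> {
  const seen = new Set<string>();
  const stack = [start];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    seen.add(id);
    for (const parent of graph.get(id) ?? []) stack.push(parent);
  }
  return seen;
}

// Best common ancestors: common ancestors that are not an ancestor of another common ancestor.
function mergeBases(graph: ParentMap, a: string, b: string): string[] {
  const ancA = ancestors(graph, a);
  const common = Array.from(ancestors(graph, b)).filter((id) => ancA.has(id));
  return common.filter((c) => !common.some((d) => d !== c && ancestors(graph, d).has(c)));
}
```

In a criss-cross history, where A merges X and Y and B merges Y and X, both X and Y survive the filter, which is the "more than one merge base" case.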

Some merge nodes must be ignored when selecting the mergeBase, for example a merge commit whose two parents both belong to the current branch: neither parent leads out of the branch, so neither can be the mergeBase.

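When the first commit of a new branch is itself a merge node (for example, one created with git merge --no-ff), the order of the parent_ids array tells us which parent is the mainline. The second parent (index 1) is the tip of the branch that was merged in, and it should be used as the mergeBase. Git faces the same ambiguity: reverting a merge commit without naming the mainline fails with: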

<code>error: commit xxx is a merge but no -m option was given
</code>

Git's -m option tells revert which parent number is the mainline; in the same way, we explicitly pick a parent and treat parent_ids[1] as the mergeBase in this scenario.

git revert -m parent-number / --mainline parent-number: Usually you cannot revert a merge because you do not know which side of the merge should be considered the mainline. This option specifies the parent number (starting from 1) of the mainline and allows revert to reverse the change relative to the specified parent.
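The rule for a merge node at the start of a branch fits in a few lines. A sketch assuming the parent-order convention described above (parent_ids[0] is the branch the merge was made on, parent_ids[1] the merged-in side); `mergeBaseFromFirstCommit` is a hypothetical name.

```typescript
interface CompareCommit {
  id: string;
  parent_ids: string[];
}

// mergeBase implied by the earliest commit of a branch:
// an ordinary commit -> its only parent; a merge commit -> parent_ids[1], the merged-in side.
function mergeBaseFromFirstCommit(first: CompareCommit): string | undefined {
  return first.parent_ids.length > 1 ? first.parent_ids[1] : first.parent_ids[0];
}
```

A root commit has no parents, so the sketch returns undefined for it.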

Edge cases like squash merges can cause the true mergeBase to appear earlier in the commit graph, leading to incorrect selection.

Initially, O2 CodeReview's custom mergeBase algorithm covered most scenarios; later we switched to Git's native git merge-base for full reliability.

Which Code Should Be Skipped?

Two Scenarios

Traditional CR tools skip code by subtracting diffs. Writing A ~ B for the diff from A to B: revision ~ head = base ~ head ⊖ base ~ revision. In IDE-based CR we work with complete file contents at two commits rather than with stored diffs, so the skip logic has to be rethought.
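To make the subtraction concrete, the sketch below models a commit as a map from file path to content and compares whole files. This is a deliberate simplification (CR tools subtract at hunk level), and `Snapshot`, `FileDiff`, `diff` and `subtract` are hypothetical names.

```typescript
// Hypothetical whole-file model: a commit is a map from path to file content.
type Snapshot = Map<string, string>;
// A ~ B at file granularity: path -> content in B (null when B deleted the file).
type FileDiff = Map<string, string | null>;

function diff(from: Snapshot, to: Snapshot): FileDiff {
  const result: FileDiff = new Map();
  const paths = new Set(Array.from(from.keys()).concat(Array.from(to.keys())));
  paths.forEach((path) => {
    const after = to.get(path);
    if (from.get(path) !== after) result.set(path, after ?? null);
  });
  return result;
}

// base ~ head minus base ~ revision: keep only the head changes the reviewer has not seen.
// Coarse: a file the head reverted back to its base content drops out of base ~ head
// and is therefore not reported, although revision ~ head would show it.
function subtract(baseToHead: FileDiff, baseToRevision: FileDiff): FileDiff {
  const result: FileDiff = new Map();
  baseToHead.forEach((content, path) => {
    const seen = baseToRevision.has(path) && baseToRevision.get(path) === content;
    if (!seen) result.set(path, content);
  });
  return result;
}
```

For example, if the reviewer saw file a change from 1 to 2 and the head additionally changes file b, the subtraction leaves only b.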

Simple case: after the reviewer has reviewed up to commit 1, the author makes two more commits. Since the base has not changed, comparing commit 1 with the latest commit is enough.

If the base changes (for example, main is merged into the branch after revision 2), comparing against the old base would re-introduce the changes merged in from main, which the reviewer does not need to see, and increase the review load.
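Deciding which of the two scenarios applies only requires remembering the mergeBase in effect at the last review. A sketch in which `ReviewedState`, `ComparePlan` and `planComparison` are hypothetical names; storing the last reviewed revision together with its mergeBase is an assumption, not something the original describes.

```typescript
// What the reviewer last saw (hypothetical record).
interface ReviewedState {
  revision: string; // commit the reviewer last reviewed
  mergeBase: string; // mergeBase in effect at that review
}

type ComparePlan =
  | { kind: "direct"; from: string; to: string } // base unchanged: plain comparison
  | { kind: "skip"; oldBase: string; revision: string; newBase: string; head: string }; // base moved

function planComparison(last: ReviewedState, currentMergeBase: string, head: string): ComparePlan {
  if (last.mergeBase === currentMergeBase) {
    return { kind: "direct", from: last.revision, to: head };
  }
  // The base moved (e.g. main was merged in), so the version-skip algorithm is needed.
  return { kind: "skip", oldBase: last.mergeBase, revision: last.revision, newBase: currentMergeBase, head };
}
```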

Algorithm Implementation

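We define the skip algorithm as follows: take the intersection of base ~ head and base ~ revision, i.e. the already-reviewed changes that are still present in head, and apply those base ~ revision changes to the new base. Comparing the result with head leaves only the modifications the reviewer has not yet seen.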

By intersecting the two diffs we eliminate changes introduced by the new base while preserving new content, effectively skipping already‑reviewed code.
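The same idea can be illustrated at whole-file granularity: build a baseline from the new base plus the reviewed changes that head still contains, then compare that baseline with head. This is a simplified TypeScript sketch, not O2 CodeReview's implementation; it assumes commits modeled as path-to-content maps, and `Snapshot`, `reviewedBaseline` and `changedPaths` are hypothetical names.

```typescript
type Snapshot = Map<string, string>; // file path -> content (hypothetical model of a commit)

// New base with the already-reviewed changes (oldBase -> revision) re-applied,
// but only for files whose reviewed content is still exactly what head contains.
function reviewedBaseline(oldBase: Snapshot, revision: Snapshot, newBase: Snapshot, head: Snapshot): Snapshot {
  const baseline: Snapshot = new Map(newBase);
  const paths = new Set(Array.from(oldBase.keys()).concat(Array.from(revision.keys())));
  paths.forEach((path) => {
    const before = oldBase.get(path);
    const reviewed = revision.get(path);
    if (before === reviewed) return; // file not touched in base ~ revision
    if (head.get(path) !== reviewed) return; // not (or no longer exactly) in head: keep it visible
    if (reviewed === undefined) baseline.delete(path);
    else baseline.set(path, reviewed);
  });
  return baseline;
}

// Files that differ between two snapshots, sorted by path.
function changedPaths(from: Snapshot, to: Snapshot): string[] {
  const paths = new Set(Array.from(from.keys()).concat(Array.from(to.keys())));
  return Array.from(paths).filter((p) => from.get(p) !== to.get(p)).sort();
}
```

For example, if the reviewer saw file a change, main then added file c, and the author later changed b, only b differs between the baseline and head, whereas comparing the new base with head would show both a and b. When a reviewed file has changed again, this coarse version shows the full new base ~ head diff for that file; a hunk-level implementation would narrow it further.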

CR Staging

Previously, DEF’s CR acted as a release gate, causing last‑minute submissions and heavy review loads. The new staged CR process integrates the version‑skip algorithm into daily iterations, enabling phased code review.

Integrating CR into Development Workflow

Next, DEF will embed CR capabilities into IDE plugins (both web and local), leveraging OS features for better navigation and feeding real‑time change information back to the release system, thereby improving review quality and developer productivity.

Conclusion

While native git merge-base will eventually replace custom logic, understanding Git's commit chain and mergeBase remains essential for advanced version-skip features. Future work includes adding intelligent review assistance powered by machine learning.
