How to Parse Git Diffs into JSON and Highlight Line Changes Efficiently

This article explains how to transform raw Git diff output into a structured JSON format, handle various diff headers, and apply line‑by‑line highlighting using the diff‑match‑patch algorithm, while also offering performance optimizations for long lines.

Taobao Frontend Technology
Taobao Frontend Technology
Taobao Frontend Technology
How to Parse Git Diffs into JSON and Highlight Line Changes Efficiently

Parsing Git Diff

To display a diff, the Git diff format must first be parsed into a structured representation such as JSON. The basic diff header contains the two file names being compared, followed by the hash values and file mode (e.g., 100644 for a text file).

Basic Format

diff --git a/f1 b/f1
index 6f8a38c..449b072 100644
--- a/f1
+++ b/f1
@@ -1,7 +1,7 @@
1
2
3
- a
+ b
5
6
7

The lines starting with --- and +++ indicate the original and new file, while the @@ line defines the hunk header and the context lines that follow.

Extended Headers

After the first header, Git may include additional lines such as old mode, new mode, deleted file mode, copy from, rename from, similarity index, etc. Each of these can be captured with regular expressions and stored in the JSON object.

File Operations

Git diff distinguishes between added, deleted, copied, and renamed files. For example, an added file includes new file mode and a /dev/null entry, while a deleted file shows deleted file mode and the same /dev/null placeholder.

Binary Files

Binary changes are represented simply as Binary files a/img.png and b/img.png differ without showing the actual byte differences.

Line‑by‑Line Diff and Highlighting

After the initial JSON is built, each line can be further diffed to highlight exact changes. Only lines that were modified (neither pure additions nor deletions) are processed with the diff_match_patch.diff_main function.

const diff = diffMatchPatch.diff_main(oldLine.content, newLine.content);
if (diff && diff.length) {
  oldLine.diff = [];
  newLine.diff = [];
  diff.forEach(item => {
    if (item[0] === DiffMatchPatch.DIFF_INSERT) {
      newLine.diff.push({ tag: 'added', content: item[1] });
    } else if (item[0] === DiffMatchPatch.DIFF_DELETE) {
      oldLine.diff.push({ tag: 'deleted', content: item[1] });
    } else {
      oldLine.diff.push({ tag: '', content: item[1] });
      newLine.diff.push({ tag: '', content: item[1] });
    }
  });
}

This produces a diff field for each modified line, indicating inserted or deleted fragments.

Performance Optimizations

Because the classic LCS algorithm is O(M×N), several shortcuts are applied:

Immediate equality check – if the two strings are identical, skip diffing.

Common prefix and suffix detection using binary search (O(log n)).

Detecting pure insertions or deletions without invoking the full algorithm.

Checking for substring containment to avoid unnecessary work.

Skipping line diff for very long lines (e.g., >80 characters) by setting a threshold.

const threshold = 80;
const len = Math.max(oldLine.content.length, newLine.content.length);
if (len <= threshold) {
  const diff = diffMatchPatch.diff_main(oldLine.content, newLine.content);
  // process diff as shown above
}

These heuristics dramatically reduce processing time for large diffs, especially when diffing embedded scripts that can span thousands of characters.

References

Git Diff Documentation

Longest Common Subsequence (LCS)

Neil Fraser’s Diff Algorithm Overview

Optimized Diff Algorithm (PDF)

Google Diff‑Match‑Patch Library

Git diff illustration
Git diff illustration
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Code reviewGitJSONdiff
Taobao Frontend Technology
Written by

Taobao Frontend Technology

The frontend landscape is constantly evolving, with rapid innovations across familiar languages. Like us, your understanding of the frontend is continually refreshed. Join us on Taobao, a vibrant, all‑encompassing platform, to uncover limitless potential.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.