Optimizing the Code Diff System: From JGit to GitLab API and Diff Compensation

This article analyzes performance and concurrency problems in a code‑diff service, compares the original JGit‑based approach with a GitLab‑API solution, addresses new accuracy issues, and presents a compensation strategy using java‑diff‑utils to achieve stable, efficient backend diff processing.

转转QA

Sep 23, 2021

The code‑diff system is a core component for incremental static code scanning, coverage, and interface analysis, and its stability and performance directly affect the entire workflow. As usage grew, especially with the incremental code statistics feature, the existing design revealed scalability and performance bottlenecks.

The original implementation cloned the GitLab repository branch to a local server and used JGit to compute the diff. Large repositories took a long time to clone, and each local checkout had to be held under an exclusive lock, so two diff requests for the same repository could not run concurrently.
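The locking constraint can be sketched as follows. This is a minimal illustration, not the service's actual code: `CheckoutLocks`, `repositoryKey`, `branchKey`, and `isHeld` are hypothetical names. It shows that the granularity of the lock key decides which requests block each other: a key made of the repository alone serializes every request for that repository, while a key that also includes the branch only serializes requests for the same branch.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper that serializes access to a local checkout. */
final class CheckoutLocks {
    private final ConcurrentHashMap<List<String>, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Coarse key: every request for the repository shares one lock. */
    static List<String> repositoryKey(String repository) {
        return List.of(repository);
    }

    /** Finer key: only requests for the same branch share a lock. */
    static List<String> branchKey(String repository, String branch) {
        return List.of(repository, branch);
    }

    /** Runs the task (for example clone/pull plus diff) while holding the lock for the key. */
    <T> T withLock(List<String> key, Supplier<T> task) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            return task.get();
        } finally {
            lock.unlock();
        }
    }

    /** Reports whether some thread currently holds the lock for the key. */
    boolean isHeld(List<String> key) {
        ReentrantLock lock = locks.get(key);
        return lock != null && lock.isLocked();
    }
}
```

In either variant the lock is held for the whole duration of the pull, so a slow pull on a large repository still delays every request that maps to the same key.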

Optimization proposals:

1. Space-for-time: keep cloning locally, but store a separate checkout per branch. This removes locking between branches and requires few logic changes. Concurrent diffs on the same branch still need a lock, however, and lock contention grows with pull time.

2. Remove JGit: switch to java-gitlab-api and call GitLab's native diff API, which provides stable performance and handles concurrent requests. The early deployment met the requirements, which raised the question of why this approach had not been adopted earlier.
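To make the second proposal concrete, here is a hedged sketch of what a request to GitLab's diff functionality can look like. The article does not say which endpoint the service calls, and it uses the java-gitlab-api client rather than raw HTTP. The sketch assumes the repository compare endpoint (`GET /api/v4/projects/:id/repository/compare` with `from` and `to`) and builds the request with the JDK's `java.net.http` classes. `GitLabCompareRequest`, the host name, and the token value are placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Hypothetical request builder for GitLab's repository compare endpoint. */
final class GitLabCompareRequest {

    /** Builds the compare URI; the project path is URL-encoded so it can be used as the :id segment. */
    static URI compareUri(String baseUrl, String projectPath, String from, String to) {
        String id = URLEncoder.encode(projectPath, StandardCharsets.UTF_8);
        String query = "from=" + URLEncoder.encode(from, StandardCharsets.UTF_8)
                + "&to=" + URLEncoder.encode(to, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v4/projects/" + id + "/repository/compare?" + query);
    }

    /** Builds a GET request authenticated with a personal access token. */
    static HttpRequest build(URI uri, String privateToken) {
        return HttpRequest.newBuilder(uri)
                .header("PRIVATE-TOKEN", privateToken)
                .GET()
                .build();
    }
}
```

Because the diff is computed on the GitLab server, nothing has to be cloned or locked locally, and each request is independent of the others.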

New issue: users reported inaccurate diffs. The GitLab API has no option to ignore whitespace changes, whereas JGit can ignore them (e.g., WS_IGNORE_ALL). The business logic must ignore pure formatting changes so that the incremental statistics stay correct.
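The effect of ignoring whitespace can be illustrated with a small stdlib-only sketch. It approximates the idea behind an "ignore all whitespace" comparison and is not JGit's implementation; `WhitespaceInsensitive` and its methods are hypothetical names.

```java
/** Hypothetical helper: compares two lines while ignoring every whitespace character. */
final class WhitespaceInsensitive {

    /** Removes all whitespace, so "a = b" and "a=b" normalize to the same string. */
    static String stripAllWhitespace(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (!Character.isWhitespace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** True when the two lines differ only in whitespace (or not at all). */
    static boolean sameIgnoringWhitespace(String a, String b) {
        return stripAllWhitespace(a).equals(stripAllWhitespace(b));
    }
}
```

A line that was only re-indented or re-spaced compares as equal and therefore does not count as changed. The trade-off is that removing all whitespace also treats `int a` and `inta` as equal, which is usually acceptable for statistics but is not a semantic comparison.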

Additional solution: because GitLab's diff endpoint cannot be configured to ignore whitespace, a compensation layer was added. The diff result from GitLab is run through the open-source java-diff-utils library, using the same HistogramDiff algorithm that JGit uses, so that the results stay consistent with the earlier JGit-based output.
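The compensation idea can be sketched without any external library. The example below is an assumption-laden illustration, not the service's code. It swaps in a plain longest-common-subsequence line diff in place of java-diff-utils and HistogramDiff, and it assumes the old and new file contents are already available as lists of lines. `DiffCompensation` and `changedNewLines` are hypothetical names. Lines are compared after whitespace is removed, so only lines with a real content change are reported.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical compensation step: re-diffs two file versions while ignoring whitespace. */
final class DiffCompensation {

    /** Normalizes a line by removing every run of whitespace. */
    static String normalize(String line) {
        return line.replaceAll("\\s+", "");
    }

    /** Returns the 1-based numbers of lines in newLines that are added or changed, ignoring whitespace. */
    static List<Integer> changedNewLines(List<String> oldLines, List<String> newLines) {
        int n = oldLines.size();
        int m = newLines.size();
        String[] a = new String[n];
        String[] b = new String[m];
        for (int i = 0; i < n; i++) a[i] = normalize(oldLines.get(i));
        for (int j = 0; j < m; j++) b[j] = normalize(newLines.get(j));

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both files; new-side lines that are not part of the common subsequence are changes.
        List<Integer> changed = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (j < m) {
            if (i < n && a[i].equals(b[j])) {
                i++;
                j++;
            } else if (i < n && lcs[i + 1][j] >= lcs[i][j + 1]) {
                i++; // old line was removed
            } else {
                changed.add(j + 1); // new line was added or modified
                j++;
            }
        }
        return changed;
    }
}
```

The table needs memory proportional to the product of the two line counts, so this sketch only suits small files. A production layer would rely on a library algorithm such as the HistogramDiff named above.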

Other problems:

1. Large files sometimes come back with an empty diff because of GitLab's diff size limits (see "Diff limits administration" in the GitLab documentation).

2. When a diff contains a large number of files, performance degrades; multithreaded processing was introduced to mitigate this.
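The multithreaded processing mentioned in the second item could look roughly like the sketch below. It is one possible arrangement using a fixed thread pool from `java.util.concurrent`. `ParallelDiffProcessing`, `processAll`, and the per-file function are hypothetical, and the article does not describe the actual threading model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper that processes the files of one diff in parallel. */
final class ParallelDiffProcessing {

    /** Applies perFile to every file on a fixed-size pool and returns the results in input order. */
    static <F, R> List<R> processAll(List<F> files, Function<F, R> perFile, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (F file : files) {
                tasks.add(() -> perFile.apply(file));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and keeps the order of the task list.
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing diff files", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a per-file task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a pool per call keeps the sketch short; a long-running service would more likely share one bounded pool across requests so that a single large diff cannot exhaust the machine's threads.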

Conclusion: after extensive experimentation and several false starts, the GitLab-API-based approach with diff compensation now provides a stable and performant foundation for the diff system. Feedback and alternative ideas for further improvement are welcome.

References:

1. https://github.com/java-diff-utils/java-diff-utils

2. https://docs.gitlab.com/ee/api/
