How AI Transforms GitLab Merge Request Code Reviews: Architecture & Lessons Learned
This article details the design and implementation of an AI‑powered automated code‑review system for GitLab Merge Requests, covering background problems, layered architecture, diff parsing, prompt engineering, comment management, rate‑limiting, concurrency control, and the measurable improvements achieved.
1. Tool Overview
In everyday software development, code review is essential for quality and risk mitigation, but manual reviews suffer from efficiency bottlenecks, inconsistent standards, and missed defects. To address these issues, we built an automated code‑review system that triggers on GitLab Merge Request (MR) events, parses diffs, invokes a large language model (LLM) for analysis, and writes results back to the MR comments.
1.1 Background and Problem
Efficiency bottleneck: senior developers spend excessive time on trivial issues; junior developers cause longer review cycles.
Inconsistent standards: different reviewers apply subjective criteria, reducing team efficiency.
Missed defects: fatigue leads to overlooked security or logic bugs.
The system aims to automate repetitive, rule‑based checks while preserving human oversight for complex judgments.
2. Overall Architecture Design
2.1 System Capabilities
Listen to GitLab MR webhook events and trigger reviews automatically.
Parse the diff and build accurate line‑number mappings.
Call an LLM to identify code defects, security issues, and performance risks.
Create and clean up MR comments programmatically.
Apply rate‑limiting, concurrency control, and retry mechanisms for stability.
Record comprehensive logs for troubleshooting.
2.2 Layered Design
The system follows a four‑layer architecture:
Access Layer: Receives GitLab webhook requests, validates them, extracts parameters, and filters events.
Business Logic Layer: Orchestrates the full review workflow: event handling, diff parsing, AI analysis, and comment publishing.
Service Layer: Encapsulates core capabilities such as diff parsing, AI review, and GitLab comment management.
External Integration Layer: Interfaces with the GitLab API and the LLM API to fetch MR data, post comments, and invoke the model.
3. Core Module Design
3.1 Webhook Event Handling and Request Validation
The webhook serves as the entry point. It performs two main tasks: (1) identify and filter relevant events (only merge_request events are processed) and (2) hand off the review work to an asynchronous backend. Because GitLab imposes a 10‑second timeout on webhook responses, the synchronous part returns quickly after queuing the task.
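A minimal sketch of that entry point is shown below, assuming Spring Web. The endpoint path, pool size, and the ReviewService interface are illustrative assumptions, not the production names; the X-Gitlab-Event and X-Gitlab-Token headers are GitLab's own.
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical orchestrator that runs the full review pipeline asynchronously.
interface ReviewService { void review(String payload); }

@RestController
public class GitlabWebhookController {

    // Dedicated pool so the HTTP thread can return immediately after queuing.
    private final ExecutorService reviewExecutor = Executors.newFixedThreadPool(4);
    private final ReviewService reviewService;
    private final String secret = System.getenv("GITLAB_WEBHOOK_SECRET"); // shared secret

    public GitlabWebhookController(ReviewService reviewService) {
        this.reviewService = reviewService;
    }

    @PostMapping("/webhooks/gitlab")
    public ResponseEntity<String> onEvent(@RequestHeader("X-Gitlab-Event") String event,
                                          @RequestHeader("X-Gitlab-Token") String token,
                                          @RequestBody String payload) {
        if (!Objects.equals(secret, token)) {
            return ResponseEntity.status(401).build();  // reject unauthenticated callers
        }
        if (!"Merge Request Hook".equals(event)) {
            return ResponseEntity.ok("ignored");        // only merge_request events proceed
        }
        reviewExecutor.submit(() -> reviewService.review(payload)); // hand off heavy work
        return ResponseEntity.ok("queued");             // respond within GitLab's 10 s window
    }
}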
3.2 Diff Parsing and Line‑Number Mapping
Accurate line mapping is critical; the LLM can understand code semantics but cannot reliably locate comments without explicit line numbers. We therefore standardize the diff:
Split the diff into hunks using the @@ -old_start,old_count +new_start,new_count @@ header.
Scan each line, tracking whether it is an addition (+), deletion (-), or context (space) to advance the appropriate line counters.
Emit a normalized diff where every line carries both old and new line numbers, enabling deterministic comment placement (see the sketch after this list).
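A condensed sketch of that mapping pass, assuming a unified diff string as input; the DiffLineMapper class and MappedLine record are illustrative names:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DiffLineMapper {

    // Matches "@@ -old_start,old_count +new_start,new_count @@" hunk headers.
    private static final Pattern HUNK_HEADER =
            Pattern.compile("^@@ -(\\d+)(?:,\\d+)? \\+(\\d+)(?:,\\d+)? @@");

    // A line annotated with its old and new line numbers (null when absent).
    public record MappedLine(Integer oldLine, Integer newLine, String content) {}

    public static List<MappedLine> map(String diff) {
        List<MappedLine> result = new ArrayList<>();
        int oldLine = 0, newLine = 0;
        for (String line : diff.split("\n")) {
            Matcher m = HUNK_HEADER.matcher(line);
            if (m.find()) {
                // Reset both counters at each hunk boundary.
                oldLine = Integer.parseInt(m.group(1));
                newLine = Integer.parseInt(m.group(2));
            } else if (line.startsWith("+")) {
                result.add(new MappedLine(null, newLine++, line));   // added: new side only
            } else if (line.startsWith("-")) {
                result.add(new MappedLine(oldLine++, null, line));   // deleted: old side only
            } else {
                result.add(new MappedLine(oldLine++, newLine++, line)); // context: both advance
            }
        }
        return result;
    }
}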
3.2.1 Prompt Design for Line Extraction
We constrain the prompt so the model can only select line numbers from the structured diff. The rules enforce that only newly added lines (+) are considered and that the startLine and endLine values must come from the second number in the ( , newLine) tuple.
**Diff format example**
@@ -1,16 +1,13 @@
( , 15) + const newVar = "add"; <-- focus on new line 15
(16, 16) funcCall(newVar); <-- context line
**Absolute line‑extraction rules (Crucial):**
- Only consider '+' (added) lines.
- Output startLine and endLine must be the second number in parentheses.
- Example: "( , 15) + code" → startLine = 15.3.3 Review Result Convergence
3.3 Review Result Convergence
After fixing line‑number issues, the next challenge is the quality of the review output. Unconstrained LLM responses tend to produce many style suggestions and low‑value comments. To improve the signal‑to‑noise ratio, we introduced a rule‑based filter in the prompt that limits the model to reporting only high‑impact problems such as logic bugs, security vulnerabilities, performance issues, and architectural flaws, with severity levels defined as:
- High: compile errors, runtime crashes, security bugs, memory/resource leaks, data corruption.
- Medium: performance regressions, type‑unsafety, missing error handling, framework anti‑patterns.
- Low: style, naming, readability suggestions.
This shift reduced the number of comments while increasing the relevance of each comment.
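On top of the prompt-side rules, a post-filter can drop anything the model still reports at low severity. A minimal sketch, assuming a hypothetical Finding record parsed from the model's JSON output:
import java.util.List;

public class FindingFilter {

    public enum Severity { HIGH, MEDIUM, LOW }

    // Hypothetical shape of one parsed model finding.
    public record Finding(Severity severity, String newPath, int startLine, String description) {}

    // Keep only high-impact findings so low-value comments never reach the MR.
    public static List<Finding> keepHighImpact(List<Finding> findings) {
        return findings.stream()
                .filter(f -> f.severity() != Severity.LOW)
                .toList();
    }
}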
3.4 Code‑Line Comment Management
After the LLM returns findings, the service constructs a GitLab Position object containing project ID, MR IID, file path, commit SHA, and the new line number, then posts each comment asynchronously. Invalid results (missing description, line, or illegal path) are filtered out.
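A sketch of that Position construction, assuming the gitlab4j-api models (the same client used in the snippet below) and reusing the hypothetical Finding record from the previous sketch; the SHAs come from the MR's diff_refs:
import org.gitlab4j.api.models.DiffRef;
import org.gitlab4j.api.models.Position;

// Builds the position that anchors a discussion to a specific new line.
static Position buildPosition(DiffRef diffRefs, FindingFilter.Finding finding) {
    Position position = new Position();
    position.setBaseSha(diffRefs.getBaseSha());    // from the MR's diff_refs
    position.setStartSha(diffRefs.getStartSha());
    position.setHeadSha(diffRefs.getHeadSha());
    position.setPositionType(Position.PositionType.TEXT); // code comment, not image
    position.setNewPath(finding.newPath());        // file path after the change
    position.setNewLine(finding.startLine());      // new line number selected by the model
    return position;
}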
// Example of asynchronous comment creation
List<ReviewNoteTask> validTasks = filterValidTasks(...);
CountDownLatch latch = new CountDownLatch(validTasks.size());
for (ReviewNoteTask task : validTasks) {
noteExecutor.submit(() -> {
try {
discussionsApi.createMergeRequestDiscussion(
task.projectId,
task.mergeRequestIid,
task.commentContent,
new Date(),   // createdAt
null,         // positionHash (unused)
task.position);
} catch (Exception e) {
log.error("Comment creation failed: projectId={}, iid={}, newPath={}",
task.projectId, task.mergeRequestIid, task.getNewPath(), e);
} finally {
latch.countDown();
}
});
}
latch.await(); // wait until every comment task has completed
3.5 Rate Limiting, Concurrency Control, and Retry
When many MRs trigger simultaneously, model calls can exceed the LLM provider’s rate limits, causing bulk failures. We mitigated this with three measures:
Global token‑bucket rate limiter (Guava RateLimiter) set to 60 requests per second.
Separate thread pools for the main MR workflow and for comment posting, preventing one slow stage from blocking the other (a sketch follows this list).
Custom Spring Retry configuration with a reduced max attempt count (3) and a conservative exponential back‑off (initial 60 s, multiplier 2, max 180 s) to avoid long‑running retries that would exhaust thread resources.
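For the second measure, a sketch of the two isolated pools is shown below; pool sizes and bean names are illustrative assumptions, with noteExecutor matching the comment-posting snippet above:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ExecutorConfig {

    @Bean(name = "reviewExecutor")
    public ExecutorService reviewExecutor() {
        // Drives the main MR workflow: diff parsing and LLM analysis.
        return Executors.newFixedThreadPool(8);
    }

    @Bean(name = "noteExecutor")
    public ExecutorService noteExecutor() {
        // Posts comments back to GitLab, isolated from the review pool.
        return Executors.newFixedThreadPool(4);
    }
}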
// Rate limiter configuration
@Configuration
public class RateLimiterConfig {
@Bean
public RateLimiter rateLimiter() {
// 60 permits per second
return RateLimiter.create(60.0);
}
}

// Custom retry template
@Bean
public RetryTemplate retryTemplate() {
RetryTemplate template = new RetryTemplate();
SimpleRetryPolicy policy = new SimpleRetryPolicy(3);
template.setRetryPolicy(policy);
ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
backOff.setInitialInterval(60000); // 60 s
backOff.setMultiplier(2.0);
backOff.setMaxInterval(180000); // 180 s
template.setBackOffPolicy(backOff);
return template;
}
These safeguards dramatically improved stability under high load, preventing cascade failures and keeping the review pipeline responsive.
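Putting the pieces together, each model call first takes a global permit and then runs inside the retry template. A usage sketch, where LlmClient and its review() method are hypothetical stand-ins for the real model client:
import com.google.common.util.concurrent.RateLimiter;
import org.springframework.retry.support.RetryTemplate;
import org.springframework.stereotype.Service;

@Service
public class SafeguardedLlmService {

    // Hypothetical model client; the real call wraps the LLM provider's API.
    interface LlmClient { String review(String prompt); }

    private final RateLimiter rateLimiter;     // global 60-permits/s bean from above
    private final RetryTemplate retryTemplate; // exponential back-off policy from above
    private final LlmClient llmClient;

    public SafeguardedLlmService(RateLimiter rateLimiter,
                                 RetryTemplate retryTemplate,
                                 LlmClient llmClient) {
        this.rateLimiter = rateLimiter;
        this.retryTemplate = retryTemplate;
        this.llmClient = llmClient;
    }

    public String review(String prompt) {
        rateLimiter.acquire(); // block until a global permit is available
        return retryTemplate.execute(ctx -> llmClient.review(prompt)); // retried with back-off
    }
}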
4. Summary
The AI‑driven code‑review system moves repetitive, rule‑based inspection from manual effort to an automated pipeline. The real engineering challenges were not the model itself but integrating it into a production workflow: handling webhook timeouts, aligning diff line numbers with GitLab comment positions, constraining LLM output, and ensuring robustness under concurrency and rate‑limit pressure. After the optimizations, the system reliably surfaces high‑impact defects while keeping comment volume low, demonstrating that LLMs can add tangible value to software engineering when coupled with disciplined engineering practices.