
Safely Repairing Broken Builds with Machine Learning

Google's research demonstrates that a machine‑learning model trained on build logs and code snapshots can automatically suggest safe, high‑quality fixes for broken builds, boosting developer productivity by about two percent without introducing detectable security risks.

Continuous Delivery 2.0

Automatic repair of non‑building code can improve productivity (measured by overall task completion) and, with high‑quality training data and responsible monitoring, appears not to have any observable negative impact on code safety.

Software development is a cyclic process of design, coding, building, testing, deployment, and debugging. Developers are often frustrated when fixing one build error only surfaces many new ones, especially while navigating unfamiliar code.

At Google, every time a source file is saved, a snapshot is stored in version control, and each build run records a log, creating a rich data repository that captures when builds fail, the error messages, and which code changes caused the failure.
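What such a repository might store per build attempt can be sketched as a simple record. The field names and schema below are assumptions for illustration, not Google's actual internal format:

```python
from dataclasses import dataclass

@dataclass
class BuildEvent:
    """One recorded build attempt for a workspace (hypothetical schema)."""
    workspace_id: str   # which developer workspace produced this build
    timestamp: float    # when the build ran
    snapshot_id: str    # version-control snapshot of the code at build time
    succeeded: bool     # did the build pass?
    error_log: str      # compiler/build diagnostics; empty on success
```

Linking each failed build to the snapshot that caused it is what makes the later "solve session" construction possible.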

The paper introduces the DIDACT ML model, which is trained on "solve sessions"—chronologically ordered code snapshots and build logs that capture the interval from the first failure to the eventual fix. DIDACT is trained on the build error from the first broken snapshot, the code contents at that point, and the diff between the broken and fixed snapshots.

Problem and Motivation – The goal is to improve the experience of fixing Java build errors, making error messages more actionable and even automatically repairing them using Google’s extensive development history and research expertise.

Training and Input Data – Solve sessions are organized by grouping code snapshots and build logs from the same workspace, capturing the progression from failure to resolution. DIDACT is fine‑tuned on the first snapshot’s internal build error, the code contents, and the code diff to predict a patch with a confidence score for real‑time IDE suggestions.
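The grouping step above—turning a workspace's chronological build history into failure-to-fix spans—can be sketched as follows. The `(snapshot_id, succeeded)` tuple shape is an assumption for illustration:

```python
def solve_sessions(events):
    """Group a workspace's chronological build events into solve sessions.

    Each session runs from the first failing build to the next passing one.
    `events` is a list of (snapshot_id, succeeded) tuples (hypothetical shape).
    Returns a list of (first_broken_snapshot, fixed_snapshot) pairs.
    """
    sessions, broken = [], []
    for snapshot, ok in events:
        if not ok:
            broken.append(snapshot)                 # still broken: extend the session
        elif broken:
            sessions.append((broken[0], snapshot))  # build passes: session resolved
            broken = []
    return sessions                                 # unresolved failures are dropped
```

The model's training example would then pair the error log and code of `first_broken_snapshot` with the diff against `fixed_snapshot`.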

Filtering Suggested Fixes for Quality and Safety – Because ML‑generated code can introduce hard‑to‑detect bugs, Google applies post‑processing: automatic formatting followed by heuristic filters designed with expert knowledge and user feedback to avoid common quality and security pitfalls such as code vulnerabilities.
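A post-processing filter of this kind might look like the sketch below. The specific rules (banned call patterns, a size cap) are invented stand-ins; the paper only says the real filters encode expert knowledge and user feedback:

```python
def passes_filters(patch: str) -> bool:
    """Heuristic gate applied after formatting a suggested fix (illustrative rules only)."""
    # Reject patches containing crude vulnerability smells (hypothetical list).
    banned = ["eval(", "Runtime.getRuntime().exec("]
    if any(pattern in patch for pattern in banned):
        return False
    # Reject very large diffs: big rewrites are harder to review safely.
    if patch.count("\n") > 50:
        return False
    return True
```

Only patches that survive every filter would be surfaced to the developer.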

Interface Flow – When a build error occurs, developers see a "View ML‑suggested fix" button in the IDE. Clicking it shows a preview of the suggested patch, which can be accepted or rejected. Accepted fixes are applied and the build proceeds as usual.
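The flow—suggest only when confident, apply only on explicit acceptance—can be sketched as two small decision functions. The confidence threshold is an assumed value; the paper does not publish the real cutoff:

```python
CONFIDENCE_THRESHOLD = 0.5  # assumed cutoff; the real threshold is not published

def offer_fix(patch: str, confidence: float) -> str:
    """Decide whether the IDE shows the 'View ML-suggested fix' button (illustrative)."""
    if confidence < CONFIDENCE_THRESHOLD:
        return "no suggestion"  # low-confidence patches are never shown
    return "show fix button"

def apply_if_accepted(source: str, patched_source: str, accepted: bool) -> str:
    """Apply the previewed fix only if the developer explicitly accepts it."""
    return patched_source if accepted else source
```

Keeping the human decision in the loop is what lets low-quality suggestions be rejected before they ever reach version control.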

A/B Experiment – Half of Google's developers were given access to the ML build‑fix feature (treatment group) while the other half served as a control group. Productivity metrics were compared over 11 weeks.

Productivity Improvements – The experiment showed statistically significant gains: active coding time per change list (CL) decreased by ~2%, review time per CL decreased by ~2%, and CL throughput increased by ~2% (more CLs submitted per week).
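How one might check that a small difference like ~2% is statistically significant rather than noise can be illustrated with a permutation test on per-developer metrics. The data here is made up; the paper does not describe its exact statistical procedure:

```python
import random

def permutation_pvalue(control, treatment, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference of means (illustrative)."""
    rng = random.Random(seed)
    observed = abs(sum(treatment) / len(treatment) - sum(control) / len(control))
    pooled = list(control) + list(treatment)
    n = len(control)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # relabel group membership at random
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n))
        if diff >= observed:
            hits += 1
    return hits / n_iter  # fraction of relabelings at least as extreme
```

A small p-value means the observed treatment/control gap is unlikely under random group assignment.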

Safety Guarantees – To assess risk, Google monitored security‑related metrics such as CL rollback rate and new sanitizer failure rate. No observable differences were found between CLs written with ML assistance and those without, indicating that the ML‑generated fixes did not degrade code safety.
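Monitoring a metric like rollback rate reduces to comparing a simple fraction between the two groups. The `(cl_id, rolled_back)` record shape below is an assumption for illustration:

```python
def rollback_rate(changelists):
    """Fraction of changelists later rolled back.

    `changelists` is a list of (cl_id, rolled_back) tuples (hypothetical shape).
    """
    if not changelists:
        return 0.0
    return sum(rolled_back for _, rolled_back in changelists) / len(changelists)
```

Comparing `rollback_rate` (and analogous rates such as new sanitizer failures) between ML-assisted and unassisted CLs is what supports the "no observable difference" claim.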

Conclusion – Presenting ML‑generated build fixes in the IDE, protected by automatic safety checks and human review, significantly improves developer efficiency without compromising security, and the approach may be extensible to other error‑prone tasks throughout the software development lifecycle.

Tags: machine learning, build automation, developer productivity, code safety, ML-assisted debugging
Written by

Continuous Delivery 2.0

Tech and case studies on organizational management, team management, and engineering efficiency
