Dynamic Integrated Developer Activity (DIDACT): Large Sequence Models for Software Development

The article introduces DIDACT, a large‑scale multitask machine‑learning framework that trains on the full software‑development workflow—including edits, builds, reviews, and tool interactions—to create AI assistants that can predict and suggest developer actions throughout the coding process.

Continuous Delivery 2.0

Jul 2, 2024

Software is created incrementally through a continuous dialogue among developers, reviewers, and tools such as compilers, test frameworks, and static analysers. Traditional models only learn from the final code state, but DIDACT (Dynamic Integrated Developer ACTivity) treats the entire development process as training data.

DIDACT’s novelty lies in exposing the model to the context developers see while they work and pairing it with the actions they take, so the model learns the dynamics of software development rather than just the end result. Google’s extensive internal tooling made it possible to dramatically increase the amount and diversity of developer‑activity data. The results are promising both as a tool for professional developers and as a foundation for general software‑development skills in ML models.

The methodology captures each activity as a state‑intent‑action triple: for example, a file state, a reviewer comment, and the resulting edit. Actions are written in a compact notation called DevScript, which lets the model express complex operations without outputting full code snapshots, improving efficiency and interpretability.
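The triple can be pictured with a small data structure. The following is a minimal sketch only: the article does not show DevScript's actual syntax, so the `EditAction` and `Example` classes and the `apply_action` helper are hypothetical names invented for illustration. It shows how an action expressed as a line-range edit is much smaller than a full snapshot of the new file, yet still determines the next state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EditAction:
    """Hypothetical edit action: replace lines [start, end) with new_lines.

    This is NOT real DevScript, only an illustration of a compact action.
    """
    start: int
    end: int
    new_lines: Tuple[str, ...]


@dataclass(frozen=True)
class Example:
    """One training demonstration: (state, intent) -> action."""
    state: Tuple[str, ...]  # file contents, one entry per line
    intent: str             # e.g. a reviewer comment
    action: EditAction      # what the developer did in response


def apply_action(state: Tuple[str, ...], action: EditAction) -> Tuple[str, ...]:
    """Produce the next file state by splicing the edit into the old state."""
    return state[:action.start] + action.new_lines + state[action.end:]


example = Example(
    state=(
        "def area(r):",
        "    return 3.14 * r * r",
    ),
    intent="Use math.pi instead of a hard-coded constant.",
    action=EditAction(
        start=1,
        end=2,
        new_lines=("    return math.pi * r * r",),
    ),
)

new_state = apply_action(example.state, example.action)
```

Here the action carries a single changed line rather than the whole file, which is the efficiency argument the paragraph above makes.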

DIDACT is trained as a multitask model on activities such as fixing broken builds, predicting code‑review comments, resolving review feedback, renaming variables, and editing files. Three internal tools built on DIDACT (Comment Resolution, Build Repair, and Tip Prediction) have been deployed within Google’s workflow and received enthusiastic feedback from thousands of engineers.

Beyond individual tasks, DIDACT demonstrates the potential of a general‑purpose developer‑assistant agent. By prefix‑prompting developer activity and chaining multiple predictions, the model can generate longer, coherent activity trajectories, perform history‑enhanced code completion, and even synthesize entire files from scratch in a natural, step‑by‑step manner.
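Chaining predictions amounts to a loop: predict an action, apply it to the current state, append it to the history, and repeat. The sketch below illustrates only that loop, under stated assumptions. The `rollout` and `apply` functions and the `(position, line)` action format are hypothetical, and `scripted_model` is a hard-coded stand-in for a trained model, not DIDACT itself.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[str, ...]   # file contents, one entry per line
Action = Tuple[int, str]  # hypothetical action: (insert position, line to insert)


def apply(state: State, action: Action) -> State:
    """Insert one line into the file at the given position."""
    pos, line = action
    return state[:pos] + (line,) + state[pos:]


def rollout(
    predict: Callable[[State, List[Action]], Optional[Action]],
    state: State,
    prefix: List[Action],
    max_steps: int = 10,
) -> Tuple[State, List[Action]]:
    """Chain single-step predictions into a longer activity trajectory.

    `prefix` is the developer activity observed so far; each predicted
    action is appended to it and fed back in on the next step.
    """
    history = list(prefix)
    for _ in range(max_steps):
        action = predict(state, history)
        if action is None:  # the model signals that it is done
            break
        state = apply(state, action)
        history.append(action)
    return state, history


def scripted_model(state: State, history: List[Action]) -> Optional[Action]:
    """Stand-in for a trained model: writes a fixed file one line at a time."""
    target = ("import math", "", "def area(r):", "    return math.pi * r * r")
    if len(state) >= len(target):
        return None
    return (len(state), target[len(state)])


# Synthesize a file from scratch, step by step, starting with no history.
final_state, trajectory = rollout(scripted_model, state=(), prefix=[])
```

Passing a non-empty `prefix` corresponds to conditioning on prior developer activity, which is what the article calls prefix‑prompting.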

In conclusion, DIDACT converts Google’s rich software‑engineering process into training demonstrations for an AI developer assistant, advancing large language models toward reducing developer toil, boosting productivity, and improving the quality of software engineers’ work.