DeepDebug: Transformer‑Based Automatic Debugging Using Large Pretrained Models

The paper presents DeepDebug, a transformer‑based system that leverages large pretrained models and extensive synthetic and real‑world data to automatically localize and fix bugs in Python code, achieving significant gains in patch‑generation success rates and a marked reduction in false positives on benchmarks such as QuixBugs.


DeepDebug is a novel automatic debugging approach introduced by researchers from Microsoft Cloud+AI that employs a large pretrained transformer model to generate bug‑fixing patches for Python programs. The method first pretrains on code drawn from 200,000 repositories and trains a backtranslation (bug‑creation) model on reversed commit data, then fine‑tunes the fixer on richly annotated buggy functions from 10,000 repositories with executable tests, incorporating stack traces, error messages, print output, and other contextual information.
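As a rough illustration of how such contextual information could be combined with the buggy code, the sketch below serializes a function together with its stack trace and error message into a single input string for a sequence‑to‑sequence fixer. The separator tokens and the helper name build_model_input are illustrative assumptions, not the paper's actual prompt format.

```python
# Hypothetical sketch: packing a buggy function plus its execution evidence
# into one sequence for a sequence-to-sequence fixer. The <|...|> separator
# tokens are placeholders, not tokens from the DeepDebug vocabulary.
def build_model_input(buggy_function: str, stack_trace: str, error_message: str) -> str:
    """Concatenate the buggy code with the trace and error it produced."""
    parts = [
        "<|bug|>", buggy_function.strip(),
        "<|trace|>", stack_trace.strip(),
        "<|error|>", error_message.strip(),
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    buggy = "def mean(xs):\n    return sum(xs) / len(xs) - 1"
    trace = 'File "stats.py", line 2, in mean'
    error = "AssertionError: expected 2.0, got 1.0"
    print(build_model_input(buggy, trace, error))
```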

The authors construct four training datasets: raw Python code for pre‑training, commit data for training bug‑creation and bug‑fixing models, methods with synthetically injected bugs derived from clean code, and functions paired with executable tests. Synthetic bugs are generated both by heuristic edits (e.g., replacing dot access with bracket access) and by neural bug‑creation models trained on reversed commit data, dramatically expanding the training corpus.
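To make the heuristic route concrete, here is a minimal sketch of a bug‑injection mutator that flips an equality comparison in otherwise clean code; the paper's own heuristics (such as rewriting dot access as bracket access) follow the same inject‑and‑record pattern, but the class and function names below are illustrative and not taken from DeepDebug.

```python
import ast


class ComparisonFlipper(ast.NodeTransformer):
    """Inject a synthetic bug by flipping the first ==/!= comparison found."""

    def __init__(self) -> None:
        self.injected = False

    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        if not self.injected:
            for i, op in enumerate(node.ops):
                if isinstance(op, ast.Eq):
                    node.ops[i] = ast.NotEq()   # turn == into !=
                    self.injected = True
                    break
                if isinstance(op, ast.NotEq):
                    node.ops[i] = ast.Eq()      # turn != into ==
                    self.injected = True
                    break
        return node


def inject_bug(source: str) -> str:
    """Return a synthetically buggy copy of source (ast.unparse needs Python 3.9+)."""
    tree = ast.parse(source)
    tree = ComparisonFlipper().visit(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    clean = "def is_even(n):\n    return n % 2 == 0\n"
    print(inject_bug(clean))  # the fixer is then trained to map this back to the clean code
```

Pairs of (buggy, original) functions produced this way give the fixer supervised training examples far beyond what real commit histories alone contain.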

The model architecture reuses a 12‑layer encoder/decoder transformer (roughly 406 M parameters) based on Facebook's BART, extending the positional‑embedding matrix with axial embeddings to accommodate up to 1024 tokens of code context plus 896 tokens of stack trace. Pre‑training applies span masking to code from the 200 k public Python repositories.
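The following is a minimal sketch of the axial‑embedding idea, assuming a simple row/column factorization of the position index; the grid shape (2 x 1024) is chosen only so that a 1920‑token window (1024 code + 896 trace) is covered, and none of this is the paper's actual implementation.

```python
# Sketch of axial positional embeddings: a long absolute position is split into
# (row, column) coordinates, and its embedding is the sum of a row embedding
# and a column embedding. This lets a 1024-position table be extended to a
# longer window without learning every new position from scratch.
import torch
import torch.nn as nn


class AxialPositionalEmbedding(nn.Module):
    def __init__(self, d_model: int, num_rows: int, num_cols: int) -> None:
        super().__init__()
        self.num_cols = num_cols
        self.row_emb = nn.Embedding(num_rows, d_model)  # coarse coordinate
        self.col_emb = nn.Embedding(num_cols, d_model)  # fine coordinate

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # positions: (batch, seq_len) absolute indices in [0, num_rows * num_cols)
        rows = positions // self.num_cols
        cols = positions % self.num_cols
        return self.row_emb(rows) + self.col_emb(cols)


# Example: cover a 1920-token input (code context + stack trace) with a 2 x 1024 grid.
emb = AxialPositionalEmbedding(d_model=16, num_rows=2, num_cols=1024)
positions = torch.arange(1920).unsqueeze(0)  # shape (1, 1920)
print(emb(positions).shape)                  # torch.Size([1, 1920, 16])
```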

Experiments on the QuixBugs benchmark and a large internal test suite show that training on backtranslated synthetic bugs reduces patch‑generation loss by over 10 % compared with forward‑only commit data, increases the number of bugs fixed by more than 50 %, lowers the false‑positive rate from 35 % to 5 %, and cuts the allotted timeout from six hours to one minute. Adding the full frame of surrounding code further decreases loss, and incorporating Pytest stack traces boosts the top‑10 patch success rate from 90 % to 97 %.

The study demonstrates that transformer‑based neural models, when enriched with execution traces and extensive synthetic data, can substantially advance automatic program repair, outperforming prior techniques on both accuracy and efficiency.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

machine learning · Transformer · software engineering · automatic debugging · program repair
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
