AI-Powered Code Defect Detection: Leveraging Code Knowledge Graphs and Large Language Models
The paper presents an AI-driven static analysis framework that builds code knowledge graphs to extract defect-relevant code slices and uses large language models for multilingual defect prediction. It reports an F1 score of up to 80% and, in production, 662 defects found across more than 1,100 C++ modules, with a 26.9% recall gain over traditional rule-based scanners.
This article discusses the application of artificial intelligence to static code analysis and automated defect detection. Traditional static analysis (SA) relies on manually crafted rules to detect code defects such as null pointer dereferences and out-of-bounds array accesses, but this approach suffers from high maintenance costs, weak generalization, and slow rule iteration, which leads to missed defects.
The authors propose a novel solution built on two key technologies: (1) a code knowledge graph, which addresses "what to learn" by extracting the code slices related to target variables, reducing the number of samples required for machine learning and improving learning accuracy; and (2) large language models, which address "how to learn" through deep learning methods, including pre-training and fine-tuning, so that a model can understand multiple programming languages much as a human developer does.
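To make the slicing idea concrete, here is a minimal sketch of a backward slice over a toy dependency graph. It is an illustration under stated assumptions, not the authors' implementation: the sample C++ statements, the `DEPENDS_ON` edges, and the `backward_slice` helper are all hypothetical, and a real code knowledge graph would derive the edges from control-flow and data-flow analysis rather than from a hand-written table.

```python
from collections import deque

# Hypothetical function under analysis, keyed by line number.
STATEMENTS = {
    1: "int *p = lookup(key);",
    2: "int n = count(items);",
    3: "if (n > 0)",
    4: "    log(n);",
    5: "if (flag)",
    6: "    p = nullptr;",
    7: "use(*p);",
}

# Hypothetical dependency edges: statement -> statements it depends on.
# Data-flow edges point at definitions of a variable the statement reads;
# control-flow edges point at the branch that guards the statement.
DEPENDS_ON = {
    3: {2},      # the condition reads n
    4: {2, 3},   # log(n) reads n and is guarded by line 3
    6: {5},      # the assignment is guarded by `if (flag)`
    7: {1, 6},   # *p may come from either definition of p
}


def backward_slice(target, depends_on):
    """Return the sorted line numbers the target transitively depends on."""
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


# Slice for the dereference of p on line 7 (the assumed target variable).
for line in backward_slice(7, DEPENDS_ON):
    print(line, STATEMENTS[line])
```

In this toy case the slice for line 7 keeps lines 1, 5, 6, and 7 and drops the unrelated logging code on lines 2 to 4, which is the sense in which slicing narrows what a model has to learn from.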
The implementation involves four steps: building a knowledge graph of the code under analysis, identifying target variables, performing dependency analysis based on control flow and data flow, and extracting feature statements. For defect prediction, the authors compared discriminative methods (BERT-based models, which achieved an F1 score of 80%) with generative methods (models such as Bloom, Llama, and Ernie, which achieved an F1 score of 61.69%). Additionally, a rule-based machine learning approach using logistic regression was implemented to filter false positives.
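The false-positive filter can be pictured as a small logistic regression that scores each reported defect. The sketch below is only an assumption about what such a filter might look like: the two features (whether the dereference is guarded by a null check, and the upstream model's confidence), the training rows, and the function names are hypothetical, and the source does not describe the actual features or training setup.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, weights, bias):
    """Probability that a reported defect is a true defect."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = score(x, weights, bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def keep_report(x, weights, bias, threshold=0.5):
    """Keep a report only if it is scored as a likely true defect."""
    return score(x, weights, bias) >= threshold


# Hypothetical training data: [guarded_by_null_check, model_confidence].
# Label 1 = confirmed defect, label 0 = false positive.
FEATURES = [[1, 0.55], [1, 0.70], [1, 0.60], [0, 0.90], [0, 0.80], [0, 0.65]]
LABELS = [0, 0, 0, 1, 1, 1]

weights, bias = train(FEATURES, LABELS)
print(keep_report([0, 0.85], weights, bias))  # unguarded dereference
print(keep_report([1, 0.60], weights, bias))  # guarded dereference
```

A linear model like this is cheap to retrain and easy to inspect, which is a plausible reason to place it after the language model rather than fine-tuning the larger model again for every new false-positive pattern.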
The solution has been deployed in production, covering more than 1,100 C++ modules and identifying 662 defects, with a 26.9% recall improvement over traditional rule-based scanning. The research was published at the IEEE AITest Conference 2023.
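For readers less familiar with the metrics quoted above, the following sketch shows how precision, recall, and F1 are computed from confusion counts. The counts are made up for illustration and are not taken from the paper; the summary also does not say whether the 26.9% recall gain is absolute or relative, so the sketch does not attempt to reproduce it.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 80 true defects reported, 20 false alarms, 20 missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Recall is the metric that captures missed defects, which is why it is the natural point of comparison against rule-based scanners whose main weakness is the defects their rules do not cover.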
Baidu Geek Talk