Can AI Auditors Match Human Experts? Inside RepoAudit’s LLM‑Powered Code Review
The EXPRESS Workshop at ISSTA 2025, hosted by Ant Group, featured a keynote by Purdue’s Prof. Zhang on an LLM‑driven “Human‑like AI Auditor” called RepoAudit, which demonstrated high‑accuracy automated code review, uncovering dozens of real bugs and hundreds of zero‑day vulnerabilities across major open‑source projects.
The EXPRESS Workshop, organized by Ant Group and held at ISSTA 2025, focused on software system interpretability, reliability, and security, bringing together researchers and industry experts worldwide.
In the keynote “Human‑like AI Auditor for Code Repositories,” Prof. Zhang Xiangyu of Purdue University highlighted the growing security and quality challenges in legacy codebases and AI‑generated code, noting that traditional manual reviews cost $500K–$1.5M per system and take 6–8 months.
His team introduced RepoAudit, an automated auditing system that combines abstraction, pointer tracking, verification, and path‑sensitive reasoning over control‑flow and data‑flow graphs to emulate expert auditors. In a controlled experiment, RepoAudit detected 38 real bugs with a 65% accuracy rate.
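To make the idea concrete, here is a minimal sketch of what path‑sensitive, demand‑driven reasoning over a data‑flow graph can look like. This is not RepoAudit's actual implementation: the graph, node names, and the `llm_says_unchecked` oracle (a simple rule standing in for a per‑function LLM query) are all invented for illustration.

```python
# Toy inter-procedural data-flow graph: an edge A -> B means a possibly-NULL
# value flows from function A into function B. All names are hypothetical.
EDGES = {
    "alloc_buf":    ["parse_header", "parse_body"],
    "parse_header": ["read_field"],   # dereferences without a NULL check
    "parse_body":   ["checked_use"],  # guards the pointer before use
}
NULL_CHECKED = {"checked_use"}                 # nodes that guard the value
DEREF_SITES  = {"read_field", "checked_use"}   # nodes that dereference it

def llm_says_unchecked(path):
    """Stub for the LLM query an auditor like RepoAudit would issue:
    does this path dereference the value with no intervening NULL check?
    Here a trivial rule replaces the model."""
    return not any(node in NULL_CHECKED for node in path)

def audit(source):
    """Enumerate source-to-dereference paths and report unguarded ones."""
    bugs, stack = [], [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in DEREF_SITES and llm_says_unchecked(path):
            bugs.append(" -> ".join(path))
        for succ in EDGES.get(node, []):
            if succ not in path:   # avoid revisiting nodes on this path
                stack.append(path + [succ])
    return bugs

print(audit("alloc_buf"))
# → ['alloc_buf -> parse_header -> read_field']
```

The point of the sketch is the shape of the analysis, not the oracle: each candidate path is examined separately (path sensitivity), and only the functions a suspect value actually flows through are queried (demand‑driven), which keeps the number of LLM calls proportional to the flows under audit rather than to the whole repository.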
Broader field tests on several high‑profile GitHub repositories, including the Linux kernel, uncovered 300 zero‑day vulnerabilities ranging from classic null‑pointer dereferences to complex functional flaws, showcasing the substantial potential of large language models in code auditing.
Following the talk, participants discussed the promise of LLM‑based audit solutions for industry adoption, emphasizing the need to extend detection beyond memory‑related defects to logical and functional bugs.
The workshop also featured research demos on model privacy and intelligent analysis, with Ant Group sharing its latest practices in automated testing and code‑large‑model development, fostering deeper dialogue between academia and industry.
Overall, the EXPRESS Workshop demonstrated cutting‑edge AI applications in software testing and analysis, laying a solid foundation for future research and practical deployment.