CVE-Factory: Scaling Expert‑Level Security Task Synthesis for Code Agents
This talk introduces CVE-Factory, a framework that automatically converts sparse CVE metadata into high-quality, executable security tasks for code agents. Measured against human-expert reproductions, it reaches 95% solution correctness and 96% environment fidelity, and it achieves a 66.2% verification rate on recent real-world vulnerabilities. The work also releases the LiveCVEBench benchmark and more than 1,000 training environments, which substantially improve the security performance of fine-tuned LLMs.
Evaluating and improving the security capabilities of code agents requires high‑quality, executable vulnerability tasks, yet existing approaches depend on costly manual reproduction and suffer from outdated data distributions.
To address this, the authors propose CVE-Factory, the first multi‑agent framework that automatically transforms sparse CVE metadata into complete, executable tasks with expert‑level quality.
Cross‑validation against human‑expert reproductions shows CVE-Factory attains 95% solution correctness and 96% environment fidelity; on the latest real‑world vulnerabilities it achieves a 66.2% verification success rate.
The system enables two downstream contributions: (1) LiveCVEBench, a continuously updated benchmark containing 190 tasks covering 14 programming languages and 153 repositories, capturing emerging threats including AI‑toolchain attacks; (2) the synthesis of more than 1,000 executable training environments, enabling large‑scale expansion of code‑security agent tasks.
Fine-tuning Qwen3-32B on the synthesized training environments raises its success rate on LiveCVEBench from 5.3% to 35.8%, surpassing Claude 4.5 Sonnet. The gains also generalize to Terminal Bench, where the model improves from 12.5% to 31.3%.
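To put the reported LiveCVEBench rates in concrete terms, the short sketch below converts them into approximate task counts and gains. It assumes the success rate is measured over all 190 LiveCVEBench tasks, which the abstract does not state explicitly; the helper name `solved_count` is hypothetical and used only for illustration.

```python
# Rough arithmetic on the reported LiveCVEBench results.
# Assumption: rates are computed over the full 190-task benchmark.

TOTAL_TASKS = 190     # LiveCVEBench size given in the abstract
RATE_BEFORE = 5.3     # Qwen3-32B success rate before fine-tuning (%)
RATE_AFTER = 35.8     # success rate after fine-tuning (%)


def solved_count(rate_percent: float, total_tasks: int) -> int:
    """Approximate number of solved tasks implied by a percentage rate."""
    return round(rate_percent / 100 * total_tasks)


before = solved_count(RATE_BEFORE, TOTAL_TASKS)
after = solved_count(RATE_AFTER, TOTAL_TASKS)

# Absolute gain in percentage points and relative improvement factor.
absolute_gain = round(RATE_AFTER - RATE_BEFORE, 1)
relative_factor = RATE_AFTER / RATE_BEFORE

print(f"solved before: ~{before} tasks, after: ~{after} tasks")
print(f"absolute gain: {absolute_gain} points, relative: {relative_factor:.1f}x")
```

Under that assumption, the improvement corresponds to roughly 10 solved tasks before fine-tuning and roughly 68 after, a gain of 30.5 percentage points.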
The speaker, Luo Xianzhen, is a fourth-year direct-entry PhD student at the Social Computing and Interactive Robotics Center of Harbin Institute of Technology. Luo's research focuses on code intelligence and terminal agents, and Luo has authored five papers at top conferences such as ACL, EMNLP, and ICLR.
Machine Learning Algorithms & Natural Language Processing
Focused on frontier AI technologies, empowering AI researchers' progress.
