
CVE-Factory: Scaling Expert‑Level Security Task Synthesis for Code Agents

The talk introduces CVE-Factory, a framework that automatically converts sparse CVE metadata into high-quality, executable security tasks for code agents. Against human-expert reproductions it achieves 95% solution correctness and 96% environment fidelity, and it reaches a 66.2% verification rate on recent real-world vulnerabilities. The authors also release the LiveCVEBench benchmark and more than 1,000 training environments, which markedly improve fine-tuned LLM performance (Qwen3-32B rises from 5.3% to 35.8% on LiveCVEBench).

Machine Learning Algorithms & Natural Language Processing

Feb 13, 2026


Evaluating and improving the security capabilities of code agents requires high‑quality, executable vulnerability tasks, yet existing approaches depend on costly manual reproduction and suffer from outdated data distributions.

To address this, the authors propose CVE-Factory, the first multi‑agent framework that automatically transforms sparse CVE metadata into complete, executable tasks with expert‑level quality.

Cross‑validation against human‑expert reproductions shows CVE-Factory attains 95% solution correctness and 96% environment fidelity; on the latest real‑world vulnerabilities it achieves a 66.2% verification success rate.
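The summary does not define what makes a synthesized task "verified." A common convention for executable vulnerability benchmarks is a fail-then-pass check: a proof-of-concept exploit must work against the vulnerable version and must be blocked once the reference fix is applied. That convention is an assumption here, not something the talk confirms. The sketch below models the check and the resulting verification rate. `SecurityTask`, `is_verified`, and `verification_rate` are hypothetical names, the CVE IDs and repositories are placeholders, and the boolean fields stand in for real runs inside the built environment.

```python
from dataclasses import dataclass


@dataclass
class SecurityTask:
    """Hypothetical record for one synthesized task (not CVE-Factory's actual schema)."""
    cve_id: str
    repo: str
    language: str
    # Assumed outcome of running the proof-of-concept exploit on the vulnerable code.
    exploit_succeeds_on_vulnerable: bool
    # Assumed outcome of running the same exploit after applying the reference fix.
    exploit_succeeds_on_patched: bool


def is_verified(task: SecurityTask) -> bool:
    """Assumed fail-then-pass criterion: the exploit fires before the fix and not after it."""
    return task.exploit_succeeds_on_vulnerable and not task.exploit_succeeds_on_patched


def verification_rate(tasks: list[SecurityTask]) -> float:
    """Fraction of attempted tasks that pass the check; 0.0 for an empty list."""
    if not tasks:
        return 0.0
    return sum(is_verified(t) for t in tasks) / len(tasks)


if __name__ == "__main__":
    attempts = [
        SecurityTask("CVE-0000-0001", "example/repo-a", "Python", True, False),
        SecurityTask("CVE-0000-0002", "example/repo-b", "Go", False, False),   # exploit never fired
        SecurityTask("CVE-0000-0003", "example/repo-c", "Java", True, True),   # fix did not block it
    ]
    print(f"{verification_rate(attempts):.1%}")  # 33.3%
```

Under this reading, a rate such as 66.2% would mean roughly two out of every three attempted CVEs yield an environment where the exploit demonstrably works before the fix and fails after it. The real pipeline may apply additional checks that this sketch omits.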

The system enables two downstream contributions: (1) LiveCVEBench, a continuously updated benchmark containing 190 tasks covering 14 programming languages and 153 repositories, capturing emerging threats including AI‑toolchain attacks; (2) the synthesis of more than 1,000 executable training environments, enabling large‑scale expansion of code‑security agent tasks.

Fine-tuning Qwen3-32B on the synthesized training environments raises its LiveCVEBench success rate from 5.3% to 35.8%, surpassing Claude 4.5 Sonnet. The gains also transfer to Terminal Bench, where the model improves from 12.5% to 31.3%.

Speaker Luo Xianzhen is a fourth-year direct-PhD student at Harbin Institute of Technology's Social Computing and Interactive Robotics Center, whose research focuses on code intelligence and terminal agents. Luo has authored five papers at top venues such as ACL, EMNLP, and ICLR.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Machine Learning Algorithms & Natural Language Processing

Focused on frontier AI technologies, empowering AI researchers' progress.
