How Multi‑Agent AI Transforms SDLC White‑Box Vulnerability Management
An in‑depth exploration of a Multi‑Agent AI system that automates SDLC white‑box vulnerability management, detailing industry‑standard processes, the system’s architecture, specialized agents, prompt engineering, tool integration, and real‑world results that boost audit efficiency and accuracy while enabling true security left‑shift.
Preface
In today’s fast‑moving software development environment, security vulnerability management is undergoing a profound transformation. With the widespread adoption of DevOps, Shift‑Left Security has become an industry consensus, and enterprises are embedding security testing early in the development lifecycle to build true DevSecOps pipelines. Qunar integrates a secure Software Development Lifecycle (SDLC), modeled on Microsoft’s SDL, with its DevOps practice to shift security left.
In practice, SAST is the core capability for security testing and carries the main responsibility for discovering vulnerabilities, while IAST and DAST serve as supplementary methods. The quality of SAST directly determines the effectiveness of the entire vulnerability management system. The white‑box scanning stage of the SDLC faces severe challenges: even the best static application security testing tools achieve only 60‑70% accuracy, which forces massive manual effort to confirm true positives. In many internet companies, security teams spend up to 40% of their effort on SDLC tasks.
1. SDLC White‑Box Vulnerability Management Industry Process
Most enterprises follow these steps:
Code commit triggers scan: after developers push code, the CI system automatically invokes SAST tools (e.g., Fortify, Checkmarx, SonarQube, Coverity) for static analysis.
Generate initial vulnerability report: SAST tools output results including vulnerability type, source file, call chain, risk level, in formats such as XML, JSON, PDF, Word, SARIF.
Manual review and confirmation: security engineers examine each finding to weed out false positives, checking whether input points are externally controllable and whether whitelist filtering, type checks, or other mitigations apply.
Vulnerability classification and assignment: approved findings are entered into a ticket system and routed to the responsible development or business team.
Vulnerability fixing and re‑review: developers fix the issue based on the report, security teams re‑test, and finally close the ticket.
Data consolidation and operational analysis: security teams archive metrics such as vulnerability count, type distribution, MTTR, ownership, and use them to refine rules and strategies.
Process diagram:
Common challenges:
High SAST false‑positive rate → heavy manual review pressure.
Low review efficiency → each vulnerability requires 15‑30 minutes of manual analysis.
Lack of standardization → inconsistent conclusions across engineers.
2. Multi‑Agent Intelligent Vulnerability Review System Design Concept
To address traditional SAST challenges, we introduce a Multi‑Agent architecture that builds an AI‑automated, specialized, and scalable white‑box vulnerability review system. The core idea is to decompose complex review tasks into multiple collaborative agents with specific capabilities, using intelligent scheduling, specialized analysis, and layered re‑verification to achieve precise identification and automated handling.
System diagram:
2.1 Specialized Division of Labor
Different vulnerability types (SQL injection, SSRF, command injection, etc.) involve distinct attack principles, detection methods, and contextual analysis. A single review pipeline cannot cover all details. The Multi‑Agent design creates agents specialized in particular vulnerability types, ensuring each finding receives the most professional analysis, dramatically improving accuracy and depth.
2.2 Automated Collaboration
The system uses an intelligent scheduling agent to automatically identify the vulnerability type and dispatch it to the corresponding specialist agent, reducing manual intervention and greatly increasing processing throughput.
2.3 Extensibility and Flexibility
As new vulnerability types and attack techniques emerge, the system can quickly adapt. The Multi‑Agent architecture allows independent tuning of agents for different vulnerability types without affecting others.
2.4 Quality Assurance and Re‑verification
A summarizing analysis agent aggregates reports from specialist agents, extracts final conclusions, and generates standardized structured data for automated ticket creation, effectively avoiding bias from a single AI judgment.
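As a minimal sketch of what the summarizing agent does, the snippet below parses a structured verdict out of a specialist agent's Markdown report so it can feed automated ticket creation. The `- Verdict:` and `- Risk:` section names are assumptions about the report template, not the actual format used in production.

```python
def extract_conclusion(report_md: str) -> dict:
    """Pull a structured verdict out of a specialist agent's Markdown report.

    The bullet labels below ("- Verdict:", "- Risk:") are hypothetical;
    the real report template may use different field names.
    """
    fields = {}
    for line in report_md.splitlines():
        if line.startswith("- Verdict:"):
            fields["verdict"] = line.split(":", 1)[1].strip()
        elif line.startswith("- Risk:"):
            fields["risk"] = line.split(":", 1)[1].strip()
    return fields

sample_report = (
    "## Review Report\n"
    "- Verdict: confirmed\n"
    "- Risk: high\n"
    "- Notes: ${} concatenation without whitelist filtering"
)
```

Keeping the extraction rule-based (rather than asking the LLM to emit JSON directly) gives the summarizer a deterministic layer to validate against, which is one way to avoid bias from a single AI judgment.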
3. System Architecture and Core Agent Roles
The intelligent review system is orchestrated with n8n and uses DeepSeek‑V3 as the large language model. Core components include:
3.1 Input and Data Acquisition Layer
Supports webhook, workflow triggers, and other entry points.
Calls internal SAST API based on vulnerability hash to fetch detailed information.
SAST platforms provide rich vulnerability data for analysis:
Project Basic Information
Vulnerability Basic Information
SARIF Standard Data
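To illustrate the acquisition step, the sketch below normalizes a raw SAST API payload into the fields downstream agents consume. All field names (`project`, `vulnerability`, `sarif`) are assumptions about the internal API's response shape, chosen only to mirror the three data categories listed above.

```python
import json

def parse_sast_response(raw: str) -> dict:
    """Normalize a raw SAST API payload (fetched by vulnerability hash)
    into the fields the review agents use. The payload schema here is
    hypothetical, not the actual internal API contract."""
    data = json.loads(raw)
    return {
        "project": data.get("project", {}).get("name"),       # project basic info
        "rule": data.get("vulnerability", {}).get("rule"),    # vulnerability basic info
        "severity": data.get("vulnerability", {}).get("severity"),
        "sarif": data.get("sarif"),                           # SARIF standard data
    }

sample_payload = json.dumps({
    "project": {"name": "demo-service"},
    "vulnerability": {"rule": "sql-injection", "severity": "high"},
    "sarif": {"version": "2.1.0"},
})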
3.2 Task Scheduling Agent
Maps SAST rule names to vulnerability types and their specialist agents (e.g., sql-injection → sql_injection_agent).
Dispatches tasks to the corresponding specialist agents.
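The scheduling logic can be sketched as a simple routing table; the agent names and the fallback are illustrative, not the production identifiers.

```python
# Hypothetical rule-to-agent routing table; agent names are illustrative.
AGENT_ROUTES = {
    "sql-injection": "sql_injection_agent",
    "ssrf": "ssrf_agent",
    "command-injection": "command_injection_agent",
}

def dispatch(rule_name: str) -> str:
    """Return the specialist agent for a SAST rule name.

    Unknown rules fall back to a generic review agent so that new
    vulnerability types still get analyzed rather than dropped.
    """
    return AGENT_ROUTES.get(rule_name.strip().lower(), "generic_review_agent")
```

The fallback route matters for extensibility: adding a new vulnerability type is one new entry plus one new agent, with no change to existing routes.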
3.3 Specialist Review Agents
Each agent handles one vulnerability type.
Can invoke SAST MCP and GitLab MCP to obtain call chains, source snippets, and field types.
Outputs a Markdown review report containing analysis process, reasoning, risk description, and remediation suggestions.
3.4 Agent Prompt Writing Guidelines
Prompt = Role + Tool Constraints + Execution Flow + Quality Control + Output Specification.
Key principles:
Keep prompts concise (preferably under 2000 characters) to avoid hallucinations.
Continuously optimize prompts based on AI behavior.
Provide sufficient information via MCP/FunctionCall.
Apply progressive constraints: general rules → specific limits → exception handling.
Require evidence chains and prohibit AI‑generated code without real data.
Standardize error scenarios and response templates.
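The five-part prompt formula and the length guideline above can be enforced mechanically. The assembler below is a minimal sketch; the section labels and the 2000-character limit come from the guidelines, while the function itself is illustrative.

```python
def build_prompt(role, tool_constraints, flow_steps, quality, output_spec, limit=2000):
    """Assemble a system prompt as Role + Tool Constraints + Execution Flow
    + Quality Control + Output Specification, rejecting prompts over the
    length guideline (a hedge against hallucination-prone long prompts)."""
    sections = [
        f"Role: {role}",
        f"Tool Constraints: {tool_constraints}",
        "Execution Flow:\n" + "\n".join(f"{i}. {s}" for i, s in enumerate(flow_steps, 1)),
        f"Quality Control: {quality}",
        f"Output Specification: {output_spec}",
    ]
    prompt = "\n\n".join(sections)
    if len(prompt) > limit:
        raise ValueError(f"prompt is {len(prompt)} chars; keep it under {limit}")
    return prompt
```

Centralizing assembly this way also makes the "continuously optimize prompts" loop safer: each section can be tuned independently while the overall structure stays fixed.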
Prompt Structure Example
Prompt = Role Identity + Tool Constraints + Execution Flow + Quality Control + Output Specification. Example: SQL Injection Review Agent Prompt
Role: You are a SQL injection analysis expert. Input: a vulnerability hash (e.g., "1a223fbee9..."). Task: fully analyze whether the vulnerability truly exists.
Tool Constraints: Use SAST tool only for SARIF‑recorded files; use GitLab tool for any source files not present in SARIF. Prohibit using SAST to fetch unrecorded Java files; if attempted, abort with error "Audit process error: attempted to use SAST for unrecorded file, please use GitLab tool."
Execution Flow:
1. Fetch vulnerability details via SAST (SARIF, source_json, path_json, sink_json).
2. Extract taint parameters from source_json (e.g., searchBox.period).
3. Retrieve the MyBatis XML SQL content; identify whether concatenation uses ${} or #{}.
4. If ${} is used, locate whitelist/validation logic via the GitLab tool; present code examples.
5. If no whitelist is found, treat the input as unprotected.
Quality Control: The vulnerability is confirmed only when all conditions are met: user‑controlled input, string type, concatenated into SQL with ${}, and no whitelist filtering.
Output Specification: Produce a detailed Markdown report covering basic info, taint parameter, SQL snippet, whitelist evidence, analysis reasoning, impact, and remediation suggestions. Use Chinese Markdown format; avoid JSON/YAML.
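Step 3 of the flow above hinges on distinguishing MyBatis's two substitution forms: `#{...}` produces a precompiled placeholder, while `${...}` splices the value into the SQL string verbatim. A minimal check for that distinction could look like this (the sample snippet is illustrative):

```python
import re

def find_unsafe_params(mybatis_sql: str) -> list[str]:
    """Return parameter names concatenated with ${...} string substitution,
    which MyBatis injects verbatim, unlike the precompiled #{...} form."""
    return re.findall(r"\$\{(\w+(?:\.\w+)*)\}", mybatis_sql)

snippet = "SELECT * FROM orders WHERE period = '${searchBox.period}' AND id = #{id}"
```

In the real flow this check only flags candidates; the agent must still trace whitelist/validation logic via the GitLab tool before confirming, per the quality-control conditions.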
4. Key Tools and Technical Implementation
4.1 Data Standardization
Convert SAST output to the industry‑standard SARIF format to ensure consistency and interoperability for downstream agent analysis.
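A minimal sketch of that conversion is below, wrapping raw findings in the SARIF 2.1.0 envelope. The input field names (`rule`, `file`, `line`, `message`) are assumptions about the internal SAST output, but the SARIF keys (`runs`, `results`, `ruleId`, `physicalLocation`) follow the standard.

```python
def to_sarif(tool_name: str, findings: list[dict]) -> dict:
    """Wrap raw SAST findings in a minimal SARIF 2.1.0 document.

    Each finding is assumed to carry: rule, file, line, message.
    """
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            "results": [{
                "ruleId": f["rule"],
                "message": {"text": f["message"]},
                "locations": [{
                    "physicalLocation": {
                        "artifactLocation": {"uri": f["file"]},
                        "region": {"startLine": f["line"]},
                    }
                }],
            } for f in findings],
        }],
    }
```

Because every downstream agent reads the same SARIF shape, the specialist prompts never need to know which SAST vendor produced a finding.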
4.2 LLM Capability
DeepSeek‑V3 serves as the core inference engine, supporting vulnerability understanding, contextual analysis, and report generation. Prompt engineering precisely constrains agent behavior.
4.3 Agent Tooling
SAST MCP: queries SARIF data and call chains.
GitLab MCP: accesses repository files to confirm field types and validation logic.
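The tool constraint quoted in the prompt example (SAST only for SARIF-recorded files, GitLab for everything else) can be sketched as a small routing guard; the tool identifiers here are hypothetical labels, not real MCP endpoint names.

```python
def choose_tool(path: str, sarif_files: set[str]) -> str:
    """Route a file read to the right tool, mirroring the prompt's
    tool constraint: SAST MCP only for files recorded in the SARIF
    report, GitLab MCP for any other source file."""
    return "sast_mcp" if path in sarif_files else "gitlab_mcp"

recorded = {"src/main/resources/OrderDao.xml"}
```

Enforcing this outside the prompt as well as inside it gives defense in depth: even if the model ignores the instruction, the tool layer can refuse the call.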
4.4 Prompt Engineering
Each agent has an independent system prompt that defines analysis steps, logical checks, and output format, improving consistency and accuracy.
4.5 Workflow Orchestration
n8n manages agent invocation, information routing, and data parsing, enabling a seamless end‑to‑end pipeline.
5. Effectiveness and Outcomes
After six months of deployment, the AI‑driven review system achieved a 97.36% agreement with manual audit results.
Key achievements:
Review efficiency improved: each vulnerability audit now takes 1‑2 minutes.
White‑box vulnerability operations are now fully AI‑driven; humans only double‑check findings the AI marks as non‑vulnerabilities.
AI automatically generates vulnerability descriptions and remediation suggestions, creating tickets for a closed‑loop process.
6. AI‑Driven SDLC White‑Box Vulnerability Management Practice
Leveraging the 97.36% accuracy, we established a differentiated handling strategy: AI‑identified true vulnerabilities trigger automatic ticket creation, while AI‑identified non‑vulnerabilities undergo secondary human confirmation to minimize missed‑bug risk.
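That differentiated strategy amounts to a small routing policy; the verdict labels and the confidence threshold below are illustrative stand-ins, not the production values.

```python
def route_finding(ai_verdict: str, confidence: float, threshold: float = 0.9) -> str:
    """Hypothetical handling policy: AI-confirmed vulnerabilities open a
    ticket automatically; everything else (including confident 'not
    vulnerable' verdicts) goes to a human queue, so AI misses cannot
    slip into production silently."""
    if ai_verdict == "vulnerable" and confidence >= threshold:
        return "create_ticket"
    return "human_review"
```

The asymmetry is deliberate: with 97.36% agreement, auto-ticketing true positives saves the most time, while the residual missed-bug risk is carried entirely by the cheaper human confirmation path.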
Full AI‑driven workflow from development to production is illustrated below.
6.1 Shift from Manual‑Heavy to AI‑Primary Review
AI now serves as the primary reviewer, reducing engineer workload and turning security engineers into reviewers and strategists. The AI‑driven pipeline standardizes tasks as: task scheduling → vulnerability parsing → report generation → aggregated decision.
6.2 Pre‑Production Security Gate Shifted Left
Traditional SAST was often run just before release, causing rushed fixes and “release‑with‑bugs” risk. By moving the SAST trigger to the beta deployment stage, the system automatically scans code, performs AI review, and delivers actionable remediation suggestions to developers on the same day.
6.3 Agent Capability Reuse Across Scenarios
The prompt engineering framework and multi‑agent architecture have been migrated to alert handling and security incident response. Agents now automatically analyze raw alerts, merge duplicates, assess confidence, extract IOCs, and trigger remediation workflows, turning AI into a general security assistant.
Conclusion
The Multi‑Agent white‑box vulnerability intelligent review system built by Qunar addresses the core challenges of SAST—high false‑positive rates, low efficiency, and poor scalability—by deeply integrating AI agents into the security workflow. This shift from manual‑intensive to intelligent automation reduces operational costs, accelerates remediation, and provides reliable security guarantees for rapid agile development.
Future work will focus on further optimizing agent capabilities, expanding coverage to more vulnerability types, and deepening integration with DevOps pipelines to continuously enhance software security.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.