How Multi‑Agent AI Transforms SDLC White‑Box Vulnerability Management

An in‑depth exploration of a Multi‑Agent AI system that automates SDLC white‑box vulnerability management, detailing industry‑standard processes, the system’s architecture, specialized agents, prompt engineering, tool integration, and real‑world results that boost audit efficiency and accuracy while enabling true security left‑shift.

Qunar Tech Salon

Preface

In today’s fast‑moving software development environment, security vulnerability management is undergoing a profound transformation. With the widespread adoption of DevOps, Shift‑Left Security has become an industry consensus, and enterprises are embedding security testing early in the development lifecycle to build true DevSecOps pipelines. Qunar integrates Microsoft’s Security Development Lifecycle (SDL) practices with DevOps, shifting security left.

In practice, SAST is the core capability for security testing and bears primary responsibility for discovering vulnerabilities, while IAST and DAST serve as supplementary methods. The quality of SAST directly determines the effectiveness of the entire vulnerability management system. The white‑box scanning stage of the SDLC faces severe challenges: even the best static application security testing tools achieve only 60‑70% accuracy, forcing massive manual effort to confirm true positives. At many internet companies, security teams spend up to 40% of their effort on SDLC tasks.

1. SDLC White‑Box Vulnerability Management Industry Process

Most enterprises follow these steps:

Code commit triggers scan: after developers push code, the CI system automatically invokes SAST tools (e.g., Fortify, Checkmarx, SonarQube, Coverity) for static analysis.

Generate initial vulnerability report: SAST tools output results including vulnerability type, source file, call chain, risk level, in formats such as XML, JSON, PDF, Word, SARIF.

Manual review and confirmation: security engineers examine each finding to verify false positives, confirming external controllability of input points, whitelist filtering, type checks, etc.

Vulnerability classification and assignment: approved findings are entered into a ticket system and routed to the responsible development or business team.

Vulnerability fixing and re‑review: developers fix the issue based on the report, security teams re‑test, and finally close the ticket.

Data consolidation and operational analysis: security teams archive metrics such as vulnerability count, type distribution, MTTR, ownership, and use them to refine rules and strategies.
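The "generate initial vulnerability report" step above typically produces SARIF, which downstream tooling then flattens for triage. A minimal sketch of that extraction, assuming a well-formed SARIF 2.1.0 document (the sample report and tool name are illustrative, not output from any specific SAST product):

```python
import json

def extract_findings(sarif_text: str):
    """Flatten a SARIF 2.1.0 report into simple finding records."""
    doc = json.loads(sarif_text)
    findings = []
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            loc = result.get("locations", [{}])[0].get("physicalLocation", {})
            findings.append({
                "rule": result.get("ruleId"),
                "file": loc.get("artifactLocation", {}).get("uri"),
                "line": loc.get("region", {}).get("startLine"),
                "level": result.get("level", "warning"),
            })
    return findings

# Minimal hypothetical report for illustration.
sample = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "ExampleSAST"}},
        "results": [{
            "ruleId": "sql-injection",
            "level": "error",
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "src/OrderDao.java"},
                "region": {"startLine": 42}}}],
        }],
    }],
})

print(extract_findings(sample))
```

Flattening to `(rule, file, line, level)` records is what lets later steps (assignment, metrics, MTTR tracking) stay tool-agnostic.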

Process diagram:

SDLC white‑box vulnerability management process diagram

Common challenges:

High SAST false‑positive rate → heavy manual review pressure.

Low review efficiency → each vulnerability requires 15‑30 minutes of manual analysis.

Lack of standardization → inconsistent conclusions across engineers.

2. Multi‑Agent Intelligent Vulnerability Review System Design Concept

To address traditional SAST challenges, we introduce a Multi‑Agent architecture that builds an AI‑automated, specialized, and scalable white‑box vulnerability review system. The core idea is to decompose complex review tasks into multiple collaborative agents with specific capabilities, using intelligent scheduling, specialized analysis, and layered re‑verification to achieve precise identification and automated handling.

System diagram:

Multi‑Agent system architecture diagram

2.1 Specialized Division of Labor

Different vulnerability types (SQL injection, SSRF, command injection, etc.) involve distinct attack principles, detection methods, and contextual analysis. A single review pipeline cannot cover all details. The Multi‑Agent design creates agents specialized in particular vulnerability types, ensuring each finding receives the most professional analysis, dramatically improving accuracy and depth.

Specialized agents illustration

2.2 Automated Collaboration

The system uses an intelligent scheduling agent to automatically identify the vulnerability type and dispatch it to the corresponding specialist agent, reducing manual intervention and greatly increasing processing throughput.

Scheduling and collaboration diagram

2.3 Extensibility and Flexibility

As new vulnerability types and attack techniques emerge, the system can quickly adapt. The Multi‑Agent architecture allows independent tuning of agents for different vulnerability types without affecting others.

Extensibility illustration

2.4 Quality Assurance and Re‑verification

A summarizing analysis agent aggregates reports from specialist agents, extracts final conclusions, and generates standardized structured data for automated ticket creation, effectively avoiding bias from a single AI judgment.

Quality assurance agent diagram

3. System Architecture and Core Agent Roles

The intelligent review system is orchestrated with n8n and uses DeepSeek‑V3 as the large language model. Core components include:

3.1 Input and Data Acquisition Layer

Supports webhook, workflow triggers, and other entry points.

Calls internal SAST API based on vulnerability hash to fetch detailed information.

SAST platforms provide rich vulnerability data for analysis:

Project Basic Information

Project basic info

Vulnerability Basic Information

Vulnerability basic info

SARIF Standard Data

SARIF data

3.2 Task Scheduling Agent

Maps SAST rule names to vulnerability types and their specialist agents (e.g., sql-injection → sql_injection_agent).

Dispatches tasks to the corresponding specialist agents.
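The scheduling behavior described above can be sketched as a lookup table with a fallback; the route names and the generic fallback agent are illustrative, not Qunar's actual identifiers:

```python
# Hypothetical rule-name → specialist-agent routing table.
AGENT_ROUTES = {
    "sql-injection": "sql_injection_agent",
    "ssrf": "ssrf_agent",
    "command-injection": "command_injection_agent",
}

def dispatch(rule_id: str) -> str:
    """Normalize a SAST rule id and pick the specialist agent for it."""
    key = rule_id.strip().lower()
    # Fall back to a generic reviewer for rules without a specialist.
    return AGENT_ROUTES.get(key, "generic_review_agent")

print(dispatch("SQL-Injection"))  # sql_injection_agent
print(dispatch("xxe"))            # generic_review_agent
```

Keeping routing as data rather than code is what makes section 2.3's extensibility claim cheap in practice: supporting a new vulnerability type is one new table entry plus one new agent prompt.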

3.3 Specialist Review Agents

Each agent handles one vulnerability type.

Can invoke SAST MCP and GitLab MCP to obtain call chains, source snippets, and field types.

Outputs a Markdown review report containing analysis process, reasoning, risk description, and remediation suggestions.

3.4 Agent Prompt Writing Guidelines

Prompt = Role + Tool Constraints + Execution Flow + Quality Control + Output Specification.

Key principles:

Keep prompts concise (preferably under 2000 characters) to avoid hallucinations.

Continuously optimize prompts based on AI behavior.

Provide sufficient information via MCP/FunctionCall.

Apply progressive constraints: general rules → specific limits → exception handling.

Require evidence chains and prohibit AI‑generated code without real data.

Standardize error scenarios and response templates.

Prompt Structure Example

Prompt = Role Identity + Tool Constraints + Execution Flow + Quality Control + Output Specification
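A minimal sketch of assembling a system prompt from these five components; the dataclass, section texts, and Markdown section headers are illustrative, not the production template:

```python
from dataclasses import dataclass

@dataclass
class AgentPrompt:
    """Five-part agent prompt: role, tools, flow, quality, output."""
    role: str
    tool_constraints: str
    execution_flow: str
    quality_control: str
    output_spec: str

    def render(self) -> str:
        parts = [
            ("Role", self.role),
            ("Tool Constraints", self.tool_constraints),
            ("Execution Flow", self.execution_flow),
            ("Quality Control", self.quality_control),
            ("Output Specification", self.output_spec),
        ]
        return "\n\n".join(f"## {name}\n{text}" for name, text in parts)

prompt = AgentPrompt(
    role="You are a SQL injection analysis expert.",
    tool_constraints="Use the SAST tool only for SARIF-recorded files.",
    execution_flow="1. Fetch vulnerability details. 2. Trace the taint.",
    quality_control="Confirm only when input is user-controlled and unfiltered.",
    output_spec="Produce a Markdown report; avoid JSON/YAML.",
)
# Enforce the conciseness principle from above (under 2000 characters).
assert len(prompt.render()) < 2000
```

Structuring the prompt as typed fields makes the "continuously optimize" principle tractable: each component can be tuned and versioned independently per agent.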

Example: SQL Injection Review Agent Prompt

Role: You are a SQL injection analysis expert. Input: a vulnerability hash (e.g., "1a223fbee9..."). Task: fully analyze whether the vulnerability truly exists.

Tool Constraints: Use SAST tool only for SARIF‑recorded files; use GitLab tool for any source files not present in SARIF. Prohibit using SAST to fetch unrecorded Java files; if attempted, abort with error "Audit process error: attempted to use SAST for unrecorded file, please use GitLab tool."

Execution Flow:

1. Fetch vulnerability details via SAST (SARIF, source_json, path_json, sink_json).
2. Extract taint parameters from source_json (e.g., searchBox.period).
3. Retrieve MyBatis XML SQL content; identify whether concatenation uses ${} or #{}.
4. If ${} is used, locate whitelist/validation logic via the GitLab tool; present code examples.
5. If no whitelist is found, treat the parameter as unprotected.

Quality Control: Vulnerability is confirmed only when all conditions are met: user‑controlled input, string type, concatenated into SQL with ${}, and no whitelist filtering.

Output Specification: Produce a detailed Markdown report covering basic info, taint parameter, SQL snippet, whitelist evidence, analysis reasoning, impact, and remediation suggestions. Use Chinese Markdown format; avoid JSON/YAML.
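The quality-control rule above is a conjunction of four conditions, and the key technical check is MyBatis's distinction between `${}` (raw string splicing) and `#{}` (bound, escaped parameters). A sketch of that decision logic, with illustrative function names and SQL fragments:

```python
import re

def mybatis_uses_raw_interpolation(sql_fragment: str, param: str) -> bool:
    """True if the MyBatis fragment splices `param` with ${...} (raw
    concatenation) rather than #{...} (a bound, escaped parameter)."""
    pattern = r"\$\{\s*" + re.escape(param) + r"\s*\}"
    return re.search(pattern, sql_fragment) is not None

def confirm_sql_injection(user_controlled: bool, is_string: bool,
                          sql_fragment: str, param: str,
                          has_whitelist: bool) -> bool:
    """All four quality-control conditions must hold simultaneously."""
    return (user_controlled and is_string
            and mybatis_uses_raw_interpolation(sql_fragment, param)
            and not has_whitelist)

# ${period} is spliced raw; with no whitelist this is a true positive.
snippet = "ORDER BY ${period}"
print(confirm_sql_injection(True, True, snippet, "period", has_whitelist=False))   # True
# #{period} is a bound parameter, so the finding is a false positive.
print(confirm_sql_injection(True, True, "WHERE id = #{period}", "period", False))  # False
```

The conjunction mirrors why SAST alone false-positives so often: static tools flag the `${}` pattern, but only the agent's follow-up checks (controllability, type, whitelist evidence from GitLab) decide whether it is exploitable.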

4. Key Tools and Technical Implementation

4.1 Data Standardization

Convert SAST output to the industry‑standard SARIF format to ensure consistency and interoperability for downstream agent analysis.
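A minimal sketch of that normalization step, mapping one hypothetical proprietary finding into a SARIF 2.1.0 result (the input field names and tool name are assumptions; the output keys follow the SARIF schema):

```python
import json

def to_sarif_result(finding: dict) -> dict:
    """Map one proprietary SAST finding to a SARIF 2.1.0 result object."""
    level_map = {"high": "error", "medium": "warning"}
    return {
        "ruleId": finding["rule"],
        "level": level_map.get(finding["severity"], "note"),
        "message": {"text": finding["description"]},
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": finding["file"]},
                "region": {"startLine": finding["line"]},
            }
        }],
    }

def wrap_run(results, tool_name="InternalSAST"):
    """Wrap converted results in a single-run SARIF document."""
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }

finding = {"rule": "ssrf", "severity": "high",
           "description": "URL built from request parameter",
           "file": "src/HttpFetcher.java", "line": 88}
print(json.dumps(wrap_run([to_sarif_result(finding)]), indent=2))
```

Because every specialist agent consumes the same SARIF shape, adding a second SAST engine later only requires a new converter, not new agents.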

4.2 LLM Capability

DeepSeek‑V3 serves as the core inference engine, supporting vulnerability understanding, contextual analysis, and report generation. Prompt engineering precisely constrains agent behavior.

4.3 Agent Tooling

SAST MCP: queries SARIF data and call chains.

GitLab MCP: accesses repository files to confirm field types and validation logic.

4.4 Prompt Engineering

Each agent has an independent system prompt that defines analysis steps, logical checks, and output format, improving consistency and accuracy.

4.5 Workflow Orchestration

n8n manages agent invocation, information routing, and data parsing, enabling a seamless end‑to‑end pipeline.

5. Effectiveness and Outcomes

After six months of deployment, the AI‑driven review system achieved a 97.36% agreement with manual audit results.

Accuracy comparison chart

Key achievements:

Review efficiency improved: each vulnerability audit now takes 1‑2 minutes.

White‑box vulnerability operations fully AI‑driven; humans only confirm AI‑missed cases.

AI automatically generates vulnerability descriptions and remediation suggestions, creating tickets for a closed‑loop process.

6. AI‑Driven SDLC White‑Box Vulnerability Management Practice

Leveraging the 97.36% accuracy, we established a differentiated handling strategy: AI‑identified true vulnerabilities trigger automatic ticket creation, while AI‑identified non‑vulnerabilities undergo secondary human confirmation to minimize missed‑bug risk.
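The differentiated strategy reduces to a single branch on the AI verdict; the function, queue shapes, and finding ids below are illustrative, not Qunar's actual ticketing API:

```python
# AI-confirmed true positives auto-create tickets; AI "not a vulnerability"
# verdicts go to a human queue so misses are caught by secondary review.

def route_verdict(finding_id: str, ai_says_vulnerable: bool,
                  ticket_queue: list, review_queue: list) -> None:
    if ai_says_vulnerable:
        ticket_queue.append({"id": finding_id, "action": "create_ticket"})
    else:
        review_queue.append({"id": finding_id, "action": "human_confirm"})

tickets, reviews = [], []
route_verdict("vuln-001", True, tickets, reviews)
route_verdict("vuln-002", False, tickets, reviews)
print(len(tickets), len(reviews))  # 1 1
```

The asymmetry is deliberate: at 97.36% agreement, auto-ticketing true positives is low-risk, while the cheaper human pass is reserved for the side where a miss would ship a real vulnerability.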

Full AI‑driven workflow from development to production is illustrated below.

End‑to‑end AI workflow diagram

6.1 Shift from Manual‑Heavy to AI‑Primary Review

AI now serves as the primary reviewer, reducing engineer workload and turning security engineers into reviewers and strategists. The AI‑driven pipeline standardizes tasks as: task scheduling → vulnerability parsing → report generation → aggregated decision.

6.2 Pre‑Production Security Gate Shifted Left

Traditional SAST was often run just before release, causing rushed fixes and “release‑with‑bugs” risk. By moving the SAST trigger to the beta deployment stage, the system automatically scans code, performs AI review, and delivers actionable remediation suggestions to developers on the same day.

6.3 Agent Capability Reuse Across Scenarios

The prompt engineering framework and multi‑agent architecture have been migrated to alert handling and security incident response. Agents now automatically analyze raw alerts, merge duplicates, assess confidence, extract IOCs, and trigger remediation workflows, turning AI into a general security assistant.

Conclusion

The Multi‑Agent white‑box vulnerability intelligent review system built by Qunar addresses the core challenges of SAST—high false‑positive rates, low efficiency, and poor scalability—by deeply integrating AI agents into the security workflow. This shift from manual‑intensive to intelligent automation reduces operational costs, accelerates remediation, and provides reliable security guarantees for rapid agile development.

Future work will focus on further optimizing agent capabilities, expanding coverage to more vulnerability types, and deepening integration with DevOps pipelines to continuously enhance software security.

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
