Automating Web Vulnerability Detection at Ctrip: Architecture and Implementation of the Hulk Project
This article describes Ctrip's automated web vulnerability detection system, detailing the shift from active to passive scanning, the distributed architecture using traffic mirroring, message queues, Redis, and MySQL, and the processes for data collection, de‑duplication, scanning, and vulnerability management.
Before deployment, web applications must be tested for common vulnerabilities such as SQL injection, XSS, and sensitive data leakage; the article introduces Ctrip's approach to automating this security testing.
It compares active (black‑box) and passive scanning, highlighting the limitations of active scanners—coverage gaps, inefficiency at large scale—and explains why a passive scanning model was chosen.
The Hulk project implements a distributed real‑time web vulnerability scanner using traffic mirroring and HTTP proxy methods to capture requests, which are sent to RabbitMQ/Kafka queues, de‑duplicated with Redis, and processed by a scan engine that replays requests, applies rule‑based checks, and records findings in MySQL.
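The end-to-end flow above (capture → queue → de-duplicate → scan → store) can be sketched in a few lines. This is a hypothetical, in-process stand-in for illustration only: the article's real system uses RabbitMQ/Kafka, Redis, and MySQL, which are replaced here with stdlib structures so the sketch is runnable.

```python
import hashlib
import queue

def dedup_key(method: str, url: str) -> str:
    """MD5 fingerprint of a request, used as the de-duplication key."""
    return hashlib.md5(f"{method} {url}".encode()).hexdigest()

def scan(request: dict) -> list:
    """Toy rule check: flag an obvious SQL-injection probe in the URL.
    The real engine applies many generic rules plus sqlmap and plugins."""
    findings = []
    if "' OR '1'='1" in request["url"]:
        findings.append({"type": "sqli-suspect", "url": request["url"]})
    return findings

def run_pipeline(requests: list) -> list:
    q = queue.Queue()   # stand-in for RabbitMQ/Kafka
    seen = set()        # stand-in for Redis de-duplication
    findings = []       # stand-in for the MySQL findings table
    for r in requests:
        q.put(r)
    while not q.empty():
        r = q.get()
        key = dedup_key(r["method"], r["url"])
        if key in seen:
            continue    # duplicate request, skip scanning
        seen.add(key)
        findings.extend(scan(r))
    return findings
```

Splitting the stages behind a queue is what later allows each stage to scale out independently.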
Data sources include network traffic mirroring (using DPDK, PF_RING) with TCP reassembly and HTTP parsing, as well as HTTP proxy capture; HTTPS traffic is decrypted when necessary.
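After TCP reassembly, the mirrored byte stream must be parsed back into HTTP requests. A minimal sketch of that parsing step (assumptions: well-formed HTTP/1.x, no chunked encoding or pipelining, which a production parser built on DPDK/PF_RING capture would have to handle):

```python
def parse_http_request(raw: bytes) -> dict:
    """Parse one reassembled HTTP/1.x request into method, path, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: METHOD SP PATH SP VERSION
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}
```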
During data processing, URLs are normalized (e.g., converting numeric IDs to placeholders) and de‑duplicated via MD5 hashes stored in Redis, with TTL to allow periodic rescanning of previously seen URLs.
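The normalization-plus-TTL scheme described above might look like the following sketch. The placeholder tokens and the in-memory TTL store are assumptions for illustration; the article's system keeps the MD5 keys in Redis with an expiry so URLs become eligible for rescanning.

```python
import hashlib
import re
import time

def normalize_url(url: str) -> str:
    """Replace numeric path segments and numeric query values with placeholders,
    so /user/123 and /user/456 collapse to the same de-duplication key."""
    url = re.sub(r"/\d+(?=/|$|\?)", "/{id}", url)
    url = re.sub(r"=\d+(?=&|$)", "={n}", url)
    return url

class TTLDedup:
    """In-memory stand-in for Redis SET-with-expiry de-duplication."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # md5 key -> expiry timestamp

    def is_new(self, url, now=None):
        now = time.time() if now is None else now
        key = hashlib.md5(normalize_url(url).encode()).hexdigest()
        expires = self._store.get(key)
        if expires is not None and expires > now:
            return False          # seen recently: skip
        self._store[key] = now + self.ttl
        return True               # new, or TTL expired: rescan
```

The TTL is the knob that trades scan freshness against load: once it lapses, a previously seen URL is treated as new again.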
The scanning engine employs generic rule checks, sqlmap for SQL injection, and extensible plugins for complex vulnerabilities such as stored XSS or Struts exploits, allowing security operators to write custom detection logic.
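A plugin-based check architecture of that kind can be sketched as below. The names (`register_plugin`, the specific rules) are illustrative assumptions, not Hulk's actual API; the point is that generic rules and operator-written plugins both receive the replayed request/response pair.

```python
PLUGINS = []

def register_plugin(fn):
    """Decorator letting security operators drop in custom detection logic."""
    PLUGINS.append(fn)
    return fn

# Generic rule checks run against every response body.
GENERIC_RULES = [
    ("sql-error-leak", lambda body: "You have an error in your SQL syntax" in body),
    ("stacktrace-leak", lambda body: "Traceback (most recent call last)" in body),
]

def run_checks(request: dict, response_body: str) -> list:
    findings = [name for name, rule in GENERIC_RULES if rule(response_body)]
    for plugin in PLUGINS:
        findings.extend(plugin(request, response_body))
    return findings

@register_plugin
def reflected_xss(request, response_body):
    """Toy plugin: a probe payload from the request echoed back unescaped."""
    probe = request.get("probe", "")
    return ["reflected-xss"] if probe and probe in response_body else []
```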
To handle authenticated scans, production URLs are mapped to test environments with stored login credentials, enabling safe scanning without affecting live users.
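That production-to-test mapping could be as simple as a host rewrite plus stored credentials. The domain names and the cookie-based credential store below are assumptions for illustration, not Ctrip's configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping of production hosts to their test-environment twins.
HOST_MAP = {
    "www.example.com": "www.test.example.com",
}
# Hypothetical stored login material for each test host.
TEST_CREDENTIALS = {
    "www.test.example.com": {"Cookie": "session=stored-test-session"},
}

def to_test_environment(url: str):
    """Rewrite a captured production URL to its test-env equivalent and attach
    stored login headers, so scans never hit live users."""
    parts = urlsplit(url)
    test_host = HOST_MAP.get(parts.hostname)
    if test_host is None:
        return None, {}  # no mapping: skip the authenticated scan
    rewritten = urlunsplit((parts.scheme, test_host, parts.path,
                            parts.query, parts.fragment))
    return rewritten, TEST_CREDENTIALS.get(test_host, {})
```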
Scalability is achieved by horizontally adding instances of de‑duplication or scanning modules when queue backlogs occur.
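A backlog-driven scale-out decision of that sort reduces to simple arithmetic; the sketch below is a hypothetical policy (drain the queue within a target window), not the article's actual autoscaling rule.

```python
import math

def desired_workers(backlog, per_worker_rate, drain_seconds,
                    current, max_workers):
    """Instances needed to drain `backlog` queued requests within
    `drain_seconds`, given each worker handles `per_worker_rate` req/s.
    Never scales below the current count or above the cap."""
    need = math.ceil(backlog / (per_worker_rate * drain_seconds))
    return min(max(need, current), max_workers)
```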
Detected vulnerabilities are stored with request/response snapshots for verification, and response bodies can be rendered locally to reproduce the issue.
The system includes a rule‑testing console where operators can validate rules against a controlled vulnerable environment.
After two years in production, the system has identified more than 30 high-severity, 300 medium-severity, and 400 low-severity vulnerabilities, with ongoing improvements to data-pollution handling, scan frequency, de-duplication logic, and coverage of additional vulnerability types.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.