Automating Web Vulnerability Detection at Ctrip: Architecture and Implementation of the Hulk Project
This article describes Ctrip's automated web vulnerability detection system, detailing the shift from active to passive scanning, the distributed architecture using traffic mirroring, message queues, Redis, and MySQL, and the processes for data collection, de‑duplication, scanning, and vulnerability management.
Before deployment, web applications must be tested for common vulnerabilities such as SQL injection, XSS, and sensitive data leakage; the article introduces Ctrip's approach to automating this security testing.
It compares active (black‑box) and passive scanning, highlighting the limitations of active scanners—coverage gaps, inefficiency at large scale—and explains why a passive scanning model was chosen.
The Hulk project implements a distributed real‑time web vulnerability scanner using traffic mirroring and HTTP proxy methods to capture requests, which are sent to RabbitMQ/Kafka queues, de‑duplicated with Redis, and processed by a scan engine that replays requests, applies rule‑based checks, and records findings in MySQL.
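The end-to-end flow above (capture → queue → de-duplicate → scan → store) can be sketched in a few lines. This is a hypothetical, in-process stand-in for illustration only: the article's real system uses RabbitMQ/Kafka, Redis, and MySQL, which are replaced here with stdlib structures so the sketch is runnable.

```python
import hashlib
import queue

def dedup_key(method: str, url: str) -> str:
    """MD5 fingerprint of a request, used as the de-duplication key."""
    return hashlib.md5(f"{method} {url}".encode()).hexdigest()

def scan(request: dict) -> list:
    """Toy rule check: flag an obvious SQL-injection probe in the URL.
    The real engine applies many generic rules plus sqlmap and plugins."""
    findings = []
    if "' OR '1'='1" in request["url"]:
        findings.append({"type": "sqli-suspect", "url": request["url"]})
    return findings

def run_pipeline(requests: list) -> list:
    q = queue.Queue()   # stand-in for RabbitMQ/Kafka
    seen = set()        # stand-in for Redis de-duplication
    findings = []       # stand-in for the MySQL findings table
    for r in requests:
        q.put(r)
    while not q.empty():
        r = q.get()
        key = dedup_key(r["method"], r["url"])
        if key in seen:
            continue    # duplicate request, skip scanning
        seen.add(key)
        findings.extend(scan(r))
    return findings
```

Splitting the stages behind a queue is what later allows each stage to scale out independently.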
Data sources include network traffic mirroring (using DPDK, PF_RING) with TCP reassembly and HTTP parsing, as well as HTTP proxy capture; HTTPS traffic is decrypted when necessary.
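After TCP reassembly, the mirrored byte stream must be parsed back into HTTP requests. A minimal sketch of that parsing step (assumptions: well-formed HTTP/1.x, no chunked encoding or pipelining, which a production parser built on DPDK/PF_RING capture would have to handle):

```python
def parse_http_request(raw: bytes) -> dict:
    """Parse one reassembled HTTP/1.x request into method, path, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: METHOD SP PATH SP VERSION
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}
```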
During data processing, URLs are normalized (e.g., converting numeric IDs to placeholders) and de‑duplicated via MD5 hashes stored in Redis, with TTL to allow periodic rescanning of previously seen URLs.
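The normalization-plus-TTL scheme described above might look like the following sketch. The placeholder tokens and the in-memory TTL store are assumptions for illustration; the article's system keeps the MD5 keys in Redis with an expiry so URLs become eligible for rescanning.

```python
import hashlib
import re
import time

def normalize_url(url: str) -> str:
    """Replace numeric path segments and numeric query values with placeholders,
    so /user/123 and /user/456 collapse to the same de-duplication key."""
    url = re.sub(r"/\d+(?=/|$|\?)", "/{id}", url)
    url = re.sub(r"=\d+(?=&|$)", "={n}", url)
    return url

class TTLDedup:
    """In-memory stand-in for Redis SET-with-expiry de-duplication."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # md5 key -> expiry timestamp

    def is_new(self, url, now=None):
        now = time.time() if now is None else now
        key = hashlib.md5(normalize_url(url).encode()).hexdigest()
        expires = self._store.get(key)
        if expires is not None and expires > now:
            return False          # seen recently: skip
        self._store[key] = now + self.ttl
        return True               # new, or TTL expired: rescan
```

The TTL is the knob that trades scan freshness against load: once it lapses, a previously seen URL is treated as new again.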
The scanning engine employs generic rule checks, sqlmap for SQL injection, and extensible plugins for complex vulnerabilities such as stored XSS or Struts exploits, allowing security operators to write custom detection logic.
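A plugin-based check architecture of that kind can be sketched as below. The names (`register_plugin`, the specific rules) are illustrative assumptions, not Hulk's actual API; the point is that generic rules and operator-written plugins both receive the replayed request/response pair.

```python
PLUGINS = []

def register_plugin(fn):
    """Decorator letting security operators drop in custom detection logic."""
    PLUGINS.append(fn)
    return fn

# Generic rule checks run against every response body.
GENERIC_RULES = [
    ("sql-error-leak", lambda body: "You have an error in your SQL syntax" in body),
    ("stacktrace-leak", lambda body: "Traceback (most recent call last)" in body),
]

def run_checks(request: dict, response_body: str) -> list:
    findings = [name for name, rule in GENERIC_RULES if rule(response_body)]
    for plugin in PLUGINS:
        findings.extend(plugin(request, response_body))
    return findings

@register_plugin
def reflected_xss(request, response_body):
    """Toy plugin: a probe payload from the request echoed back unescaped."""
    probe = request.get("probe", "")
    return ["reflected-xss"] if probe and probe in response_body else []
```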
To handle authenticated scans, production URLs are mapped to test environments with stored login credentials, enabling safe scanning without affecting live users.
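That production-to-test mapping could be as simple as a host rewrite plus stored credentials. The domain names and the cookie-based credential store below are assumptions for illustration, not Ctrip's configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping of production hosts to their test-environment twins.
HOST_MAP = {
    "www.example.com": "www.test.example.com",
}
# Hypothetical stored login material for each test host.
TEST_CREDENTIALS = {
    "www.test.example.com": {"Cookie": "session=stored-test-session"},
}

def to_test_environment(url: str):
    """Rewrite a captured production URL to its test-env equivalent and attach
    stored login headers, so scans never hit live users."""
    parts = urlsplit(url)
    test_host = HOST_MAP.get(parts.hostname)
    if test_host is None:
        return None, {}  # no mapping: skip the authenticated scan
    rewritten = urlunsplit((parts.scheme, test_host, parts.path,
                            parts.query, parts.fragment))
    return rewritten, TEST_CREDENTIALS.get(test_host, {})
```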
Scalability is achieved by horizontally adding instances of de‑duplication or scanning modules when queue backlogs occur.
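A backlog-driven scale-out decision of that sort reduces to simple arithmetic; the sketch below is a hypothetical policy (drain the queue within a target window), not the article's actual autoscaling rule.

```python
import math

def desired_workers(backlog, per_worker_rate, drain_seconds,
                    current, max_workers):
    """Instances needed to drain `backlog` queued requests within
    `drain_seconds`, given each worker handles `per_worker_rate` req/s.
    Never scales below the current count or above the cap."""
    need = math.ceil(backlog / (per_worker_rate * drain_seconds))
    return min(max(need, current), max_workers)
```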
Detected vulnerabilities are stored with request/response snapshots for verification, and response bodies can be rendered locally to reproduce the issue.
The system includes a rule‑testing console where operators can validate rules against a controlled vulnerable environment.
After two years in production, the system has identified more than 30 high-severity, 300 medium-severity, and 400 low-severity vulnerabilities, with ongoing improvements to data-pollution handling, scan frequency, de-duplication logic, and coverage of additional vulnerability types.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.