
Design and Implementation of a Cloud‑Based Web Application Firewall at Ctrip

This article describes Ctrip's web security challenges, evaluates the shortcomings of hardware WAFs and commercial cloud WAFs, and presents a low‑cost, low‑risk cloud‑based WAF solution that combines DNS redirection, closed‑loop rule management, deployment on Lua/Tengine, supervised machine‑learning log analysis, and big‑data streaming for real‑time threat detection and mitigation.

Qunar Tech Salon

Ctrip, the largest OTA group in China, must keep its services both secure and stable. The problems it faces include blocking malicious IPs, detecting scans, defending against OWASP Top 10 attacks (SQL injection, XSS, local file inclusion (LFI), command execution, information leakage), and responding quickly to critical vulnerabilities such as the Struts2 flaws and Shellshock.

Traditional hardware WAFs are expensive, slow to deploy and update, and poorly suited to a multi‑IDC environment. Commercial cloud WAFs cannot be fully customized and are not an option when services cannot be migrated entirely to the cloud. Compatibility is a further obstacle, because Ctrip runs a diverse mix of operating systems, containers, and programming languages.

The chosen solution is a cloud‑based Web Application Firewall that is agnostic to the backend: deploying it requires only a DNS change, which sidesteps most of the limitations above.

Key characteristics include a centralized platform with no client modifications, DNS‑based routing, and shared detection information across services.

Development follows a closed loop. Rules come from external feeds and internal collection and are loaded dynamically into a Storm‑based real‑time traffic‑processing framework, where they run in detection‑only mode for testing. Alarm logs are then analyzed offline to find false positives and false negatives, the refined rules are fed back into the system, and optimization continues throughout operation.
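The closed loop can be pictured as a rule lifecycle in which every new rule starts in detection-only mode and is promoted to blocking only after offline review. The sketch below is a minimal illustration under stated assumptions: the `Rule` class, the state names (`detect`, `block`, `refine`) and the thresholds are hypothetical, since the article does not describe the real rule data model.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    rule_id: str
    pattern: str              # regular expression matched against the request line
    mode: str = "detect"      # hypothetical states: "detect" -> "block" or "refine"
    hits: int = 0             # alarms raised while in detection-only mode
    false_positives: int = 0  # alarms judged false during offline analysis

    def matches(self, request_line: str) -> bool:
        return re.search(self.pattern, request_line) is not None

def review(rule: Rule, min_hits: int = 100, max_fp_rate: float = 0.01) -> Rule:
    """Offline review step of the closed loop: promote a detection-only rule to
    blocking when its false-positive rate is low enough, otherwise send it back
    for refinement. Rules with too few hits keep running in detection-only mode."""
    if rule.mode != "detect" or rule.hits < min_hits:
        return rule
    fp_rate = rule.false_positives / rule.hits
    rule.mode = "block" if fp_rate <= max_fp_rate else "refine"
    return rule
```

A rule with 500 alarms and 2 confirmed false positives (0.4%) would be promoted to `block`; one with 50 false positives out of 500 would go back to `refine`, which is the "refined rules are fed back" step.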

For deployment, the WAF module is built on Tengine and LuaJIT and can be loaded or unloaded through configuration. The core high‑speed detection logic is written in Lua and managed through a RESTful API, with Tengine serving as the software load‑balancing layer.
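The value of a loadable module is that the proxy keeps forwarding traffic whether or not the WAF check is present. The following Python model is only an analogy for that idea: `Proxy`, `load_waf`, `unload_waf` and `simple_check` are hypothetical names. In the real system the check is Lua code inside Tengine; with ngx_lua such checks commonly run in the access phase (`access_by_lua*` directives), though the article does not say which phase Ctrip uses.

```python
from typing import Callable, Optional

# A check returns a block reason, or None to let the request through.
Check = Callable[[dict], Optional[str]]

class Proxy:
    """Stand-in for a Tengine worker: the WAF check is an optional hook that
    can be loaded or unloaded at runtime without touching request routing."""

    def __init__(self) -> None:
        self.waf_hook: Optional[Check] = None

    def load_waf(self, hook: Check) -> None:
        self.waf_hook = hook

    def unload_waf(self) -> None:
        self.waf_hook = None

    def handle(self, request: dict) -> int:
        if self.waf_hook is not None:
            reason = self.waf_hook(request)
            if reason is not None:
                return 403  # intercepted by the WAF
        return 200  # forwarded to the backend (simulated)

def simple_check(request: dict) -> Optional[str]:
    # Toy rule for illustration only.
    return "path traversal" if "../" in request.get("uri", "") else None
```

With the hook unloaded, `/../etc/passwd` is forwarded (200); after `load_waf(simple_check)` the same request returns 403, and unloading restores pass-through, which mirrors the "configurable loading/unloading" described above.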

Log handling uses supervised machine learning to judge false positives: logs are collected, features are extracted and refined, a classifier is trained offline, and the resulting model classifies new logs, with mis‑detections fed back as additional training data.
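The train, classify, and feed-back cycle can be sketched with the simplest supervised learner, a one-feature decision stump, standing in for the real classifier (the article later names an SVM). The feature (for example, the gap between alerts from one IP) and the label convention (1 = false positive, 0 = real attack) are assumptions for illustration; `train_stump` and `FeedbackLoop` are hypothetical names.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, label: 1 = false positive, 0 = real attack)

def train_stump(samples: List[Sample]) -> float:
    """Pick the threshold t that minimises training errors for the rule
    'label = 1 if feature >= t'. A one-feature stand-in for the real model."""
    if not samples:
        raise ValueError("need at least one labelled sample")
    candidates = sorted({x for x, _ in samples})
    best_t, best_err = candidates[0], len(samples) + 1
    for t in candidates:
        err = sum((x >= t) != bool(y) for x, y in samples)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

class FeedbackLoop:
    def __init__(self, samples: List[Sample]) -> None:
        self.samples = list(samples)
        self.threshold = train_stump(self.samples)

    def classify(self, x: float) -> int:
        return int(x >= self.threshold)

    def add_corrections(self, corrections: List[Sample]) -> None:
        """Manually confirmed mis-classifications go back into the training
        set, and the model is retrained on the enlarged set."""
        self.samples.extend(corrections)
        self.threshold = train_stump(self.samples)
```

Trained on `[(1.0, 0), (2.0, 0), (30.0, 1), (40.0, 1)]` the threshold is 30.0, so an alert with a 12-second gap is labelled a real attack. If analysts confirm that gaps of 10 and 12 seconds were in fact false positives, `add_corrections` lowers the threshold to 10.0.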

Observations show that false‑positive logs often have distinct characteristics, such as larger time intervals between requests from the same IP; these patterns are used to build training datasets, and classified logs are presented on a management console for manual verification.
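One way to turn the interval observation into a numeric feature is to compute, per client IP, the mean gap between consecutive alarms. This is a minimal sketch: the input format `(client_ip, unix_timestamp)` and the choice of the mean (rather than, say, the median) are assumptions, not details from the article.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def mean_intervals(alerts: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """alerts: (client_ip, unix_timestamp) pairs taken from WAF alarm logs.
    Returns the mean gap in seconds between consecutive alarms per IP;
    IPs with a single alarm get float('inf') (no second request observed)."""
    by_ip: Dict[str, List[float]] = defaultdict(list)
    for ip, ts in alerts:
        by_ip[ip].append(ts)
    features: Dict[str, float] = {}
    for ip, stamps in by_ip.items():
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        features[ip] = mean(gaps) if gaps else float("inf")
    return features
```

An IP that trips rules every half second gets a feature of 0.5, while one that triggers two alarms a minute apart gets 60.0; per the observation above, larger values lean toward false positives, and these per-IP values become rows of the training dataset.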

Architecturally, billions of incoming requests first pass through an SLB load balancer; Nginx then loads the WAF module, which evaluates each request against the rule sets (intercept, redirect, and so on) before forwarding legitimate traffic to the backend servers.
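The per-request decision can be sketched as a first-match scan over a rule list, where each rule names the request part to inspect, a pattern, and an action. First-match semantics, the `WafRule` fields, and the action names are assumptions for illustration; the article only says that rules can intercept, redirect, and so on.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class WafRule:
    name: str
    part: str         # which request attribute to inspect, e.g. "uri" or "user_agent"
    pattern: str      # regular expression
    action: str       # "intercept" or "redirect" (hypothetical action names)
    target: str = ""  # redirect location, used only when action == "redirect"

def evaluate(request: dict, rules: List[WafRule]) -> Tuple[str, str]:
    """Return (decision, detail) for the first matching rule, or ("pass", "")
    so the request is forwarded to the backend."""
    for rule in rules:
        value = request.get(rule.part, "")
        if re.search(rule.pattern, value):
            if rule.action == "redirect":
                return "redirect", rule.target
            return "intercept", rule.name
    return "pass", ""
```

A request whose URI contains `union select` is intercepted by a SQL-injection rule, a request carrying a scanner's User-Agent could be redirected to a challenge page, and everything else passes through.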

Integration with existing SLB allows both HTTP and HTTPS applications to be protected without extra SSL configuration; a reserved REST API enables rule publishing, activation, mode switching, and bypass control, while WAF and interception logs are streamed to a Kafka cluster for downstream analysis.
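Streaming interception logs to Kafka means serialising each event as a key and value. The sketch below builds those bytes without a Kafka client so it stays self-contained; the field names are hypothetical, and keying by domain is an assumed design choice. With Kafka's default partitioner, records with the same key land in the same partition, which preserves per-domain ordering for downstream consumers.

```python
import json
import time
from typing import Optional, Tuple

def build_log_record(client_ip: str, domain: str, uri: str, rule: str,
                     action: str, ts: Optional[float] = None) -> Tuple[bytes, bytes]:
    """Serialise one interception event as (key, value) bytes for a Kafka
    producer. Field names are hypothetical, not Ctrip's actual schema."""
    record = {
        "ts": ts if ts is not None else time.time(),
        "client_ip": client_ip,
        "domain": domain,
        "uri": uri,
        "rule": rule,
        "action": action,
    }
    key = domain.encode("utf-8")  # same domain -> same partition
    value = json.dumps(record, separators=(",", ":")).encode("utf-8")
    return key, value
```

With a client such as kafka-python, the pair would be passed as `producer.send(topic, key=key, value=value)`; the topic name and broker addresses are deployment details the article does not give.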

Logs are consumed from Kafka, stored in Elasticsearch, and visualized in Kibana with dimensions such as geography, IP, interception reason, and requested domain.
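The Kibana dimensions map naturally onto Elasticsearch `terms` aggregations. This sketch builds a search body that counts recent interception events by country, IP, reason, and domain. The field names (`ts`, `geo.country`, `client_ip`, `reason`, `domain`) are hypothetical and would need to be keyword fields in the real index mapping.

```python
def attack_dashboard_query(minutes: int = 15, top_n: int = 10) -> dict:
    """Build an Elasticsearch search body that counts interception events in
    the last `minutes` minutes, broken down by several dimensions.
    All index field names here are hypothetical."""
    dims = ["geo.country", "client_ip", "reason", "domain"]
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "query": {"range": {"ts": {"gte": f"now-{minutes}m"}}},
        "aggs": {
            f"by_{d.split('.')[-1]}": {"terms": {"field": d, "size": top_n}}
            for d in dims
        },
    }
```

Each `by_*` aggregation returns the top buckets for one dimension, which is roughly what a Kibana bar chart or table panel requests behind the scenes.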

3D visualizations display real‑time attack source maps and WAF module status, showing request rates and average latency per 10‑second interval.
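The per-10-second request rate and average latency shown on the status display can be computed by bucketing events into fixed windows. A minimal sketch, assuming each handled request yields a `(unix_timestamp, latency_ms)` pair; the input format is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def bucket_stats(events: Iterable[Tuple[float, float]],
                 width: float = 10.0) -> Dict[int, Tuple[float, float]]:
    """events: (unix_timestamp, latency_ms) per request handled by the WAF module.
    Returns {bucket_start: (requests_per_second, avg_latency_ms)} for fixed,
    non-overlapping windows of `width` seconds."""
    count: Dict[int, int] = defaultdict(int)
    total: Dict[int, float] = defaultdict(float)
    for ts, latency in events:
        start = int(ts // width * width)  # e.g. 12.5 s -> bucket starting at 10
        count[start] += 1
        total[start] += latency
    return {s: (count[s] / width, total[s] / count[s]) for s in sorted(count)}
```

Two requests at 0 s and 3 s with latencies of 2 ms and 4 ms give the first bucket a rate of 0.2 requests per second and an average latency of 3.0 ms.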

Beyond the WAF module, traffic is mirrored via switches to RabbitMQ and processed by Storm, which not only detects attacks but also aggregates multi‑dimensional metrics to generate dynamic blocking rules that are pushed to the WAF via its REST interface.
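A dynamic blocking rule can be derived from a sliding-window count per client IP. The class below is a single-process stand-in for the Storm aggregation, offered as a sketch only: the limit, window, TTL, and the emitted rule format are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set

class DynamicBlocker:
    """Sliding-window counter per client IP. When an IP exceeds `limit`
    suspicious hits within `window` seconds, emit a block rule once.
    Thresholds and the rule format are hypothetical."""

    def __init__(self, limit: int = 100, window: float = 60.0, ttl: int = 600):
        self.limit, self.window, self.ttl = limit, window, ttl
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.blocked: Set[str] = set()

    def observe(self, ip: str, ts: float) -> Optional[dict]:
        q = self.hits[ip]
        q.append(ts)
        # Drop hits that have fallen out of the window.
        while q and q[0] <= ts - self.window:
            q.popleft()
        if len(q) > self.limit and ip not in self.blocked:
            self.blocked.add(ip)
            return {"type": "ip_block", "ip": ip, "ttl_seconds": self.ttl}
        return None
```

Each emitted dictionary would then be pushed to the WAF through its REST interface; the endpoint and payload format are not given in the article.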

Both dynamic and static rule deployment are supported through the REST API, allowing one‑click enable/disable, mode switching, and bypass operations for rapid response during emergencies.
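The operations behind the REST API (rule publishing, enable/disable, mode switching, bypass) can be modelled as state changes on a small control plane. This in-memory sketch is hypothetical: the class, method names, and the choice that newly published rules start disabled are assumptions. Each method would sit behind a REST endpoint whose routes the article does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class ControlPlane:
    rules: Dict[str, bool] = field(default_factory=dict)  # rule_id -> enabled
    modes: Dict[str, str] = field(default_factory=dict)   # domain -> "block" | "detect"
    bypassed: Set[str] = field(default_factory=set)       # domains skipping the WAF

    def publish_rule(self, rule_id: str) -> None:
        self.rules[rule_id] = False  # assumption: published rules start disabled

    def set_rule_enabled(self, rule_id: str, enabled: bool) -> None:
        if rule_id not in self.rules:
            raise KeyError(rule_id)
        self.rules[rule_id] = enabled

    def set_mode(self, domain: str, mode: str) -> None:
        if mode not in ("block", "detect"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.modes[domain] = mode

    def set_bypass(self, domain: str, on: bool) -> None:
        # Emergency switch: traffic for this domain skips inspection entirely.
        if on:
            self.bypassed.add(domain)
        else:
            self.bypassed.discard(domain)

    def active_rules(self) -> Set[str]:
        return {r for r, on in self.rules.items() if on}
```

During an emergency such as a new Struts2 disclosure, the flow would be to publish a rule, enable it in one call, and, if it misbehaves, switch the affected domain to `detect` or turn on bypass without redeploying anything.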

Log analytics employ Spark Streaming and MLlib; logs are parsed, features extracted, and an SVM classifier identifies false positives, with manual confirmations feeding back into the training pipeline.
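To show what a linear SVM learns from such features, here is a dependency-free sketch using full-batch Pegasos-style subgradient descent on the primal SVM objective. It is not the article's pipeline: in Spark the equivalent would be MLlib's linear SVM (for example `LinearSVC` in `pyspark.ml.classification`), and the two example features and the label convention (+1 = false positive, -1 = genuine attack) are assumptions. Regularising the bias together with the weights is a simplification.

```python
from typing import List, Sequence

def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 1000) -> List[float]:
    """Full-batch Pegasos-style subgradient descent on
        lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)),
    with labels y_i in {+1, -1}. A constant 1.0 feature is appended so the
    last weight acts as a bias."""
    rows = [list(x) + [1.0] for x in X]
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)  # Pegasos step size
        # Sum y_i * x_i over samples that violate the margin (y_i * w.x_i < 1).
        grad_sum = [0.0] * d
        for xi, yi in zip(rows, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1.0:
                for j in range(d):
                    grad_sum[j] += yi * xi[j]
        # w <- (1 - eta*lam) * w + (eta / n) * sum over violators of y_i * x_i
        w = [(1.0 - eta * lam) * wj + (eta / n) * gj for wj, gj in zip(w, grad_sum)]
    return w

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In practice the inputs would be standardised log features (such as the per-IP alarm interval), and alarms predicted as +1 would go to the management console for manual confirmation, with confirmed labels appended to the training set.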

Future work focuses on efficiency, new functionality, and a better balance between performance and detection accuracy: higher throughput while keeping the false‑positive rate low.
