Machine Learning Practices for Web Attack Detection at Ctrip
This article describes Ctrip’s evolution from rule‑based web attack detection to a Spark‑powered machine‑learning system, detailing the Nile architecture, data collection, feature engineering with TF‑IDF, model training, evaluation metrics, online deployment, and future enhancements for information security.
Author Yue Liang, a senior security engineer at Ctrip, introduces the challenges of traditional rule‑based web attack detection, such as maintenance difficulty, false positives/negatives, and performance impact, and motivates the adoption of machine learning for more accurate and efficient detection.
The original Nile system filtered over 97% of normal traffic using a whitelist before applying regular‑expression rules; the remaining 3% was processed by the rule engine and, if malicious, sent to the Hulk automated vulnerability verification system. The latest version adds a Spark MLlib‑based machine‑learning engine before the rule engine, allowing a two‑stage check that improves throughput and reduces Kafka backlog.
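The two-stage flow described above can be sketched in miniature as follows. All class and function names here are hypothetical illustrations of the control flow, not the real Ctrip API; in production this runs as a distributed Spark/Kafka pipeline, and rule-engine hits are forwarded to Hulk for automated verification.

```python
# Minimal sketch of Nile's staged checks (hypothetical names).

class Whitelist:
    """Stage 0: filters the ~97% of traffic known to be benign."""
    def __init__(self, safe_paths):
        self.safe_paths = set(safe_paths)

    def matches(self, request):
        return request["path"] in self.safe_paths

class MLEngine:
    """Stage 1 stand-in for the Spark MLlib classifier."""
    def predict_malicious(self, request):
        return "eval(" in request["query"].lower()

class RuleEngine:
    """Stage 2 stand-in for the regular-expression rule set."""
    def matches(self, request):
        return "eval(" in request["query"]

def handle_request(request, whitelist, ml_engine, rule_engine):
    if whitelist.matches(request):
        return "normal"          # stage 0: whitelist clears the bulk of traffic
    if not ml_engine.predict_malicious(request):
        return "normal"          # stage 1: ML engine clears it before any regex runs
    if rule_engine.matches(request):
        return "malicious"       # stage 2: rule hit; would be sent to Hulk
    return "suspicious"          # ML flag without a rule hit: a candidate detection gap
```

The "suspicious" branch is where the two engines disagree, which is exactly the comparison the new architecture uses to identify gaps in either engine.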
Key benefits of the new architecture include faster preprocessing of traffic, the ability to compare rule‑engine and ML‑engine results to identify gaps, and a mechanism to handle missed detections by feeding new malicious samples back into the system.
Problem definition focuses on a binary classification task—predicting whether a request is malicious or normal—with a target false‑negative rate below 10% and fast inference, ruling out algorithms such as K‑Nearest Neighbors whose prediction step is too slow for online traffic.
Data collection involves extracting labeled black/white traffic from Elasticsearch, combining it with custom WAF alerts and public PoC samples. Initially, features were handcrafted using regular‑expression counts (e.g., occurrences of eval , script , quotes, parentheses), but this approach proved brittle and performance‑heavy.
The team switched to TF‑IDF feature extraction, treating high‑risk characters and keywords as terms. Example code for counting eval occurrences:
import re

def get_evil_eval(url):
    # Count case-insensitive occurrences of "eval" in the URL.
    return len(re.findall("(eval)", url, re.IGNORECASE))

Sample cleaning steps include optimizing existing regex lists, adding dynamic IP blacklists, deduplicating requests, removing encrypted parameters, and using a custom blacklist of suspicious tokens to prune false‑positive white samples.
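A minimal local sketch of the TF‑IDF step, using scikit‑learn's `TfidfVectorizer` (the article's production system uses Spark MLlib; the sample URLs, labels, and n‑gram settings below are illustrative assumptions, not Ctrip's actual configuration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Illustrative toy samples: 1 = malicious, 0 = normal.
urls = [
    "/search?q=<script>alert(1)</script>",
    "/item?id=1' or '1'='1",
    "/cmd?x=eval(base64_decode(payload))",
    "/search?q=cheap+flights",
    "/item?id=12345",
    "/hotel?city=shanghai",
]
labels = [1, 1, 1, 0, 0, 0]

# Character n-grams let TF-IDF weight high-risk tokens (quotes, parentheses,
# keywords like eval/script) without hand-writing one regex per pattern.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))
X = vectorizer.fit_transform(urls)

clf = LogisticRegression().fit(X, labels)
prediction = clf.predict(vectorizer.transform(["/q?x=eval(alert(1))"]))
```

Compared with the handcrafted `get_evil_eval`-style counters, the vectorizer derives the term weights from the corpus itself, which is why the approach scales past a fixed keyword list.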
Model training uses Spark MLlib (or scikit‑learn for local experiments) on a balanced dataset of over 100,000 labeled requests. Hyper‑parameter tuning is performed with GridSearchCV, and evaluation relies on confusion matrices, precision, recall, accuracy, and F1‑score. The reported recall of 0.94 corresponds to a 6% false‑negative rate, which is considered acceptable but still needs improvement.
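The tuning-and-evaluation loop can be sketched locally with scikit‑learn as follows. The synthetic dataset stands in for the 100,000‑request corpus, and the model and parameter grid are illustrative assumptions (the article does not specify them); the sketch does show the relationship the article relies on, namely that the false‑negative rate equals one minus recall.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import confusion_matrix, recall_score

# Synthetic balanced dataset standing in for the labeled request features.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.5, 0.5], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Illustrative grid; scoring on recall directly targets the FN-rate goal.
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, 10]},
    scoring="recall",
    cv=3,
)
grid.fit(X_train, y_train)

pred = grid.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
rec = recall_score(y_test, pred)
fn_rate = fn / (fn + tp)   # false-negative rate = 1 - recall
```

Under this identity, the reported recall of 0.94 is the same statement as the 6% false‑negative rate.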
For online deployment, the trained model is integrated into the Nile pipeline with toggle switches for the ML engine and rule engine, enabling continuous monitoring, automatic rule generation, and periodic retraining with newly labeled data.
Future work includes handling non‑standard JSON/XML payloads, extending the system to multi‑class attack type classification, applying the approach to other domains such as malicious comments or image detection, and migrating from Spark MLlib to the newer Spark ML library.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.