Applying Automated Feature Engineering and Auto Modeling to Risk Control Scenarios
This article explains how automated feature engineering and auto‑modeling techniques dramatically reduce development time and improve performance in fraud‑risk detection, detailing the underlying RFM concepts, feature generation workflow, model selection, evaluation, deployment, and continuous monitoring within a risk‑control platform.
1. Background and Problem
Model development in risk control traditionally follows a multi‑step pipeline (business analysis, data preparation, feature engineering, model building, evaluation, monitoring). Feature engineering and model construction consume the majority of time—about 60% and 30% respectively—making rapid model delivery difficult.
Rong360 introduced an automated feature‑engineering and auto‑modeling solution that abstracts the most time‑consuming steps into a unified tool, improving efficiency, standardization, and model quality while shortening the end‑to‑end cycle to roughly five days.
2. Automated Feature Engineering
Manual feature engineering relies on domain knowledge and is labor‑intensive. Automated methods leverage the RFM (Recency, Frequency, Monetary) model to generate statistical and trend features from transaction‑level data, and can also construct network‑based features (e.g., number of first‑degree contacts, their borrowing behavior).
The automated pipeline first aggregates basic statistics per variable, then derives ratio and trend features in a second layer, producing a rich feature set with minimal manual effort.
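As a rough illustration of this two-layer flow, the sketch below derives RFM statistics per user in a first pass and ratio/trend features in a second. The column names (`user_id`, `txn_date`, `amount`) and the 30/90-day windows are illustrative assumptions, not the platform's actual schema:

```python
import pandas as pd
import numpy as np

def rfm_features(txns: pd.DataFrame, obs_date: pd.Timestamp) -> pd.DataFrame:
    """First layer: per-user RFM statistics over assumed 30/90-day windows."""
    out = {}
    for days in (30, 90):
        win = txns[txns["txn_date"] > obs_date - pd.Timedelta(days=days)]
        grp = win.groupby("user_id")["amount"]
        out[f"freq_{days}d"] = grp.count()    # Frequency: transaction count
        out[f"amt_sum_{days}d"] = grp.sum()   # Monetary: total amount
        out[f"amt_max_{days}d"] = grp.max()
    feats = pd.DataFrame(out).fillna(0)
    # Recency: days since the user's most recent transaction
    last = txns.groupby("user_id")["txn_date"].max()
    feats["recency_days"] = (obs_date - last).dt.days
    # Second layer: ratio/trend features derived from first-layer statistics
    feats["amt_ratio_30_90"] = feats["amt_sum_30d"] / feats["amt_sum_90d"].replace(0, np.nan)
    feats["freq_ratio_30_90"] = feats["freq_30d"] / feats["freq_90d"].replace(0, np.nan)
    return feats
```

Crossing each base statistic with each window and each second-layer operator is what lets a handful of raw variables fan out into thousands of candidate features with no manual effort.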
3. Automated Modeling
Popular algorithms such as XGBoost, LightGBM, and Logistic Regression (LR) are integrated into the platform. The tool performs automatic EDA‑based feature filtering (high missing rate, low variance, instability), followed by IV screening, tree‑model importance, and collinearity checks, reducing thousands of features to a few hundred high‑value ones.
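A minimal sketch of such a filtering cascade follows, with assumed thresholds (90% missing, IV ≥ 0.02, |corr| ≤ 0.95) that are conventional defaults rather than the platform's documented values, and a simple pairwise-correlation check standing in for a full collinearity analysis:

```python
import numpy as np
import pandas as pd

def filter_features(X: pd.DataFrame, y: pd.Series,
                    max_missing=0.9, min_var=1e-6, min_iv=0.02, max_corr=0.95):
    # EDA filters: drop features that are mostly missing or nearly constant
    keep = [c for c in X.columns
            if X[c].isna().mean() <= max_missing and X[c].var(skipna=True) > min_var]

    def iv(col):
        # Information Value over 10 equal-frequency bins
        bins = pd.qcut(X[col], 10, duplicates="drop")
        tab = pd.crosstab(bins, y)
        good = (tab[0] / tab[0].sum()).clip(lower=1e-6)
        bad = (tab[1] / tab[1].sum()).clip(lower=1e-6)
        return float(((bad - good) * np.log(bad / good)).sum())

    keep = [c for c in keep if iv(c) >= min_iv]
    # Collinearity: for each highly correlated pair, drop the later feature
    corr = X[keep].corr().abs()
    drop = {corr.columns[j] for i in range(len(keep)) for j in range(i + 1, len(keep))
            if corr.iloc[i, j] > max_corr}
    return [c for c in keep if c not in drop]
```

In practice each stage trims the candidate pool in turn, which is how thousands of generated features shrink to a few hundred before model training.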
For logistic regression (LR), the article explains odds, probability, and the scorecard formula, emphasizing the critical role of WOE binning, which is also automated (equal-frequency binning with monotonicity checks) while still allowing manual adjustment.
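The scorecard formula maps the model's odds to a score via score = A − B·ln(odds). A minimal calibration sketch is shown below; the base score of 600 at 1:20 bad odds and 50 points per halving of the odds are conventional example parameters, not values taken from the article:

```python
import math

def scorecard_params(base_score=600.0, base_odds=1 / 20, pdo=50.0):
    """Solve score = A - B*ln(odds) so that the score equals base_score
    at base_odds and rises by pdo each time the bad odds halve."""
    B = pdo / math.log(2)
    A = base_score + B * math.log(base_odds)
    return A, B

def prob_to_score(p, A, B):
    odds = p / (1 - p)            # odds of being bad
    return A - B * math.log(odds)
```

With WOE-encoded inputs, ln(odds) is just the LR linear term, so each variable's binned contribution converts directly into score points, which is what makes the scorecard interpretable.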
For XGBoost, continuous variables are binned to reduce over‑fitting, and categorical variables are encoded (label, one‑hot, etc.). The platform also provides automated hyper‑parameter tuning via GridSearch and RandomSearch.
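The binning and encoding step might look like the following pandas sketch; the function name, 10-bin default, and simple label encoding are illustrative assumptions rather than the platform's actual preprocessing code:

```python
import pandas as pd

def preprocess_for_xgb(df: pd.DataFrame, cont_cols, cat_cols, n_bins=10):
    out = pd.DataFrame(index=df.index)
    for c in cont_cols:
        # Equal-frequency binning caps the influence of extreme values,
        # which helps limit over-fitting on continuous variables
        out[c + "_bin"] = pd.qcut(df[c], n_bins, labels=False, duplicates="drop")
    for c in cat_cols:
        # Label encoding: map each category to an integer code
        # (one-hot is the usual alternative for low-cardinality columns)
        out[c + "_code"] = df[c].astype("category").cat.codes
    return out
```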
4. Automated Model Evaluation and Monitoring
Beyond traditional metrics (AUC, KS), the system evaluates models across dimensions (feature, model), samples (train, test, OOT, early‑performance), and versions (current vs. new). It monitors feature drift, PSI, ranking stability, and triggers alerts when deviations exceed thresholds.
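PSI, the drift metric named above, compares a live sample's score or feature distribution against the baseline's. A minimal sketch follows, binning by the baseline's deciles; the 10-bin choice and the 1e-6 floor for empty bins are standard conventions, not values specified by the article:

```python
import numpy as np

def psi(expected, actual, n_bins=10):
    """Population Stability Index between a baseline sample and a live
    sample, binned by the baseline's quantile edges."""
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))[1:-1]
    # searchsorted assigns each value to one of n_bins baseline-defined bins
    e = np.bincount(np.searchsorted(edges, expected), minlength=n_bins) / len(expected)
    a = np.bincount(np.searchsorted(edges, actual), minlength=n_bins) / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift warranting an alert, which matches the threshold-based alerting described above.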
5. Model Deployment and Online Monitoring
Instead of hand‑written Python scripts, models are packaged as configuration files that the Rong360 deployment platform consumes, enabling rapid rollout and automatic scoring. Post‑deployment, the platform continuously compares live metrics (KS, AUC, PSI) against baseline and checks feature binning consistency to detect data drift.
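One of the live metrics compared against baseline, KS, measures how well scores separate good from bad accounts. A minimal sketch under the assumption that higher scores indicate higher risk and labels are 0/1:

```python
import numpy as np

def ks_statistic(scores, labels):
    """KS: maximum gap between the cumulative bad-rate and good-rate
    curves as the threshold sweeps over the sorted scores."""
    order = np.argsort(scores)
    labels = np.asarray(labels)[order]
    cum_bad = np.cumsum(labels) / labels.sum()
    cum_good = np.cumsum(1 - labels) / (1 - labels).sum()
    return float(np.max(np.abs(cum_bad - cum_good)))
```

Recomputing KS on live scored traffic and alerting when it falls materially below the offline baseline is the standard way to catch the degradation this section describes.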
Author Introduction
Jiang Hong, Head of Risk‑Control Business Modeling at Rong360, holds a degree from Shanghai Jiao‑Tong University and has extensive experience in credit modeling, fraud detection, and data mining.
Job Opportunities
Rong360 is hiring senior data algorithm engineers (machine‑learning), senior risk‑control algorithm engineers, and senior data analysts in Beijing. Contact: [email protected].
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.