Marketing Anti‑Fraud Algorithm Framework and Practice at 58.com
This article details the design, implementation, and evaluation of a multi‑layer anti‑fraud system for 58.com’s marketing activities, covering data and feature engineering, unsupervised and supervised models, graph‑based community detection, and semi‑supervised graph neural networks, with empirical results demonstrating their effectiveness.
Background
58.com spends billions on marketing campaigns such as user acquisition, activation, and promotion. Black‑market actors exploit these incentives by fabricating devices, accounts, or transactions, leading to financial loss and a degraded user experience.
Anti‑Fraud Framework Design
The overall framework has three layers:
- a data/feature layer that aggregates user, device, IP, and behavior attributes;
- a model layer that includes unsupervised anomaly detection, semi‑supervised graph neural networks, and supervised tree‑based models;
- a service layer that provides offline model and whitelist services, plus real‑time model APIs, to downstream business lines.
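As a rough illustration of how a downstream business line might consume the service layer, the sketch below combines a whitelist, batch scores from the offline model, and a real‑time scorer into one allow/block decision. Everything here is hypothetical: the class name `MarketingRiskGate`, the assumption that whitelisted users are always allowed, the rule of taking the more suspicious of the two scores, and the 0.5 threshold are illustrative choices, not details from the article.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, Set


@dataclass
class MarketingRiskGate:
    """Hypothetical combination of the service layer's three outputs."""
    whitelist: Set[str]                                     # assumed: users exempt from blocking
    offline_scores: Mapping[str, float]                     # batch scores written by the offline model
    realtime_model: Callable[[Mapping[str, float]], float]  # online scorer for request-time features
    threshold: float = 0.5                                  # hypothetical decision cut-off

    def decide(self, user_id: str, features: Mapping[str, float]) -> str:
        if user_id in self.whitelist:
            return "allow"
        online = self.realtime_model(features)
        offline: Optional[float] = self.offline_scores.get(user_id)
        # Assumed policy: use the more suspicious score when both exist.
        score = online if offline is None else max(online, offline)
        return "block" if score >= self.threshold else "allow"
```

Keeping the offline lookup and the real‑time call behind one method lets a campaign swap either component without touching the caller.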
Feature Construction
Features fall into three groups: attribute features (e.g., registration, authentication, device tags), action features (e.g., activity frequency, time‑slot distribution, group similarity), and relation features (e.g., shared devices, shared IPs, community size). These features feed both the traditional tree models and the graph‑based algorithms.
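Relation features can be derived from login or activity logs by grouping accounts on shared devices and IPs. The sketch below assumes a hypothetical log schema of `(user_id, device_id, ip)` tuples; the feature names are illustrative. It also emits user–user edges for users who share a device, one plausible way to build the graph that the community‑detection and GNN models consume.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, str, str]  # (user_id, device_id, ip); hypothetical log schema


def relation_features(events: Iterable[Event]) -> Dict[str, Dict[str, int]]:
    """Per-user relation features computed from shared devices and IPs."""
    devices_of: Dict[str, Set[str]] = defaultdict(set)
    ips_of: Dict[str, Set[str]] = defaultdict(set)
    users_on_device: Dict[str, Set[str]] = defaultdict(set)
    users_on_ip: Dict[str, Set[str]] = defaultdict(set)
    for user, device, ip in events:
        devices_of[user].add(device)
        ips_of[user].add(ip)
        users_on_device[device].add(user)
        users_on_ip[ip].add(user)
    return {
        user: {
            "n_devices": len(devices_of[user]),
            "n_ips": len(ips_of[user]),
            # Largest number of accounts seen on any one of this user's devices / IPs.
            "max_users_per_device": max(len(users_on_device[d]) for d in devices_of[user]),
            "max_users_per_ip": max(len(users_on_ip[i]) for i in ips_of[user]),
        }
        for user in devices_of
    }


def shared_device_edges(events: Iterable[Event]) -> Set[Tuple[str, str]]:
    """Undirected user-user edges: two users are linked if they used the same device."""
    users_on_device: Dict[str, Set[str]] = defaultdict(set)
    for user, device, _ip in events:
        users_on_device[device].add(user)
    edges: Set[Tuple[str, str]] = set()
    for users in users_on_device.values():
        edges.update(combinations(sorted(users), 2))
    return edges
```

Both functions iterate over `events` once, so pass a list (not a one‑shot generator) if you call both.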
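Unsupervised Anomaly Detection
Isolation Forest is the primary algorithm for cold‑start scenarios, where few or no fraud labels exist. It isolates samples by random recursive partitioning: because anomalies are few and different, they are separated after fewer splits, so a shorter average path length indicates a likelier anomaly. On a 58.com marketing dataset, Isolation Forest achieved higher precision@300 (the share of true fraud among the 300 highest‑scored samples) than LOF, HBOS, and MCD.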
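To make the path‑length idea concrete, here is a minimal from‑scratch Isolation Forest in the spirit of Liu et al.'s original algorithm, together with the precision@k metric used for evaluation. It is a teaching sketch, not the production model: the tree count, subsample size, and the choice to split only on features that still vary are assumptions. In practice a library implementation such as scikit‑learn's `IsolationForest` would normally be used.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649015329


def avg_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> Node:
    if depth >= max_depth or len(points) <= 1:
        return Node(size=len(points))
    # Split only on features that still vary inside this node.
    candidates = [f for f in range(len(points[0]))
                  if min(p[f] for p in points) < max(p[f] for p in points)]
    if not candidates:
        return Node(size=len(points))
    feature = rng.choice(candidates)
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    threshold = rng.uniform(lo, hi)  # random split value within the node's range
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return Node(len(points), feature, threshold,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Point, node: Node, depth: int = 0) -> float:
    if node.left is None or node.right is None:
        # Leaf: add the expected depth of the unbuilt subtree holding node.size points.
        return depth + avg_path_length(node.size)
    child = node.left if x[node.feature] < node.threshold else node.right
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed

    def fit(self, data: List[Point]) -> "IsolationForest":
        if len(data) < 2:
            raise ValueError("need at least two samples")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit from the original paper
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Point) -> float:
        """s = 2^(-E[h(x)] / c(psi)); values near 1 are more anomalous."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / avg_path_length(self.psi))


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true positives (label 1) among the k highest-scored samples."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k
```

On synthetic data with a Gaussian cluster plus a handful of far‑away points, the distant points receive the highest scores; with real marketing data, `precision_at_k(scores, labels, 300)` reproduces the kind of comparison the article reports.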
Supervised Models
Because fraud labels are severely imbalanced, ensemble tree models (LightGBM, XGBoost, and Random Forest) are used. LightGBM and XGBoost outperform Random Forest in accuracy while also requiring less training time and memory.
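One common way to handle the imbalance in gradient‑boosted trees is to up‑weight the rare positive class. The helper below builds a LightGBM parameter dictionary using the frequently cited heuristic `scale_pos_weight = #negatives / #positives`; this heuristic, the function name, and the other hyperparameter values are assumptions for illustration, not settings reported in the article. The parameter names themselves (`objective`, `metric`, `scale_pos_weight`, `bagging_fraction`, `bagging_freq`, `feature_fraction`, `seed`) are standard LightGBM parameters.

```python
from typing import Dict, Sequence, Union

Param = Union[str, int, float]


def imbalance_aware_params(labels: Sequence[int], learning_rate: float = 0.05) -> Dict[str, Param]:
    """Hypothetical LightGBM parameters that up-weight the rare fraud class (label 1)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("labels must contain both classes")
    return {
        "objective": "binary",
        "metric": "auc",                     # ranking metric; accuracy is dominated by the majority class
        "scale_pos_weight": n_neg / n_pos,   # weight applied to positive-class samples
        "learning_rate": learning_rate,
        "num_leaves": 31,
        "bagging_fraction": 0.8,             # row subsampling, active when bagging_freq > 0
        "bagging_freq": 1,
        "feature_fraction": 0.8,             # column subsampling per tree
        "seed": 0,
    }
```

With LightGBM installed, the dictionary can be passed as `lightgbm.train(params, lightgbm.Dataset(X, label=y), num_boost_round=200)`; XGBoost accepts a parameter with the same name, `scale_pos_weight`, for the same purpose.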
Graph‑Based Community Detection
Fast Unfolding (the Louvain method), which greedily maximizes modularity, partitions the user graph into communities. The anomalous communities it detects exhibit distinctive device characteristics, which can then be mined for further features.
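The core of Fast Unfolding is a local‑moving phase: each node is moved to the neighboring community that yields the largest modularity gain, repeated until no move helps. The sketch below implements only that first phase for an unweighted, undirected graph without self‑loops on integer node ids; the second phase (collapsing each community into a super‑node and repeating) is omitted, so it is a partial illustration rather than the full algorithm. Function names are hypothetical. Libraries such as NetworkX provide a complete implementation (`louvain_communities`).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def local_moving(n: int, edges: Sequence[Edge], max_passes: int = 20) -> List[int]:
    """Phase 1 of Fast Unfolding: greedy node moves that increase modularity."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    degree = [len(nbrs) for nbrs in adj]
    two_m = float(sum(degree))
    community = list(range(n))                # every node starts alone
    sigma_tot = [float(d) for d in degree]    # total degree of each community
    for _ in range(max_passes):
        moved = False
        for node in range(n):
            k_i = degree[node]
            if k_i == 0:
                continue
            own = community[node]
            sigma_tot[own] -= k_i             # take the node out of its community
            links: Dict[int, int] = {}        # community -> edges from node into it
            for nb in adj[node]:
                links[community[nb]] = links.get(community[nb], 0) + 1
            # Modularity gain of inserting node into c, scaled by m:
            # k_i,in(c) - sigma_tot(c) * k_i / (2m)
            best, best_gain = own, links.get(own, 0) - sigma_tot[own] * k_i / two_m
            for c, k_in in links.items():
                gain = k_in - sigma_tot[c] * k_i / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            community[node] = best
            sigma_tot[best] += k_i
            moved = moved or best != own
        if not moved:
            break
    return community


def modularity(edges: Sequence[Edge], community: Sequence[int]) -> float:
    """Q = sum over communities c of [L_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    intra: Dict[int, int] = defaultdict(int)    # L_c: edges inside community c
    deg_sum: Dict[int, int] = defaultdict(int)  # d_c: total degree of community c
    for u, v in edges:
        deg_sum[community[u]] += 1
        deg_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)
```

On two triangles joined by a single edge, this phase recovers the two triangles as communities, with modularity 5/14. In an anti‑fraud setting, the resulting community ids can feed back into the relation features (e.g., community size).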
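Semi‑Supervised Graph Neural Network (GRAND)
GRAND (Graph Random Neural Network) combines random propagation, which randomly drops node features and then propagates them over multiple hops, with consistency regularization, which encourages predictions from different random augmentations to agree. Together these improve robustness and generalization on partially labeled graphs. Compared with GCN, GRAND achieves better metrics on black‑sample (fraud) detection.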
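The two GRAND ingredients described above can be sketched without a deep‑learning framework. The code assumes the formulation from the GRAND paper: DropNode zeroes whole feature rows with rate δ and rescales survivors by 1/(1−δ); mixed‑order propagation averages Â^k X̃ for k = 0..K with Â = D^(-1/2)(A + I)D^(-1/2); and the consistency loss compares each augmentation's predictions with a temperature‑sharpened average. The MLP classifier, the supervised loss, and the training loop are omitted, and the default temperature is an assumption. Plain nested lists are used only to keep the sketch dependency‑free; a real implementation would use sparse tensors.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(n: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """A_hat = D^-1/2 (A + I) D^-1/2 for an undirected graph with n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def drop_node(x: Matrix, rate: float, rng: random.Random) -> Matrix:
    """DropNode: zero each node's whole feature row with probability `rate` and
    rescale surviving rows by 1 / (1 - rate) so the expectation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    scale = 1.0 / (1.0 - rate)
    return [[0.0] * len(row) if rng.random() < rate else [v * scale for v in row]
            for row in x]


def mixed_order_propagate(a_hat: Matrix, x: Matrix, order: int) -> Matrix:
    """Average of A_hat^k X for k = 0..order."""
    total = [row[:] for row in x]
    current = x
    for _ in range(order):
        current = matmul(a_hat, current)
        total = [[t + c for t, c in zip(t_row, c_row)] for t_row, c_row in zip(total, current)]
    return [[v / (order + 1) for v in row] for row in total]


def sharpen(probs: Matrix, temperature: float) -> Matrix:
    """Raise class probabilities to 1/T and renormalize (T < 1 sharpens)."""
    out = []
    for row in probs:
        powered = [p ** (1.0 / temperature) for p in row]
        total = sum(powered)
        out.append([p / total for p in powered])
    return out


def consistency_loss(predictions: Sequence[Matrix], temperature: float = 0.5) -> float:
    """Squared distance between each augmentation's predictions and the sharpened
    average prediction, averaged over augmentations and nodes."""
    s, n, c = len(predictions), len(predictions[0]), len(predictions[0][0])
    mean = [[sum(p[i][j] for p in predictions) / s for j in range(c)] for i in range(n)]
    target = sharpen(mean, temperature)
    loss = sum(sum((target[i][j] - p[i][j]) ** 2 for j in range(c))
               for p in predictions for i in range(n))
    return loss / (s * n)
```

In training, each of S augmentations would apply `drop_node` then `mixed_order_propagate`, pass the result through the classifier, and add `consistency_loss` over the S prediction matrices to the supervised loss on labeled nodes. This is how unlabeled nodes contribute to learning.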
Application Results
The article's tables and figures illustrate the performance gains of Isolation Forest, LightGBM/XGBoost, Fast Unfolding, and GRAND on real 58.com marketing data.
Conclusion and Outlook
The proposed framework continuously discovers black‑market samples, feeds them to supervised and semi‑supervised models, and serves detections both offline and in real time, effectively reducing economic loss. Future work aims to standardize the pipeline for rapid deployment across similar marketing scenarios and to incorporate newer algorithms for deeper fraud mining.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.