Applying Graph Neural Networks for Early Fraud Warning and Malicious URL Detection
This article explains how Tencent Security Big Data Lab applies graph neural networks (GNNs) to two problems: early warning of water-room fraud cards, using a temporal heterogeneous graph, and detection of malicious URLs, using a heterogeneous URL graph. It covers data modeling, graph construction, attention-based aggregation, model training, and evaluation results.
The talk introduces two practical applications of graph neural networks (GNNs) developed by Tencent Security Big Data Lab: early warning of water-room fraud cards and detection of malicious URLs.
1. GNN for Water‑Room Card Early Warning
Scenario: Telecom fraud driven by the mobile internet has created a supply chain of bank cards (water-room cards). These cards are quickly used for money laundering, which leaves only a short window for detection.
Method: A temporal heterogeneous graph is built in which each card at each timestamp becomes a separate node, and edges link cards to devices, IPs, and other entities. Information diffusion spreads suspicious signals across the graph, and virtual graph construction simulates the full fraud workflow for training.
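The graph construction and diffusion steps can be sketched as follows. This is a minimal illustration, not the lab's implementation: the event tuple layout, the hourly bucketing, the snapshot-to-snapshot edges, and the max-with-decay diffusion rule are all assumptions made for the example.

```python
from collections import defaultdict

def build_temporal_graph(events, bucket_seconds=3600):
    """events: iterable of (card_id, unix_ts, entity_type, entity_id).

    Each (card, time bucket) pair becomes its own node (assumed bucketing).
    """
    adj = defaultdict(set)
    buckets_by_card = defaultdict(set)
    for card_id, ts, etype, eid in events:
        bucket = ts // bucket_seconds
        card_node = ("card", card_id, bucket)
        entity_node = (etype, eid)
        adj[card_node].add(entity_node)
        adj[entity_node].add(card_node)
        buckets_by_card[card_id].add(bucket)
    # Assumption: consecutive snapshots of the same card are linked.
    for card_id, buckets in buckets_by_card.items():
        ordered = sorted(buckets)
        for prev, nxt in zip(ordered, ordered[1:]):
            a, b = ("card", card_id, prev), ("card", card_id, nxt)
            adj[a].add(b)
            adj[b].add(a)
    return adj

def diffuse(adj, seeds, steps=2, decay=0.5):
    """Spread suspicion from seed nodes; each hop multiplies the score by decay."""
    scores = {node: 0.0 for node in adj}
    scores.update({seed: 1.0 for seed in seeds})
    for _ in range(steps):
        updated = dict(scores)
        for node, nbrs in adj.items():
            if nbrs:
                incoming = decay * max(scores[n] for n in nbrs)
                updated[node] = max(scores[node], incoming)
        scores = updated
    return scores

# Hypothetical events: two cards share a device; c2 is seen again an hour later.
events = [
    ("c1", 0, "device", "d1"),
    ("c2", 10, "device", "d1"),
    ("c2", 4000, "ip", "192.0.2.7"),
]
graph = build_temporal_graph(events)
scores = diffuse(graph, seeds=[("card", "c1", 0)])
```

With two diffusion steps, the signal from the flagged card reaches the shared device and then the first snapshot of the second card, but not yet its later snapshot.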
Model Design: Features of each node type (IP, device, behavior) are aggregated with max-pooling, transformed to a common dimension, and combined using an attention mechanism. The final node embedding concatenates the neighbor and self features and is fed to a DNN for classification.
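A minimal sketch of this aggregation, in plain Python for readability. The feature sizes, the random projection matrices, and the single attention vector are hypothetical; the talk summary does not specify how the attention scores are computed, and the downstream DNN classifier is omitted.

```python
import math
import random

def max_pool(vectors):
    """Element-wise maximum over a list of equal-length vectors."""
    return [max(col) for col in zip(*vectors)]

def linear(vec, weights):
    """weights is a list of output rows, each as long as vec."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def embed_node(self_feat, neighbors_by_type, proj, attn_vec):
    """Max-pool per type, project to a common size, attend over types, concatenate."""
    types = sorted(neighbors_by_type)
    projected = [linear(max_pool(neighbors_by_type[t]), proj[t]) for t in types]
    # Assumed scoring rule: dot product with one learned attention vector.
    logits = [sum(a * h for a, h in zip(attn_vec, h_t)) for h_t in projected]
    alphas = softmax(logits)
    dim = len(attn_vec)
    summary = [sum(alpha * h_t[i] for alpha, h_t in zip(alphas, projected))
               for i in range(dim)]
    return list(self_feat) + summary, alphas

# Hypothetical neighbor features with a different width per node type.
rng = random.Random(0)
dim = 3
neighbors = {
    "ip": [[0.1, 0.9], [0.4, 0.2]],
    "device": [[1.0, 0.0, 0.5]],
    "behavior": [[0.3], [0.7], [0.2]],
}
proj = {t: [[rng.uniform(-1, 1) for _ in vecs[0]] for _ in range(dim)]
        for t, vecs in neighbors.items()}
attn_vec = [rng.uniform(-1, 1) for _ in range(dim)]
embedding, alphas = embed_node([0.5, 0.5], neighbors, proj, attn_vec)
```

Projecting each type to the same dimension before attention is what allows IP, device, and behavior summaries of different widths to be weighted against one another.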
Evaluation: The early-warning model improves detection across various fraud scenarios (telecom fraud, loan fraud, etc.) compared with baseline methods.
2. GNN for Malicious URL Detection
Scenario: Malicious URLs often hide behind short links or image-only pages, making traditional text-based detection difficult.
Node Representation: Multi-modal features (URL characters, text embeddings, statistical metrics) are fused. A character-level CNN (TextCNN) processes the URL string, while a DNN handles the statistical features. The three embeddings are combined to form the URL node representation.
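The fusion can be illustrated with a small standard-library sketch. Everything concrete here is an assumption: the character alphabet, the kernel widths, the random weights, the placeholder text embedding, and the choice of concatenation as the way the three embeddings are "combined".

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"

def char_ids(url, max_len=32):
    """Map characters to integer ids; 0 is reserved for padding and unknowns."""
    ids = [ALPHABET.find(ch) + 1 for ch in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

def text_cnn(ids, embedding, kernels):
    """Character embedding, 1-D convolutions, ReLU, then max-over-time pooling."""
    x = [embedding[i] for i in ids]
    pooled = []
    for kernel in kernels:  # each kernel has `width` rows of embedding size
        width = len(kernel)
        best = 0.0  # ReLU floor, so the pooled value never drops below zero
        for start in range(len(x) - width + 1):
            window = x[start:start + width]
            act = sum(w * v for k_row, x_row in zip(kernel, window)
                      for w, v in zip(k_row, x_row))
            best = max(best, act)
        pooled.append(best)
    return pooled

def dense_relu(vec, weights):
    """One dense layer standing in for the DNN over statistical features."""
    return [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in weights]

rng = random.Random(0)
emb_dim = 4
embedding = [[rng.uniform(-1, 1) for _ in range(emb_dim)]
             for _ in range(len(ALPHABET) + 1)]
kernels = [[[rng.uniform(-1, 1) for _ in range(emb_dim)] for _ in range(width)]
           for width in (2, 3, 4)]
stats_weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]

url_vec = text_cnn(char_ids("http://t.example/a1b2"), embedding, kernels)
text_vec = [0.12, -0.40, 0.33]  # placeholder for a page-text embedding
stats_vec = dense_relu([0.2, 5.0, 1.0], stats_weights)  # hypothetical statistics
node_vec = url_vec + text_vec + stats_vec  # assumed fusion: concatenation
```

Working at the character level means a short link with no readable words still yields a usable URL embedding, which fits the scenario of pages that carry little or no text.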
Graph Construction: Additional heterogeneous edges capture ownership (site-domain-IP), redirection (short-link jumps), citation (cross-site traffic), and clustering (shared hosting) relationships, enriching the graph structure.
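One way to picture these four relation types is as typed edge sets built from raw observations. The sketch below is illustrative only: the edge-type names, the input structures (`dns`, `redirects`, `citations`), and the reading of "shared hosting" as "domains resolving to the same IP" are assumptions.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def build_url_graph(urls, dns, redirects, citations):
    """dns: {domain: ip}; redirects and citations: lists of (src_url, dst_url)."""
    edges = defaultdict(set)
    domains_by_ip = defaultdict(set)
    for url in urls:
        domain = urlsplit(url).hostname
        if domain is None:
            continue
        # Ownership: URL -> domain -> IP.
        edges["url_on_domain"].add((url, domain))
        ip = dns.get(domain)
        if ip:
            edges["domain_on_ip"].add((domain, ip))
            domains_by_ip[ip].add(domain)
    # Redirection (for example, short-link jumps) and citation edges.
    edges["redirects_to"].update(redirects)
    edges["cites"].update(citations)
    # Clustering: connect every pair of domains hosted on the same IP.
    for domains in domains_by_ip.values():
        ordered = sorted(domains)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                edges["shares_host"].add((a, b))
    return edges

# Hypothetical observations using reserved example domains and addresses.
urls = ["https://s.example/x1", "https://a.example/login", "https://b.example/pay"]
dns = {"a.example": "192.0.2.10", "b.example": "192.0.2.10", "s.example": "192.0.2.99"}
redirects = [("https://s.example/x1", "https://a.example/login")]
citations = [("https://b.example/pay", "https://a.example/login")]
edges = build_url_graph(urls, dns, redirects, citations)
```

Keeping each relation in its own edge set lets a heterogeneous GNN aggregate neighbors per relation type rather than treating all links as equivalent.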
Model Design: Using a HinSAGE-style architecture, neighbor nodes are aggregated with attention, concatenated with the target node, and passed through a DNN for prediction.
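The aggregate-concatenate-predict pattern might look like the sketch below. It assumes neighbors have already been projected to the target's dimension, uses scaled dot-product attention as a stand-in for whatever scoring function the model actually learns, and replaces the DNN with a single logistic unit.

```python
import math

def attend(target, neighbors):
    """Weight each neighbor by its scaled dot product with the target node."""
    scale = math.sqrt(len(target))
    logits = [sum(t * n for t, n in zip(target, nb)) / scale for nb in neighbors]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * nb[i] for w, nb in zip(weights, neighbors))
              for i in range(len(target))]
    return pooled, weights

def predict(target, neighbors, w, b):
    """Concatenate target and attended neighbors, then apply a logistic unit."""
    pooled, _ = attend(target, neighbors)
    z = list(target) + pooled
    logit = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical embeddings: the first neighbor resembles the target, the second does not.
target = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attend(target, neighbors)
prob = predict(target, neighbors, w=[0.5, -0.2, 0.8, 0.1], b=-0.3)
```

In this toy case the neighbor that resembles the target receives the larger attention weight, so it dominates the pooled vector that is concatenated with the target.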
Results: At a precision target of 70%, the model achieves a recall of 92.5%, a 28.9% improvement over multi-modal baselines.
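For readers unfamiliar with the metric, recall at a precision target can be computed as in this sketch: rank the samples by score, then take the highest recall among all cut-offs whose precision meets the target. The scores and labels are made up for illustration and are unrelated to the reported figures; distinct scores are assumed so that every cut-off is a valid threshold.

```python
def recall_at_precision(scores, labels, target_precision=0.70):
    """Best recall over all score cut-offs whose precision meets the target."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall, true_pos = 0.0, 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / k >= target_precision:
            best_recall = max(best_recall, true_pos / total_pos)
    return best_recall

# Hypothetical model scores and ground-truth labels (1 = malicious).
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 0, 1, 0]
result = recall_at_precision(scores, labels)
```

In this toy example, only the top-1 cut-off reaches 70% precision, so the function returns a recall of 0.5.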
Conclusion
Graph-based AI models provide powerful tools for monitoring black-market activities, though challenges remain in interpretability and operational deployment. The insights from both applications guide security teams in designing effective countermeasures.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.