
Graph-Based Unsupervised Model for Detecting Malicious Account Clusters in Registration Risk Control

This article presents a graph‑neural‑network‑driven, unsupervised approach that builds heterogeneous user‑feature graphs, learns node weights, constructs user‑user similarity graphs, and applies threshold‑based clustering to identify abnormal registration clusters for fraud detection on Ctrip's business travel platform.

Ctrip Technology

May 25, 2023


Background – Registration fraud is a critical risk in enterprise scenarios, where malicious actors create large numbers of fake accounts to conduct fraudulent activities. Detecting whether a newly registered user belongs to an abnormal cluster of black‑market accounts is essential.

Preliminary Knowledge – The article reviews basic graph concepts, distinguishing directed/undirected graphs, homogeneous versus heterogeneous graphs, and illustrates these with simple diagrams.
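To make these distinctions concrete, here is a minimal sketch using plain Python adjacency sets. All node names (u1, ip:10.0.0.1, and so on) are hypothetical. The heterogeneous graph records a type for every node and links users to attribute values rather than to each other, which is the shape the model builds later.

```python
from collections import defaultdict

# Undirected, homogeneous graph: one node type (users); each edge stored both ways.
undirected = defaultdict(set)
for a, b in [("u1", "u2"), ("u2", "u3")]:
    undirected[a].add(b)
    undirected[b].add(a)

# Directed graph: each edge stored only from source to target.
directed = defaultdict(set)
for src, dst in [("u1", "u2"), ("u2", "u3")]:
    directed[src].add(dst)

# Heterogeneous graph: nodes carry a type, and edges link users to feature values.
node_type = {"u1": "user", "u2": "user", "ip:10.0.0.1": "ip", "os:8.0": "os_version"}
hetero = defaultdict(set)
for user, feature in [("u1", "ip:10.0.0.1"), ("u2", "ip:10.0.0.1"), ("u2", "os:8.0")]:
    hetero[user].add(feature)
    hetero[feature].add(user)

print(sorted(undirected["u2"]))                # ['u1', 'u3']
print(sorted(directed["u2"]))                  # ['u3']
print(sorted(hetero["ip:10.0.0.1"]))           # ['u1', 'u2']
print(sorted({node_type[n] for n in hetero}))  # ['ip', 'os_version', 'user']
```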

Data Mining – Analysis of device‑related identifiers (IP addresses, phone numbers, device IDs) shows that legitimate users typically use only a few devices, whereas black‑market accounts are heavily concentrated on shared devices. Geographic and OS‑version features were also examined: outdated OS versions correlate with fake accounts. Registration‑time patterns show that malicious users prefer to register in the evening and at night.
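Both signals can be measured with a few lines of code. This is a sketch on made‑up records: the record layout, the device IDs, and the 20:00 to 06:00 "night" window are assumptions, not values taken from the article.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical registration records: (user_id, device_id, "YYYY-MM-DD HH:MM").
registrations = [
    ("u1", "dev-A", "2023-05-01 09:15"),
    ("u2", "dev-B", "2023-05-01 22:40"),
    ("u3", "dev-B", "2023-05-01 23:05"),
    ("u4", "dev-B", "2023-05-02 01:30"),
    ("u5", "dev-C", "2023-05-02 14:00"),
]

def users_per_device(records):
    """Distinct users per device ID; large counts indicate device aggregation."""
    users = defaultdict(set)
    for user, device, _ in records:
        users[device].add(user)
    return {device: len(us) for device, us in users.items()}

def night_share(records, start_hour=20, end_hour=6):
    """Fraction of registrations in [start_hour, 24) or [0, end_hour)."""
    hours = [datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for _, _, ts in records]
    night = [h for h in hours if h >= start_hour or h < end_hour]
    return len(night) / len(hours)

print(users_per_device(registrations))  # {'dev-A': 1, 'dev-B': 3, 'dev-C': 1}
print(night_share(registrations))       # 0.6
```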

Model Design – The model consists of four steps: feature extraction, unsupervised weight learning, construction of a homogeneous user registration graph, and malicious account detection. Feature extraction discretizes continuous attributes. A heterogeneous user‑feature graph is then built in which edges link users to their attributes. Initial weights for feature nodes are computed with a formula that considers how frequently each feature occurs within its category, distinguishing A‑type (common) features from B‑type (shared, suspicious) features. Each user node's weight is the average of the weights of its neighboring feature nodes.
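The steps above can be sketched as follows. The summary does not reproduce the article's initial‑weight formula or how it treats A‑type versus B‑type features, so the code uses a hypothetical placeholder (how widely a value is shared, normalized within its category) purely to show where such a formula plugs in. The 4‑hour bucketing of registration time is likewise an assumed discretization.

```python
from collections import defaultdict

def discretize_hour(hour, width=4):
    """Feature extraction: map a continuous attribute (hour of day) to a bucket."""
    lo = (hour // width) * width
    return f"{lo:02d}-{lo + width:02d}"

# Hypothetical users and raw attributes.
users = {
    "u1": {"ip": "1.1.1.1", "os": "6.0", "hour": 23},
    "u2": {"ip": "1.1.1.1", "os": "6.0", "hour": 22},
    "u3": {"ip": "2.2.2.2", "os": "13.0", "hour": 10},
    "u4": {"ip": "3.3.3.3", "os": "6.0", "hour": 9},
}

# Heterogeneous graph: user nodes linked to (category, value) feature nodes.
user_feats, feat_users = {}, defaultdict(set)
for uid, attrs in users.items():
    feats = {("ip", attrs["ip"]), ("os", attrs["os"]),
             ("hour", discretize_hour(attrs["hour"]))}
    user_feats[uid] = feats
    for f in feats:
        feat_users[f].add(uid)

# Placeholder initial feature weight (NOT the article's formula):
# (users sharing the value - 1) / (largest such count in the category - 1).
max_count = defaultdict(int)
for (cat, _), us in feat_users.items():
    max_count[cat] = max(max_count[cat], len(us))
feat_w = {
    f: (len(us) - 1) / (max_count[f[0]] - 1) if max_count[f[0]] > 1 else 0.0
    for f, us in feat_users.items()
}

# User node weight: average of its neighboring feature-node weights.
user_w = {u: sum(feat_w[f] for f in fs) / len(fs) for u, fs in user_feats.items()}
print({u: round(w, 3) for u, w in user_w.items()})
# {'u1': 1.0, 'u2': 1.0, 'u3': 0.333, 'u4': 0.667}
```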

Weight updates are performed via linear belief propagation, iteratively refining node scores while keeping them within [0,1]. The update rule adds the average influence of neighboring nodes to the previous weight.
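A minimal sketch of such an update, assuming a linearized form in the spirit of linear belief propagation: each node's weight moves by eps times the mean of its neighbors' centered weights (weight minus a 0.5 prior), and the result is clipped to [0, 1]. The centering, eps, and the iteration count are assumptions; the article's exact coefficients are not given here.

```python
def propagate(adj, w0, eps=0.1, prior=0.5, iters=10):
    """Iteratively refine node weights on an undirected graph.

    adj maps each node to its neighbors; w0 maps each node to an initial
    weight in [0, 1]. Each round adds eps * mean(w[u] - prior) over the
    neighbors u to the previous weight, then clips to [0, 1].
    """
    w = dict(w0)
    for _ in range(iters):
        new_w = {}
        for v, nbrs in adj.items():
            infl = sum(w[u] - prior for u in nbrs) / len(nbrs) if nbrs else 0.0
            new_w[v] = min(1.0, max(0.0, w[v] + eps * infl))
        w = new_w  # synchronous: every node reads the previous round's weights
    return w

# Toy bipartite graph: u1 and u2 share a suspicious feature f1; u3 owns f2.
adj = {"u1": ["f1"], "u2": ["f1"], "u3": ["f2"], "f1": ["u1", "u2"], "f2": ["u3"]}
w0 = {"u1": 0.9, "u2": 0.9, "u3": 0.1, "f1": 0.9, "f2": 0.1}
result = propagate(adj, w0)
print(result["u1"], result["u3"])  # 1.0 0.0
```

Because the centered update is linear, in this sketch weights above the prior drift upward and weights below it drift downward until clipping stops them, so eps and the number of iterations control how far evidence spreads.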

To obtain a homogeneous user‑user graph, the pairwise similarity between two users is calculated as the sum of the weights of the feature nodes they share. An edge is added when the similarity exceeds a dynamically determined threshold, which is tuned by watching how the size of the largest connected component changes as the threshold increases and choosing a value at which it stabilizes.
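The similarity computation and threshold sweep might look like the sketch below. The stopping rule (take the first threshold after which the largest component's size changes by at most tol, relative) is one hypothetical way to operationalize "stability"; the candidate thresholds and toy weights are also illustrative.

```python
from itertools import combinations

def similarity(feats_a, feats_b, feat_w):
    """Sum of the weights of the feature nodes two users share."""
    return sum(feat_w[f] for f in feats_a & feats_b)

def largest_component(nodes, edges):
    """Size of the largest connected component (iterative DFS)."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, size)
    return best

def pick_threshold(user_feats, feat_w, candidates, tol=0.05):
    """Sweep increasing thresholds; return the first one whose largest
    component size is within tol (relative) of the next candidate's."""
    nodes = list(user_feats)
    sims = {(a, b): similarity(user_feats[a], user_feats[b], feat_w)
            for a, b in combinations(nodes, 2)}
    sizes = []
    for t in candidates:
        edges = [pair for pair, s in sims.items() if s > t]  # edge iff similarity > t
        sizes.append(largest_component(nodes, edges))
    for i in range(1, len(sizes)):
        if abs(sizes[i] - sizes[i - 1]) / max(sizes[i - 1], 1) <= tol:
            return candidates[i - 1], sizes
    return candidates[-1], sizes

# Toy feature weights and user-feature sets (all hypothetical).
feat_w = {"ip1": 1.0, "dev1": 1.0, "os6": 0.5, "city": 0.1}
user_feats = {
    "a": {"ip1", "dev1", "os6"},
    "b": {"ip1", "dev1", "os6"},
    "c": {"ip1", "dev1", "city"},
    "d": {"os6", "city"},
    "e": {"city"},
}
print(pick_threshold(user_feats, feat_w, [0.0, 0.3, 1.0, 1.5, 2.2]))
# (1.0, [5, 4, 3, 3, 2])
```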

Malicious clusters are identified by extracting connected subgraphs whose node count exceeds a predefined aggregation threshold (typically 10‑30 users). These subgraphs are flagged as abnormal.
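Extracting the flagged clusters is a connected‑components pass over the user‑user graph. This sketch uses union‑find and keeps components whose size exceeds agg_threshold (the article suggests values of roughly 10 to 30); the function names and toy data are hypothetical.

```python
from collections import defaultdict

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def abnormal_clusters(nodes, edges, agg_threshold=10):
    """Connected components of the user-user graph with more than agg_threshold users."""
    parent = {n: n for n in nodes}
    for a, b in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb  # merge the two components
    groups = defaultdict(set)
    for n in nodes:
        groups[find(parent, n)].add(n)
    return [g for g in groups.values() if len(g) > agg_threshold]

# Toy graph: a chain of 12 linked users plus a pair and a singleton.
nodes = [f"u{i}" for i in range(12)] + ["x0", "x1", "x2"]
edges = [(f"u{i}", f"u{i + 1}") for i in range(11)] + [("x0", "x1")]
clusters = abnormal_clusters(nodes, edges, agg_threshold=10)
print(len(clusters), len(clusters[0]))  # 1 12
```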

Real‑Time Strategy – For each new registration, the model draws on data from the past 24‑72 hours and is refreshed periodically (every n minutes) so that its parameters stay up to date. The new user's features are compared against the current set of known bad users; if the similarity exceeds the threshold, the account is marked as belonging to an abnormal cluster.
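A sketch of the online check under stated assumptions: registration records are dicts with hypothetical time and feats fields, the look‑back window defaults to 72 hours (the upper end of the article's range), and the refresh interval n is a placeholder of 10 minutes. The similarity is the same shared‑feature‑weight sum used to build the graph.

```python
from datetime import datetime, timedelta

def recent(records, now, window_hours=72):
    """Keep records registered within the last window_hours."""
    cutoff = now - timedelta(hours=window_hours)
    return [r for r in records if r["time"] >= cutoff]

def needs_refresh(last_refresh, now, every_minutes=10):
    """True once the model is older than the refresh interval (n minutes)."""
    return now - last_refresh >= timedelta(minutes=every_minutes)

def is_abnormal(new_feats, bad_users, feat_w, threshold):
    """Flag a new user whose shared-feature weight sum with any known bad user exceeds threshold."""
    return any(
        sum(feat_w.get(f, 0.0) for f in new_feats & bad["feats"]) > threshold
        for bad in bad_users
    )

now = datetime(2023, 5, 25, 12, 0)
bad_records = [
    {"user": "b1", "time": now - timedelta(hours=5), "feats": {"ip1", "dev1"}},
    {"user": "b2", "time": now - timedelta(hours=100), "feats": {"ip9"}},  # outside window
]
feat_w = {"ip1": 1.0, "dev1": 1.0, "ip9": 1.0}
bad_now = recent(bad_records, now)
print(is_abnormal({"ip1", "dev1", "os6"}, bad_now, feat_w, threshold=1.5))  # True
print(is_abnormal({"ip9"}, bad_now, feat_w, threshold=1.5))                 # False
print(needs_refresh(now - timedelta(minutes=15), now))                      # True
```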

Practical Results – Historical daily models show high accuracy and recall. Online metrics indicate registration interception rates ranging from 78% to 100% across months, and risk identification rates for black‑market users between 63% and 99.8% across different travel scenarios.

Limitations and Future Work – Current weight calculations ignore historical model information; incorporating past abnormal user features could improve initial weights. Additionally, neighbor influence is treated uniformly; weighting neighbors based on connection strength or historical data may yield better performance.
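To illustrate the second idea, the sketch below contrasts the uniform neighbor average with a weighted one. The edge strengths are hypothetical inputs (for example, derived from connection strength or historical data); the article proposes this direction but does not specify a formula.

```python
def neighbor_influence(w, nbrs, prior=0.5, edge_w=None):
    """Mean centered influence (w[u] - prior) of a node's neighbors.

    edge_w=None gives a uniform average over neighbors; otherwise each
    neighbor u is weighted by the hypothetical strength edge_w[u].
    """
    if not nbrs:
        return 0.0
    if edge_w is None:
        return sum(w[u] - prior for u in nbrs) / len(nbrs)
    total = sum(edge_w[u] for u in nbrs)
    if total == 0:
        return 0.0
    return sum(edge_w[u] * (w[u] - prior) for u in nbrs) / total

w = {"a": 0.9, "b": 0.1}
uniform = neighbor_influence(w, ["a", "b"])
weighted = neighbor_influence(w, ["a", "b"], edge_w={"a": 3.0, "b": 1.0})
print(round(weighted, 6))  # 0.2
```

With these inputs the two neighbors cancel out under the uniform mean, while the weighted mean leans toward the more strongly connected neighbor a.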

References – The article cites works on graph neural networks, unsupervised fraud detection, and structure‑based Sybil detection.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please get in touch and we will review it promptly.

fraud detection Anomaly Detection unsupervised learning Graph Neural Network heterogeneous graph registration risk
