
Applying Graph Neural Networks to Fraud Detection: Background, Research Progress, Methods, and Resources

This article reviews the fundamentals of fraud, surveys the evolution of graph neural network research for fraud detection, outlines practical application steps, discusses key challenges such as disguise, scalability, and label scarcity, and provides representative papers, new research directions, industrial case studies, and open-source resources.

DataFunSummit

1. What Is Fraud

Under U.S. law, fraud has four elements: a false statement of fact, a transaction between the parties, the fraudster's knowledge that the statement is false, and the inducement of the victim into a harmful action.

The talk also distinguishes fraudsters from hackers and generic anomalies, emphasizing that fraudsters often share resources (e.g., IPs, devices) and therefore form clusters in graph structures.

2. Types of Fraud

Fraud spans social-network bots, fake news, financial risk control (loan default, insurance and credit fraud, money laundering, fake transactions, credit-card cash-out), advertising traffic fraud, CTR manipulation, fake app retention, e-commerce coupon abuse (the "wool-pulling" parties), crowdsourcing attacks, and game cheating.

3. GNN‑Based Fraud Detection

The typical workflow consists of three steps:

Construct a graph from logs, turning users into nodes and their interactions into edges.

Train a graph neural network to learn node, edge, or graph embeddings.

Use the embeddings and a small set of labeled nodes (e.g., red nodes = fraud) to train a classifier for unlabeled nodes.

The core assumption is homophily: fraudulent users tend to connect with other fraudulent users.
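The three steps above can be sketched end to end in a few lines. This is a toy NumPy illustration, not the method of any particular paper: the user IDs, IPs, and features are made up, the "GNN" is a single untrained GCN-style propagation, and the final classifier is replaced by a nearest-labeled-neighbor lookup.

```python
import numpy as np

# --- Step 1: build a graph from raw logs ---------------------------------
# Users become nodes; sharing a resource (here, an IP) creates an edge.
logs = [("u0", "ip_a"), ("u1", "ip_a"), ("u2", "ip_b"),
        ("u3", "ip_b"), ("u4", "ip_c")]
users = sorted({u for u, _ in logs})
idx = {u: i for i, u in enumerate(users)}
n = len(users)

A = np.eye(n)                      # adjacency with self-loops
by_ip = {}
for u, ip in logs:
    by_ip.setdefault(ip, []).append(u)
for group in by_ip.values():
    for a in group:
        for b in group:
            A[idx[a], idx[b]] = 1.0

# --- Step 2: one GCN-style propagation to get node embeddings ------------
# H = D^{-1/2} A D^{-1/2} X W   (single linear layer, untrained)
rng = np.random.default_rng(0)
X = rng.normal(size=(n, 8))        # stand-in user features
W = rng.normal(size=(8, 4))
d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
H = d_inv_sqrt @ A @ d_inv_sqrt @ X @ W

# --- Step 3: classify unlabeled nodes from a few labeled ones ------------
labels = {"u0": 1, "u2": 0}        # 1 = known fraud, 0 = known benign

def predict(u):
    # nearest labeled node in embedding space (stand-in for a trained classifier)
    nearest = min(labels, key=lambda v: np.linalg.norm(H[idx[u]] - H[idx[v]]))
    return labels[nearest]
```

Because u1 shares ip_a with the labeled fraudster u0, propagation pulls their embeddings together and `predict("u1")` returns 1, while u3 (sharing ip_b with benign u2) returns 0 — exactly the homophily assumption at work.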

4. Research Timeline

Key milestones include:

2018 GraphRAD (MLG@KDD’18) – first GNN for fraud detection.

2018 GEM (CIKM’18) – heterogeneous graph for Alipay fraud.

2019 GeniePath (AAAI’19), InsurGNN (SIGIR’19) – domain‑specific fraud detection.

2019 BitGCN (ADF@KDD’19) – GCN on Bitcoin transaction graphs.

2019 GAS (CIKM’19) – fake reviews on Xianyu platform.

2019 Player2Vec (CIKM’19) – underground forum fraud.

2020 CARE‑GNN (CIKM’20) – reinforcement‑learning neighbor selection for disguised fraud.

2020 GAL (CIKM’20) – unsupervised GNN with limited labels.

2021 MvMoE (WSDM’21) – multi‑task credit and default prediction.

2021 APAN (SIGMOD’21) – streaming learning for dynamic data.

2021 DCI (SIGIR’21) – self‑supervised contrastive learning for fraud.

2021 IHGAT (KDD’21) – user‑motivation modeling for explainable fraud detection.

2021 FD‑NAG (BigData’21) – fraud detection for Grab rides with edge‑feature conversion and self‑supervised learning.

5. Application Methodology

Five guiding questions help decide how to apply GNNs:

Do we need a graph? – shared resources and clustered behavior suggest yes.

Should we use GNNs instead of traditional graph models? – end‑to‑end learning and existing deep‑learning infrastructure favor GNNs.

What task? – node classification, edge classification, graph classification, clustering, or group detection.

How to design the graph structure? – choose node/edge types, sampling strategies, and heterogeneous or hierarchical designs.

Which GNN model? – select a mature model matching the task (e.g., GCN, GAT, heterogeneous GNN).
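One recurring design decision from question four — how to turn a heterogeneous log into a graph a standard GNN can consume — is metapath projection. The sketch below (hypothetical users and devices, not from the talk) projects a user–device bipartite graph along the user–device–user metapath, so that users sharing a device become direct neighbors in a homogeneous user graph:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical typed edge list for a user–device heterogeneous graph.
user_device = [("u0", "d1"), ("u1", "d1"), ("u2", "d2")]

# Group users by shared device.
device_users = defaultdict(set)
for u, d in user_device:
    device_users[d].add(u)

# Project along user–device–user: co-users of a device become neighbors.
user_graph = defaultdict(set)
for group in device_users.values():
    for a, b in combinations(sorted(group), 2):
        user_graph[a].add(b)
        user_graph[b].add(a)
```

Here `user_graph` links u0 and u1 (they share d1), while u2 stays isolated. The alternative is to keep the heterogeneous graph as-is and use a heterogeneous GNN (as GEM does); projection trades that expressiveness for simplicity.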

Key Challenges and Solutions

Disguise problem: neighbor filtering, adversarial training, generative adversarial samples, Bayesian edge weighting.

Scalability: limited industrial solutions; non‑deep graph models often scale better.

Class imbalance: down‑sampling, balanced neighbor selection, data augmentation.

Label scarcity: active learning, semi‑supervised or self‑supervised learning, meta‑learning.

Label quality: label correction via active learning, human‑in‑the‑loop reinforcement learning.

Data scarcity: data augmentation to synthesize fraudulent examples.
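To make the disguise problem concrete, here is a heavily simplified sketch of the neighbor-filtering idea behind CARE-GNN. The real method learns a filtering threshold per relation with reinforcement learning; this toy version just keeps each node's top-k most feature-similar neighbors, with k fixed and the feature vectors invented for illustration:

```python
import numpy as np

def filter_neighbors(x, adj, k=2):
    """Keep each node's top-k most feature-similar neighbors.

    Simplification of CARE-GNN-style neighbor filtering: the published
    method learns the threshold with RL; here k is simply fixed."""
    filtered = {}
    for u, nbrs in adj.items():
        ranked = sorted(nbrs, key=lambda v: -float(x[u] @ x[v]))
        filtered[u] = ranked[:k]
    return filtered

# A disguised fraudster (node 0) links to two fraud-like and two benign users.
x = np.array([[1.0, 0.0],   # 0: fraudster
              [0.9, 0.1],   # 1: fraud-like
              [0.8, 0.2],   # 2: fraud-like
              [0.0, 1.0],   # 3: benign (camouflage edge)
              [0.1, 0.9]])  # 4: benign (camouflage edge)
adj = {0: [1, 2, 3, 4]}
```

`filter_neighbors(x, adj)` drops the two camouflage edges to benign users, so aggregation no longer averages the fraudster's signal away.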

Emerging Practices

Graph pre‑training (contrastive learning) is gaining traction, especially when fraudsters differ structurally from normal users. Dynamic graphs capture temporal patterns but increase computational cost; streaming learning frameworks (e.g., APAN) address efficiency. Multi‑task learning and explainability (e.g., IHGAT) are also active research areas.
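The contrastive objective typically used in such pre-training can be sketched as an InfoNCE-style loss. This is a generic sketch, not the exact loss of any paper cited above: row i of the two matrices is assumed to hold embeddings of two augmented views of node i (positives), and every other row serves as a negative.

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style contrastive loss between two views of the same nodes.

    Row i of z1 and z2 form a positive pair; all other rows are negatives.
    A minimal sketch of graph contrastive pre-training, with temperature tau."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = (z1 @ z2.T) / tau                          # scaled cosine similarities
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # -log p(positive | row)

rng = np.random.default_rng(1)
z = rng.normal(size=(8, 16))
```

Perfectly aligned views give the lowest loss, while misaligned ones (e.g., `info_nce(z, z[::-1])`) score higher — the model is rewarded for embedding each node closer to its own augmentation than to any other node.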

6. Industrial Deployments

Major industry contributions come from Alibaba, Ant Financial, Facebook, Amazon, and others. Notable systems include the SafeGraph open‑source suite (DGFraud, UGFraud, UPFD) and various anomaly‑detection libraries (PyOD, PyODD, DGL‑based pipelines).

These resources provide code, datasets, and benchmarks for both academic research and production‑grade fraud detection.

7. Resources

Open‑source projects: SafeGraph (graph‑based fraud detection toolkit), UGFraud (unsupervised graph fraud detection), UPFD (fake‑news detection with GNNs). Additional curated lists of fraud‑detection papers, datasets, and code are maintained by the community.

For more details, refer to the original slides and the DataFunTalk platform.

Tags: machine learning, fraud detection, AI, security, GNN, graph neural networks
Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
