Real‑Time Graph Neural Network for Payment Fraud Detection at eBay
This article describes how eBay applies graph neural networks to real‑time payment fraud detection, covering the anti‑fraud scenario, limitations of traditional GBDT pipelines, challenges of constructing and serving dynamic heterogeneous graphs, the end‑to‑end solution with directed slice graphs and a Lambda‑style architecture, and experimental results comparing GNN with LightGBM.
The talk begins with an overview of eBay's payment fraud landscape, highlighting risk assessment points before, during, and after a transaction and explaining why real‑time detection is critical.
It then outlines the traditional end‑to‑end pipeline: feature engineering for account‑level variables, labeling based on unauthorized transactions, handling severe class imbalance, and training a GBDT model (e.g., LightGBM) that is later deployed for online scoring.
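The labeling and imbalance-handling step above can be sketched in a few lines. The field name `is_unauthorized`, the helper name, and the 10:1 downsampling ratio are all illustrative assumptions, not eBay's actual schema or settings; the returned weight plays the role of a per-class weight (e.g., LightGBM's `scale_pos_weight`) at training time.

```python
def prepare_training_set(transactions, max_neg_ratio=10):
    """Label transactions and downsample the dominant legitimate class.

    Illustrative sketch: `transactions` is a list of dicts where
    `is_unauthorized` marks the fraud label.
    """
    positives = [t for t in transactions if t["is_unauthorized"]]
    negatives = [t for t in transactions if not t["is_unauthorized"]]
    # Keep at most `max_neg_ratio` negatives per positive to cap the skew;
    # the residual imbalance is handled with a class weight during training.
    kept_negatives = negatives[: max_neg_ratio * len(positives)]
    pos_weight = len(kept_negatives) / max(len(positives), 1)
    return positives + kept_negatives, pos_weight
```

The returned `pos_weight` would then be fed to the GBDT trainer, while the account-level engineered features form the tabular input.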
Next, the limitations of tabular models are discussed, emphasizing that relational features (shared addresses, IPs, emails) are naturally expressed as graph edges, which traditional pipelines struggle to capture efficiently.
The core of the presentation focuses on the challenges of deploying GNNs in a real‑time setting: temporal leakage when constructing a bipartite event‑entity graph, high latency of neighbor queries, and the computational cost of deep models.
To address these, a directed dynamic slice graph is introduced, where each time slice forms a sub‑graph and edges are categorized as (1) order‑to‑entity, (2) historical entity‑to‑entity within a time window, and (3) current‑order propagation edges, with “shadow” orders used to prevent future‑information leakage.
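The three edge categories and the leakage rule can be illustrated with a toy slice builder. The data structures (`orders` as tuples, `entity_history` as a dict) and all names are assumptions for exposition; the production graph is heterogeneous and far richer. The key invariant shown is that propagation edges only flow from strictly older orders into the current one, so no future information leaks.

```python
from collections import defaultdict

def build_slice_graph(orders, entity_history, window):
    """Build one time-slice sub-graph with the three edge categories.

    `orders`: list of (order_id, timestamp, [entity_ids]) in this slice.
    `entity_history`: entity_id -> list of (other_entity_id, timestamp).
    Illustrative structures only.
    """
    edges = {"order_entity": [], "entity_entity": [], "order_propagation": []}
    entity_to_orders = defaultdict(list)
    for order_id, ts, entities in orders:
        for e in entities:
            # (1) order-to-entity edges for the current order.
            edges["order_entity"].append((order_id, e))
            # (3) propagation edges from earlier ("shadow") orders sharing an
            # entity: only strictly older orders may send messages, so the
            # current order never sees future information.
            for prev_order, prev_ts in entity_to_orders[e]:
                if prev_ts < ts:
                    edges["order_propagation"].append((prev_order, order_id))
            entity_to_orders[e].append((order_id, ts))
            # (2) historical entity-to-entity edges inside the time window.
            for other, hist_ts in entity_history.get(e, []):
                if ts - window <= hist_ts < ts:
                    edges["entity_entity"].append((e, other))
    return edges
```

Because each slice is built from events up to its own timestamp, slices can be materialized incrementally as transactions stream in.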
A Lambda‑style architecture is then described: offline embedding of entities via GNNs stored in a key‑value store, and online inference that retrieves a small set of relevant embeddings, combines them with GBDT‑encoded features, and passes them through a final GNN layer for risk scoring.
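A minimal sketch of the online half of this design, under stated assumptions: the key-value store is a plain dict of precomputed entity embeddings, neighbor aggregation is mean pooling, and a single linear layer with a sigmoid stands in for the final GNN layer. Every name, shape, and the pooling choice here is an assumption, not the production scoring path.

```python
import math

def score_order(order_entities, order_features, embedding_store, weights, bias):
    """Online risk scoring: KV lookup + pooling + one final layer (sketch).

    `order_features` stands in for the GBDT-encoded features of the order;
    `weights`/`bias` are the parameters of the final layer.
    """
    dim = len(weights)  # expected input dimension of the final layer
    # Retrieve only the handful of embeddings relevant to this order,
    # keeping the online path cheap and independent of graph depth.
    embs = [embedding_store[e] for e in order_entities if e in embedding_store]
    if embs:
        pooled = [sum(col) / len(embs) for col in zip(*embs)]
    else:
        pooled = [0.0] * (dim - len(order_features))
    x = pooled + order_features
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # fraud risk score in (0, 1)
```

The decoupling is the point: the expensive multi-hop GNN computation happens offline when embeddings are refreshed, while the online request does only a few lookups and one shallow forward pass.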
Experimental results compare the proposed GNN pipeline against LightGBM and MLP baselines on a large e‑commerce fraud dataset. GCN‑based models achieve roughly a 25% improvement in accuracy over LightGBM, while GAT does not outperform GCN due to limited hyper‑parameter tuning.
The talk concludes with a summary of the end‑to‑end solution—graph partitioning, dynamic slicing, and decoupled inference—and outlines future directions such as exploring temporal GNNs (e.g., TGN) and more sophisticated graph partitioning strategies.
DataFunTalk