Graph Modeling and GCN Exploration at 极验: Evolution, Offline and Real‑time Solutions
The talk surveys the development of graph neural networks, traces 极验's graph modeling research and its evolution, and details offline and real‑time GCN solutions, covering self‑supervised training, large‑scale graph handling, and performance comparisons, with practical applications in fraud detection and risk control.
In recent years, graph modeling has attracted high research interest both in academia and industry, with many teams exploring its integration into business scenarios such as banking, social platforms, insurance, and public security.
Graph Model Introduction – A brief history of Graph Neural Networks (GNN) is given, covering early iterative methods (2005‑2013), the introduction of graph convolution in 2013, the shift to spatial‑based convolutions in 2016, and the wide adoption of spatial GNNs across computer vision, NLP, and social network analysis.
Research Wave – The volume of GNN papers has grown dramatically, and many industry teams are seeking to combine graph algorithms with their own data, indicating a broad industry push for graph‑enabled solutions.
Graph Modeling Characteristics – Traditional risk control relies on static rule‑based systems; graph models enable end‑to‑end learning from relational data, allowing automatic extraction of structural and feature information for tasks such as device sharing detection.
极验’s Graph Modeling Exploration – Since the emergence of GCNs, 极验 has applied graph models to fraud detection in banks, social platforms, insurance, and public security, publishing a book on GNNs and building a proprietary graph modeling platform.
Offline GCN Solutions – Two main approaches are described: (1) a basic supervised GCN pipeline (feature matrix + adjacency matrix → embedding → softmax classifier) and (2) a self‑supervised scheme that generates pseudo‑labels to reduce labeling cost. Large graphs are handled either by moving to higher‑memory GPUs or by a CPU‑only pipeline that learns only the graph embeddings and feeds them to a traditional classifier such as Random Forest.
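The supervised pipeline in (1) is the standard two‑layer GCN forward pass. A minimal NumPy sketch on a toy graph with random weights (an illustration of the formula, not the talk's actual model):

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetrically normalize A with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_forward(A, X, W0, W1):
    """Two-layer GCN: softmax(A_hat · ReLU(A_hat · X · W0) · W1)."""
    A_hat = normalize_adjacency(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)       # hidden node embeddings
    logits = A_hat @ H @ W1
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # per-node class probabilities

# toy chain graph: 4 nodes, 3 input features, 2 classes
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
probs = gcn_forward(A, X, rng.normal(size=(3, 8)), rng.normal(size=(8, 2)))
```

In training, the softmax output would be fed to a cross‑entropy loss over the labeled nodes (or, in the self‑supervised variant, over the generated pseudo‑labels).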
Experimental results show that the CPU + Random Forest pipeline achieves comparable performance to the full GPU GCN while handling up to 2 million nodes and 40 million edges within ~10 minutes.
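The talk does not specify how the CPU‑only embeddings are learned; one parameter‑free possibility in the same spirit is SGC‑style feature propagation followed by a Random Forest. A hypothetical sketch with scikit‑learn on a toy two‑community graph (names and graph are illustrative, not from the talk):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def propagate_features(A, X, k=2):
    """Parameter-free graph 'embedding': smooth node features k times with
    the symmetrically normalized adjacency (SGC-style), no GPU required."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    H = X
    for _ in range(k):
        H = A_norm @ H
    return H

# toy stand-in for a device graph: two disconnected 3-node cliques,
# one benign community (label 0) and one fraudulent community (label 1)
rng = np.random.default_rng(1)
clique = np.ones((3, 3)) - np.eye(3)
A = np.block([[clique, np.zeros((3, 3))], [np.zeros((3, 3)), clique]])
X = np.vstack([rng.normal(0.0, 0.5, (3, 4)), rng.normal(3.0, 0.5, (3, 4))])
y = np.array([0, 0, 0, 1, 1, 1])

emb = propagate_features(A, X, k=2)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(emb, y)
```

Because the propagation step is just sparse matrix multiplication, it scales to millions of nodes on commodity CPUs, which is consistent with the 2‑million‑node, ~10‑minute figure quoted above.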
Real‑time GCN Solutions – To reduce latency, incoming data are chunked into 1,000‑record blocks. The initial graph construction (pre‑processing) is performed offline; each incoming record is then incrementally added to the graph, its embedding is computed, and a clustering‑based anomaly score is derived by comparing it against the white‑sample (benign) distribution. These optimizations cut per‑record processing time to 20‑40 ms, achieving near‑real‑time detection.
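A hypothetical sketch of that scoring flow, assuming the incremental embedding is a simple mix of a record's own features and its neighbors' embeddings, and the anomaly score is a z‑score distance to the white‑sample statistics (the talk does not disclose its exact formulas):

```python
import numpy as np

class IncrementalGraphScorer:
    """Illustrative real-time flow: the graph is built offline, each new
    record is attached incrementally, and its anomaly score is a z-score
    distance to the white-sample (benign) embedding distribution."""

    def __init__(self, white_embeddings):
        # summary statistics of the known-benign ("white") embeddings,
        # precomputed during the offline graph-construction phase
        self.mu = white_embeddings.mean(axis=0)
        self.sigma = white_embeddings.std(axis=0) + 1e-8

    def score(self, neighbor_embeddings, own_features):
        # incremental embedding: blend own features with neighbor embeddings
        if len(neighbor_embeddings):
            emb = 0.5 * own_features + 0.5 * neighbor_embeddings.mean(axis=0)
        else:
            emb = own_features
        z = (emb - self.mu) / self.sigma
        return float(np.sqrt((z ** 2).mean()))  # higher = more anomalous

rng = np.random.default_rng(2)
white = rng.normal(0.0, 1.0, (200, 8))           # offline benign embeddings
scorer = IncrementalGraphScorer(white)
benign_score = scorer.score(white[:5], rng.normal(0.0, 1.0, 8))
fraud_score = scorer.score(np.empty((0, 8)), np.full(8, 6.0))
```

Under this design, the 20‑40 ms per‑record budget is spent only on neighbor lookup and the embedding update; the heavy work (graph construction, white‑sample statistics) stays offline.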
Model Effectiveness – Compared with existing rule‑based engines, the offline GCN improves anomaly detection by ~17 % and the real‑time GCN by ~13 % (up to 20‑30 % for some clients). A book titled “深入浅出图神经网络” (roughly, “Graph Neural Networks in Plain Language”) and a custom graph modeling platform (including tools for offline GCN, real‑time inference, distributed storage, and dynamic graph construction) are also introduced.
Our Reflections – The team discusses technical maturity (model performance, interpretability, adaptive node selection) and service maturity (model portability, customized solutions, closed‑loop integration with strategy engines). They identify two major bottlenecks, a shortage of suitable graph algorithms and a scarcity of reliable labels, and propose borrowing advances from NLP and vision (e.g., attention mechanisms) to address them.
Q&A covers graph construction time reduction, the use of random parameters in real‑time GCN, and evaluation of false‑positive rates.
Overall, the presentation provides a comprehensive, technically detailed overview of graph neural network research, practical deployment strategies, and performance insights relevant to AI practitioners and researchers.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.