Exploring Graph Computing for Credit Fraud Detection: Background, Applications, Risk Graph System, and Performance Optimization
This article presents a comprehensive overview of how graph computing, powered by AI and big-data techniques, is applied to credit fraud detection, covering the background, pre-, mid-, and post-fraud scenarios, the risk graph architecture, performance-tuning methods, and a Q&A with Ant Group experts.
Guest: Yu Xiaolu (Ant Group) Editor: Wu Jianhua (University of Electronic Science and Technology of China) Platform: DataFunTalk
Overview: The rapid development of big data and AI has driven the financial credit industry into an intelligent, digital era. With AI as its technical backbone, a "smart credit brain" enables end-to-end control of the credit workflow and improves credit rating models, reducing risk and strengthening prevention capabilities. The talk focuses on four topics: credit fraud background, graph applications in the pre-, mid-, and post-fraud stages, the risk big-data graph system, and graph computing performance optimization.
01. Credit Fraud Background Introduction
1. Background and Graph Use: Credit fraud detection involves perception, interception, and monitoring of both B‑side and C‑side activities. Graph computing is applied in three stages:
Pre‑fraud: Risk perception using graph reasoning for fine‑grained merchant admission and rating, and temporal graph analysis for early anomaly detection.
Mid-fraud: Intercepting suspicious applications and withdrawals by enriching request data through graph-based feature expansion, then applying quantitative strategies or models; identity is cross-validated against the graph.
Post‑fraud: Comprehensive monitoring and handling using graph pattern recognition and community detection to identify cash‑out behavior and fraud rings.
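The graph-based feature expansion mentioned for the mid-fraud stage can be illustrated with a minimal pure-Python sketch. The node names, edge structure, and the "count of new nodes per hop" feature are illustrative assumptions, not Ant's actual schema or feature set; the point is only how a multi-hop neighborhood enriches a single request.

```python
# Toy directed graph linking applicants, devices, and merchants.
# Node names and edges are hypothetical, for illustration only.
EDGES = {
    "applicant_1": ["device_9", "merchant_3"],
    "device_9": ["applicant_2", "applicant_7"],
    "merchant_3": ["applicant_7"],
    "applicant_2": [],
    "applicant_7": ["device_9"],
}

def k_hop_features(graph, start, k):
    """BFS outward k hops and count distinct new nodes reached per hop,
    a simple stand-in for graph-based feature expansion on a request."""
    seen = {start}
    frontier = [start]
    counts = []
    for _ in range(k):
        nxt = []
        for node in frontier:
            for nbr in graph.get(node, []):
                if nbr not in seen:
                    seen.add(nbr)
                    nxt.append(nbr)
        counts.append(len(nxt))
        frontier = nxt
    return counts

# An application from applicant_1 is expanded into 2-hop neighborhood counts
# that a downstream strategy or model can consume as features.
features = k_hop_features(EDGES, "applicant_1", 2)
```

In practice each hop would aggregate richer statistics (shared devices, overdue rates of neighbors, and so on) rather than raw counts, but the traversal skeleton is the same.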
02. Graph Applications in the Pre-, Mid-, and Post-Fraud Stages
1. End‑to‑end graph workflow: Offline scheduling or near‑real‑time event‑driven risk detection (pre‑fraud), online request‑driven interception (mid‑fraud), and near‑line sub‑graph analysis for real‑time risk identification (post‑fraud). The system also supports full‑graph offline analysis and interactive graph exploration.
2. Development Timeline:
2018: Introduced graph‑based near‑line monitoring for cash‑out detection, achieving high accuracy but low coverage.
2019: Added gang‑mining algorithms to broaden coverage as fraud became more organized.
2020: Fed extracted subgraphs into Graph Neural Networks (GNNs) as richer inputs, improving generalization and accuracy.
2021: Built a risk big‑data graph ecosystem to scale graph technology efficiently.
3. Consolidated Graph Techniques: Traversal & aggregation of multi‑hop features, pattern detection & matching for fund loops, community detection (LPA, Louvain, K‑Core), graph learning (GCN) and graph reasoning.
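Of the community-detection algorithms listed, label propagation (LPA) is the simplest to sketch. Below is a minimal pure-Python version, assuming an undirected adjacency-list graph; the tie-breaking rule (smallest label wins) and the round limit are illustrative choices, not a specific production implementation.

```python
import random

def label_propagation(adj, rounds=10, seed=0):
    """Asynchronous label propagation: each node repeatedly adopts the
    most frequent label among its neighbors. Densely connected groups
    (e.g. fraud rings) converge to a shared label."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # every node starts in its own community
    nodes = list(adj)
    for _ in range(rounds):
        rng.shuffle(nodes)            # randomize update order each round
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            freq = {}
            for u in adj[v]:
                freq[labels[u]] = freq.get(labels[u], 0) + 1
            # max over sorted keys: highest count, smallest label on ties
            best = max(sorted(freq), key=lambda lab: freq[lab])
            if labels[v] != best:
                labels[v] = best
                changed = True
        if not changed:               # stable labeling reached
            break
    return labels

# Two disjoint triangles: each collapses to a single community label.
demo = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
communities = label_propagation(demo)
```

Louvain and K-Core follow the same input shape but optimize modularity and prune by degree, respectively; at Ant's scale all of these run distributed rather than in a single process.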
03. Risk Big‑Data Graph System
1. Risk Graph Architecture: A unified platform (RiskGraph) that defines automatic graph modeling, analysis, simulation, and deployment with consistent semantics across offline, simulation, and online environments. This reduces project turnaround from three months to one week and enables large‑scale graph processing.
2. Challenges Addressed:
Inconsistent data definitions between real‑time and offline sources.
Semantic gaps between offline batch graph computation and real‑time graph engines.
Need for long‑term simulation and back‑testing to ensure stable strategy performance.
04. Graph Computing Performance Optimization
1. Optimization Strategies:
Increase Concurrency: Scale cluster resources, employ asynchronous processing and multithreading to mitigate hotspot skew.
Reduce Overhead: Choose appropriate graph partitioning (vertex‑cut vs edge‑cut), eliminate locks, minimize context switches, add indexes, cache data in memory, and batch I/O operations.
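The "add indexes" point is concrete enough to sketch. The fragment below contrasts a raw edge-list scan with a one-time out-edge index; the edge data is hypothetical, and the index here is just a Python dict standing in for whatever structure a real graph engine uses.

```python
# Hypothetical raw edge log: (src, dst) pairs with no index.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("b", "d")]

def out_neighbors_scan(edge_list, src):
    """Without an index every lookup scans all edges: O(|E|) per query."""
    return [d for s, d in edge_list if s == src]

def build_out_index(edge_list):
    """Build the out-edge index once; lookups become O(out-degree)."""
    index = {}
    for s, d in edge_list:
        index.setdefault(s, []).append(d)
    return index

out_index = build_out_index(edges)
neighbors = out_index.get("a", [])   # indexed lookup, no scan
```

For traversal-heavy workloads the same queries run millions of times, so moving from per-query scans to indexed lookups is often the single largest win.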
2. Case 1 – Simulation Performance: Replaying 90 days of data for back-testing demands roughly 90× the normal daily throughput, so a single simulation took a week. Introducing asynchronous source/sink pipelines and adding out-edge indexes cut the runtime to one day.
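The asynchronous source/sink idea from Case 1 can be sketched with stdlib threads and bounded queues. This is a generic pipeline pattern, not Ant's engine: the worker count, queue sizes, and sentinel-based shutdown are all illustrative choices.

```python
import queue
import threading

def run_pipeline(records, process, workers=4):
    """Decouple source, compute, and sink with bounded queues so that
    reading, processing, and writing overlap instead of running serially."""
    src_q = queue.Queue(maxsize=1000)
    sink_q = queue.Queue(maxsize=1000)
    results = []

    def worker():
        while True:
            item = src_q.get()
            if item is None:          # sentinel: no more input
                break
            sink_q.put(process(item))

    def sink():
        for _ in range(len(records)):
            results.append(sink_q.get())

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    sink_t = threading.Thread(target=sink)
    for t in threads:
        t.start()
    sink_t.start()
    for r in records:                 # the "source" feeds the pipeline
        src_q.put(r)
    for _ in threads:                 # one sentinel per worker
        src_q.put(None)
    for t in threads:
        t.join()
    sink_t.join()
    return results

out = run_pipeline(list(range(50)), lambda x: x * 2, workers=3)
```

Note the results arrive out of order, which is acceptable for replay-style simulation; a real pipeline would also need backpressure metrics and error handling.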
3. Case 2 – Real-time Fund Loop Detection: Detecting 3-hop fund loops (A→B→C→A) online was costly. Precomputing the common 2-hop segments via stream joins, bounding fanout with limits, and closing the loop through indexed out-edges reduced computation from minutes to seconds, keeping latency stable during peak traffic.
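One way to read Case 2's limit-plus-index optimization is the brute-force enumeration below: walk two hops under a per-node fanout limit, then close the loop with a single indexed lookup. The account names and the `limit` default are hypothetical; the production system additionally precomputes the 2-hop segments via stream joins rather than enumerating them per request.

```python
def find_3_hop_loops(out_index, limit=100):
    """Enumerate fund loops A->B->C->A over an out-edge index.
    The per-node fanout limit bounds worst-case cost on hub accounts."""
    loops = []
    for a, firsts in out_index.items():
        for b in firsts[:limit]:          # hop 1, fanout-limited
            if b == a:
                continue
            for c in out_index.get(b, [])[:limit]:   # hop 2
                if c in (a, b):
                    continue
                if a in out_index.get(c, []):        # indexed closing edge
                    loops.append((a, b, c))
    return loops

# Toy transfer graph: A pays B, B pays C, C pays A (and D, a dead end).
transfers = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
loops = find_3_hop_loops(transfers)
```

Each loop is reported once per starting node, so the single cycle above surfaces as three rotations; deduplication by canonical rotation would be a natural next step.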
05. Q&A
Q: Does Ant Group use Ray for graph computing? A: Ray is a Berkeley‑origin distributed framework used extensively at Ant. The graph engine itself is a separate framework, but the underlying compute engine leverages Ray.
Q: Recommended open‑source graph stores? A: TigerGraph and Neo4j are commonly used; many other solutions exist with various benchmarks.
Q: Can graph computing be done with SQL? A: Due to complex graph queries, languages like Gremlin are preferred over SQL.
Q: Is Ant's graph computing framework open‑source? A: Not yet, but there are plans to open source it in the future.