Design and Implementation of a Financial Fraud Detection Graph Network Using JanusGraph
This article presents a comprehensive overview of building a financial fraud detection graph network: the background challenges, graph schema design, a four‑layer architecture built on JanusGraph, data import pipelines, quality assurance, performance optimizations, and practical applications such as risk scoring, association analysis, and id‑mapping.
Background – Fraud in financial products often exhibits group behavior, multi‑device usage, and loan‑intermediary scams, which are difficult to detect from a single applicant perspective. Introducing a graph‑based association network expands analysis from an individual to a network view, enabling effective anti‑fraud capabilities across credit, loan, and repayment stages.
Financial Association Network Overview – The network stores real‑world relationships (user–user, user–device, user–address, etc.) as a domain‑specific knowledge graph comprising roughly 40 entity types and 70 relationship types, with total data volume approaching hundreds of billions.
Technical Architecture
Four layers are constructed:
Storage Layer: JanusGraph as the graph engine, HBase for storage, Elasticsearch for external indexing, Kafka for data ingestion, MySQL for business schema definition, and a custom KV store (wtable) for caching pre‑computed graph features.
Compute Layer: OLTP graph queries using Gremlin; OLAP batch computations are mentioned but not detailed.
Service Layer: Encapsulates graph management, online queries, and pre‑computed features into standardized APIs.
Application Layer: Provides risk prediction, gang detection, consistency verification, association reverse lookup, and lost‑contact recovery for financial risk control.
Schema Design
The schema is split into two parts:
Graph schema: Defines vertex labels, vertex properties, edge labels, edge properties, and property keys (single/list/set). JanusGraph constraints such as unique indexes for vertices and edge cardinality (simple, many2one, etc.) ensure data correctness.
Business schema: Captures attribute constraints (range, enumeration, regex) stored in MySQL because the graph database lacks rich attribute validation.
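The business‑schema side can be illustrated with a minimal sketch: the validator below (all constraint names and rules are hypothetical, standing in for definitions loaded from the MySQL schema tables) applies the range, enumeration, and regex checks that the graph database itself cannot express.

```python
import re

# Hypothetical business-schema constraints for the "user" vertex label,
# as they might be loaded from the MySQL schema tables.
USER_CONSTRAINTS = {
    "user_id":   {"type": "regex", "pattern": r"\d{8}"},           # 8-digit id
    "productid": {"type": "enum",  "values": {"1", "2", "3"}},
    "age":       {"type": "range", "min": 18, "max": 100},
}

def validate(property_map, constraints):
    """Return a list of violations; an empty list means the entity passes."""
    errors = []
    for key, rule in constraints.items():
        if key not in property_map:
            continue  # presence/uniqueness is the graph schema's job, not ours
        value = property_map[key]
        if rule["type"] == "regex" and not re.fullmatch(rule["pattern"], str(value)):
            errors.append(f"{key}: {value!r} fails pattern {rule['pattern']}")
        elif rule["type"] == "enum" and str(value) not in rule["values"]:
            errors.append(f"{key}: {value!r} not in {sorted(rule['values'])}")
        elif rule["type"] == "range" and not (rule["min"] <= int(value) <= rule["max"]):
            errors.append(f"{key}: {value} outside [{rule['min']}, {rule['max']}]")
    return errors
```

For example, `validate({"user_id": "11111111", "productid": "1"}, USER_CONSTRAINTS)` returns an empty list, while a malformed `user_id` produces a violation message that the import pipeline can log and reject.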
Data Import and Update
Data import follows a Kafka‑driven pipeline. Entity and relation messages are JSON‑encoded. Example entity message:
{"graphName":"network","propertyMap":{"productid":"1","real_name":"张三","create_time":"2019-12-02 15:25:40.0","user_id":"11111111"},"label":"user","messageType":"entity"}

Example relation message:

{"graphName":"network","label":"is_friend","messageType":"relation","propertyMap":{"productid":"3","resource":"E000101"},"source":{"label":"user","propertyMap":{"user_id":"11111111"}},"target":{"label":"user","propertyMap":{"user_id":"22222222"}}}

The import process includes parsing, schema validation, existence checks, constraint verification, and write operations. Entity imports ensure a unique user_id via read‑then‑write logic; edge imports enforce cardinality constraints using the simple and many2one types.
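The read‑then‑write entity upsert can be sketched as follows. This is a minimal illustration against an in‑memory dictionary standing in for the graph; in the real pipeline the lookup and write would be Gremlin operations against JanusGraph, and the function name is hypothetical.

```python
import json

# In-memory stand-in for the graph store, keyed by (label, user_id).
graph = {}

def import_entity(raw_message):
    """Parse an entity message and upsert it with read-then-write uniqueness:
    look the vertex up by its unique user_id first, then either update the
    existing vertex's properties or create a new one."""
    msg = json.loads(raw_message)
    if msg["messageType"] != "entity":
        raise ValueError("not an entity message")
    props = msg["propertyMap"]
    key = (msg["label"], props["user_id"])   # user_id is the unique key
    if key in graph:
        graph[key].update(props)             # existing vertex: merge properties
        return "updated"
    graph[key] = dict(props)                 # new vertex: create it
    return "created"
```

Feeding the article's example entity message through twice yields "created" and then "updated", leaving exactly one vertex — the behavior the uniqueness constraint requires.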
Data Quality Assurance
Constraint validation using graph and business schemas.
Monitoring via SparkGraphComputer for quick sampling and SparkGremlin for full‑scale metric calculations.
Data repair through offline dumps and targeted deletions/updates.
Performance Optimizations
To handle billions of daily updates, Kafka partitions are aligned with entity IDs to reduce lock contention. For massive batch loads, an offline HFile generation approach writes directly to HBase, achieving a 12.5× speedup for 10‑billion‑edge loads.
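Aligning partitions with entity IDs can be sketched as a keyed partitioner: every message about a given entity hashes to the same partition, so a single consumer serializes all writes to that entity and lock contention in the graph store disappears. The partition count and hash choice below are illustrative, not the article's actual configuration (Kafka's default partitioner uses murmur2 on the record key to the same effect).

```python
import hashlib

NUM_PARTITIONS = 32  # hypothetical partition count

def partition_for(entity_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Route every message for the same entity to the same Kafka partition.
    A stable hash of the entity id (md5 here, for illustration) guarantees
    that concurrent updates to one vertex never land on two consumers."""
    digest = hashlib.md5(entity_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions
```

In practice this is achieved simply by producing to Kafka with the entity ID as the record key, letting the broker's partitioner do the hashing.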
Graph Applications
Admission Control: Real‑time risk scoring during credit and loan issuance.
Association Analysis: Post‑approval behavior analysis to refine risk models.
Lost‑Contact Recovery: Cross‑line data integration to locate overdue users.
Query types include:
Association queries: exact and range lookups served by composite and mixed indexes.
Path queries: bidirectional breadth‑first search, with pruning to bound expansion at high‑degree nodes.
Attribute queries: basic attributes versus pre‑computed profile attributes.
Id‑mapping: offline pre‑computed union‑find structures stored in wtable.
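The id‑mapping step relies on a union‑find (disjoint‑set) structure computed offline. A minimal sketch with path compression is below; the identifier formats are hypothetical, and in the described system the resulting canonical‑root mapping would be materialized into the wtable KV store for online lookup.

```python
class UnionFind:
    """Minimal union-find with path compression, as used offline to group
    identifiers (account ids, phones, devices) that belong to the same
    real-world person."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the canonical root of x, compressing the path as we go."""
        self.parent.setdefault(x, x)
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, a, b):
        """Merge the groups containing a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

# Example: two edges observed in the graph link an account, a phone,
# and a device into one identity cluster (identifiers are made up).
uf = UnionFind()
uf.union("user:11111111", "phone:13800000000")
uf.union("phone:13800000000", "device:abc123")
```

After the merge, `uf.find("user:11111111")` and `uf.find("device:abc123")` return the same root, which serves as the cluster id cached for online id‑mapping queries.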
Future Outlook – The authors plan to deepen integration of graph analytics with big‑data computation, enhance consistency checks, and expand risk reporting capabilities.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.