Design and Implementation of a Financial Fraud Detection Graph Network Using JanusGraph
This article presents a comprehensive overview of building a financial fraud detection graph network: the background challenges, graph schema design, a four‑layer architecture built on JanusGraph, data import pipelines, quality assurance, performance optimizations, and practical applications such as risk scoring, association analysis, and id‑mapping.
Background – Fraud in financial products often exhibits group behavior, multi‑device usage, and loan‑intermediary scams, which are difficult to detect from a single applicant perspective. Introducing a graph‑based association network expands analysis from an individual to a network view, enabling effective anti‑fraud capabilities across credit, loan, and repayment stages.
Financial Association Network Overview – The network stores real‑world relationships (user–user, user–device, user–address, etc.) as a domain‑specific knowledge graph comprising roughly 40 entity types and 70 relationship types, with total data volume approaching hundreds of billions.
Technical Architecture
Four layers are constructed:
Storage Layer: JanusGraph as the graph engine, HBase for storage, Elasticsearch for external indexing, Kafka for data ingestion, MySQL for business schema definition, and a custom KV store (wtable) for caching pre‑computed graph features.
Compute Layer: OLTP graph queries using Gremlin; OLAP batch computations are mentioned but not detailed.
Service Layer: Encapsulates graph management, online queries, and pre‑computed features into standardized APIs.
Application Layer: Provides risk prediction, gang detection, consistency verification, association reverse lookup, and lost‑contact recovery for financial risk control.
Schema Design
The schema is split into two parts:
Graph schema: Defines vertex labels, vertex properties, edge labels, edge properties, and property keys (single/list/set). JanusGraph constraints such as unique indexes for vertices and edge cardinality (simple, many2one, etc.) ensure data correctness.
Business schema: Captures attribute constraints (range, enumeration, regex) stored in MySQL because the graph database lacks rich attribute validation.
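The business‑schema side can be illustrated with a minimal sketch: the validator below (all constraint names and rules are hypothetical, standing in for definitions loaded from the MySQL schema tables) applies the range, enumeration, and regex checks that the graph database itself cannot express.

```python
import re

# Hypothetical business-schema constraints for the "user" vertex label,
# as they might be loaded from the MySQL schema tables.
USER_CONSTRAINTS = {
    "user_id":   {"type": "regex", "pattern": r"\d{8}"},           # 8-digit id
    "productid": {"type": "enum",  "values": {"1", "2", "3"}},
    "age":       {"type": "range", "min": 18, "max": 100},
}

def validate(property_map, constraints):
    """Return a list of violations; an empty list means the entity passes."""
    errors = []
    for key, rule in constraints.items():
        if key not in property_map:
            continue  # presence/uniqueness is the graph schema's job, not ours
        value = property_map[key]
        if rule["type"] == "regex" and not re.fullmatch(rule["pattern"], str(value)):
            errors.append(f"{key}: {value!r} fails pattern {rule['pattern']}")
        elif rule["type"] == "enum" and str(value) not in rule["values"]:
            errors.append(f"{key}: {value!r} not in {sorted(rule['values'])}")
        elif rule["type"] == "range" and not (rule["min"] <= int(value) <= rule["max"]):
            errors.append(f"{key}: {value} outside [{rule['min']}, {rule['max']}]")
    return errors
```

For example, `validate({"user_id": "11111111", "productid": "1"}, USER_CONSTRAINTS)` returns an empty list, while a malformed `user_id` produces a violation message that the import pipeline can log and reject.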
Data Import and Update
Data import follows a Kafka‑driven pipeline. Entity and relation messages are JSON‑encoded. Example entity message:
{"graphName":"network","propertyMap":{"productid":"1","real_name":"张三","create_time":"2019-12-02 15:25:40.0","user_id":"11111111"},"label":"user","messageType":"entity"}

Example relation message:

{"graphName":"network","label":"is_friend","messageType":"relation","propertyMap":{"productid":"3","resource":"E000101"},"source":{"label":"user","propertyMap":{"user_id":"11111111"}},"target":{"label":"user","propertyMap":{"user_id":"22222222"}}}

The import process includes parsing, schema validation, existence checks, constraint verification, and write operations. Entity imports ensure a unique user_id via read‑then‑write logic; edge imports enforce cardinality constraints using the simple and many2one types.
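The read‑then‑write entity upsert can be sketched as follows. This is a minimal illustration against an in‑memory dictionary standing in for the graph; in the real pipeline the lookup and write would be Gremlin operations against JanusGraph, and the function name is hypothetical.

```python
import json

# In-memory stand-in for the graph store, keyed by (label, user_id).
graph = {}

def import_entity(raw_message):
    """Parse an entity message and upsert it with read-then-write uniqueness:
    look the vertex up by its unique user_id first, then either update the
    existing vertex's properties or create a new one."""
    msg = json.loads(raw_message)
    if msg["messageType"] != "entity":
        raise ValueError("not an entity message")
    props = msg["propertyMap"]
    key = (msg["label"], props["user_id"])   # user_id is the unique key
    if key in graph:
        graph[key].update(props)             # existing vertex: merge properties
        return "updated"
    graph[key] = dict(props)                 # new vertex: create it
    return "created"
```

Feeding the article's example entity message through twice yields "created" and then "updated", leaving exactly one vertex — the behavior the uniqueness constraint requires.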
Data Quality Assurance
Constraint validation using graph and business schemas.
Monitoring via SparkGraphComputer for quick sampling and SparkGremlin for full‑scale metric calculations.
Data repair through offline dumps and targeted deletions/updates.
Performance Optimizations
To handle billions of daily updates, Kafka partitions are aligned with entity IDs to reduce lock contention. For massive batch loads, an offline HFile generation approach writes directly to HBase, achieving a 12.5× speedup for 10‑billion‑edge loads.
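Aligning partitions with entity IDs can be sketched as a keyed partitioner: every message about a given entity hashes to the same partition, so a single consumer serializes all writes to that entity and lock contention in the graph store disappears. The partition count and hash choice below are illustrative, not the article's actual configuration (Kafka's default partitioner uses murmur2 on the record key to the same effect).

```python
import hashlib

NUM_PARTITIONS = 32  # hypothetical partition count

def partition_for(entity_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Route every message for the same entity to the same Kafka partition.
    A stable hash of the entity id (md5 here, for illustration) guarantees
    that concurrent updates to one vertex never land on two consumers."""
    digest = hashlib.md5(entity_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions
```

In practice this is achieved simply by producing to Kafka with the entity ID as the record key, letting the broker's partitioner do the hashing.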
Graph Applications
Admission Control: Real‑time risk scoring during credit and loan issuance.
Association Analysis: Post‑approval behavior analysis to refine risk models.
Lost‑Contact Recovery: Cross‑line data integration to locate overdue users.
Query types include:
Association queries: exact and range lookups served by composite and mixed indexes.
Path queries: bidirectional breadth‑first search, with pruning to bound expansion at high‑degree nodes.
Attribute queries: basic attributes versus pre‑computed profile attributes.
Id‑mapping: offline pre‑computed union‑find structures stored in wtable.
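The id‑mapping step relies on a union‑find (disjoint‑set) structure computed offline. A minimal sketch with path compression is below; the identifier formats are hypothetical, and in the described system the resulting canonical‑root mapping would be materialized into the wtable KV store for online lookup.

```python
class UnionFind:
    """Minimal union-find with path compression, as used offline to group
    identifiers (account ids, phones, devices) that belong to the same
    real-world person."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the canonical root of x, compressing the path as we go."""
        self.parent.setdefault(x, x)
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, a, b):
        """Merge the groups containing a and b."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

# Example: two edges observed in the graph link an account, a phone,
# and a device into one identity cluster (identifiers are made up).
uf = UnionFind()
uf.union("user:11111111", "phone:13800000000")
uf.union("phone:13800000000", "device:abc123")
```

After the merge, `uf.find("user:11111111")` and `uf.find("device:abc123")` return the same root, which serves as the cluster id cached for online id‑mapping queries.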
Future Outlook – The authors plan to deepen integration of graph analytics with big‑data computation, enhance consistency checks, and expand risk reporting capabilities.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.