LPA-Detector: Distributed Label Propagation with Confidence Weights for Large‑Scale Graph Risk Detection
The article introduces LPA-Detector, an open‑source project that redesigns the Label Propagation Algorithm using Spark GraphX to add node confidence weights and relationship influence, achieving significant improvements in execution efficiency and detection accuracy for massive graph data in risk‑control scenarios.
This document presents LPA-Detector, an open‑source project released in March 2020 that refactors the classic Label Propagation Algorithm (LPA) on the Spark GraphX framework, adding a confidence‑weight evaluation method to improve both runtime efficiency and algorithmic effectiveness for large‑scale graph risk detection.
Background : In risk‑control applications, relational networks model entities and their connections as graphs. Traditional LPA suffers from two major issues: (1) random updates cause unstable results, and (2) massive graph sizes (billions of edges) make single‑machine execution impractical.
Algorithm Improvements :
Distributed LPA implementation using Spark/Hadoop and GraphX for parallel processing.
Introduction of a label‑confidence weight that combines node confidence and edge influence, serving as a selection criterion to reduce randomness and improve stability.
Optimized message passing and merging to filter unnecessary messages, cutting iteration time to roughly one‑quarter on graphs with over a billion edges.
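The article does not spell out how messages are merged or which ones are filtered. As a rough, single-machine sketch of the idea, the Python below merges incoming messages per label (similar in spirit to a combiner in a vertex-centric framework) and drops messages whose weight cannot influence the result. The function names, the `(src, dst, weight)` edge layout, and the zero-weight filter are all assumptions made for illustration, not the project's actual code.

```python
def merge_messages(a, b):
    """Combine two {label: weight} messages by summing weights per label."""
    merged = dict(a)
    for label, weight in b.items():
        merged[label] = merged.get(label, 0.0) + weight
    return merged


def send_messages(edges, labels, min_weight=0.0):
    """Run one round of message passing from src to dst.

    edges  : list of (src, dst, weight) tuples (hypothetical layout)
    labels : {vertex_id: current_label}
    Returns {dst: {label: summed_weight}}.
    """
    inbox = {}
    for src, dst, weight in edges:
        # Assumed filter: a message at or below min_weight cannot change
        # the outcome of a weighted vote, so it is never sent.
        if weight <= min_weight:
            continue
        msg = {labels[src]: weight}
        inbox[dst] = merge_messages(inbox[dst], msg) if dst in inbox else msg
    return inbox
```

Merging at the message level keeps the per-vertex payload proportional to the number of distinct labels rather than the number of neighbours, which is one plausible reason such an optimization helps on very large graphs.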
Weight Calculation : The confidence weight evaluates the importance of a label based on the node’s risk relevance and the strength of its relationships, allowing high‑impact nodes and edges to dominate the propagation process.
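The exact weight formula is not given in the article. The sketch below assumes the simplest combination, node confidence multiplied by edge influence, and shows how such a weight can replace the random tie-breaking of classic LPA. `label_weight`, `propagate`, the self-vote, and the smallest-label tie-break are hypothetical choices for illustration; the real project runs on Spark GraphX rather than in plain Python.

```python
def label_weight(node_confidence, edge_influence):
    """Assumed confidence weight: node confidence times edge influence."""
    return node_confidence * edge_influence


def propagate(nodes, edges, node_conf, max_iter=10):
    """Confidence-weighted label propagation on an undirected graph.

    nodes     : iterable of vertex ids (every edge endpoint must be listed)
    edges     : list of (src, dst, influence) tuples
    node_conf : {vertex_id: confidence}; missing vertices default to 1.0
    Returns {vertex_id: label} after convergence or max_iter rounds.
    """
    labels = {v: v for v in nodes}  # each vertex starts with its own id
    neighbors = {v: [] for v in labels}
    for src, dst, influence in edges:
        neighbors[src].append((dst, influence))
        neighbors[dst].append((src, influence))

    for _ in range(max_iter):
        new_labels = {}
        for v, nbrs in neighbors.items():
            # Assumption: a vertex also votes for its current label, which
            # damps the oscillation that synchronous updates can cause.
            tally = {labels[v]: node_conf.get(v, 1.0)}
            for u, influence in nbrs:
                w = label_weight(node_conf.get(u, 1.0), influence)
                tally[labels[u]] = tally.get(labels[u], 0.0) + w
            # Deterministic choice: highest total weight, then smallest label.
            new_labels[v] = min(tally, key=lambda lab: (-tally[lab], lab))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# A path 0-1-2-3 where vertex 3 carries a high confidence value.
result = propagate(
    nodes=[0, 1, 2, 3],
    edges=[(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)],
    node_conf={0: 1.0, 1: 1.0, 2: 1.0, 3: 5.0},
)
```

Because every choice is a deterministic function of the weights, repeated runs on the same input give the same labels, which is the stability property the confidence weight is meant to provide.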
Implementation Details :
Input graph G = {V, E}, where V is the set of vertices and E is the set of edges. The algorithm outputs a set of risk clusters Setc = {C1, C2, …, CK}.
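Once propagation has finished, the clusters C1 … CK are simply the groups of vertices that share a final label. A minimal sketch of that grouping step follows; the function name and the `min_size` parameter are assumptions added for illustration.

```python
from collections import defaultdict


def clusters_from_labels(labels, min_size=1):
    """Group vertices by final label.

    labels   : {vertex_id: label}
    min_size : hypothetical threshold; smaller clusters are dropped
    Returns a list of vertex sets ordered by label.
    """
    groups = defaultdict(set)
    for vertex, label in labels.items():
        groups[label].add(vertex)
    return [groups[lab] for lab in sorted(groups) if len(groups[lab]) >= min_size]
```

Setting `min_size` above 1 would discard isolated vertices, which may or may not be desirable depending on the risk scenario.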
Data preparation examples:
{"attr":{"type":"v0"},"id":0}
{"attr":{"type":"v2"},"id":2}
{"dstId":2,"prop":{"type":"E0-2"},"srcId":0}
Weight configuration is set via a configuration file, and the number of iterations is adjustable. Execution is performed by running run.sh.
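To inspect or validate input before a run, the node and edge records from the data preparation examples can be read with a few lines of Python. This sketch assumes one JSON object per line and tells vertices from edges by the presence of `srcId` and `dstId`; the file layout and the `parse_records` helper are assumptions, not something the project documents here.

```python
import json


def parse_records(lines):
    """Split JSON records into vertices and edges.

    Returns (vertices, edges) where
      vertices : {id: attr_dict}
      edges    : list of (srcId, dstId, prop_dict)
    """
    vertices, edges = {}, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        if "srcId" in rec and "dstId" in rec:
            edges.append((rec["srcId"], rec["dstId"], rec.get("prop", {})))
        elif "id" in rec:
            vertices[rec["id"]] = rec.get("attr", {})
        else:
            raise ValueError(f"unrecognized record: {line}")
    return vertices, edges


sample = [
    '{"attr":{"type":"v0"},"id":0}',
    '{"attr":{"type":"v2"},"id":2}',
    '{"dstId":2,"prop":{"type":"E0-2"},"srcId":0}',
]
vertices, edges = parse_records(sample)
```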
Usage : Prepare node and edge data, configure confidence weights, set iteration count, and execute the script.
Future Plans : Extend the framework to support custom confidence weights for various business scenarios, and release additional graph tools such as community detection, graph embedding, and feature extraction.
Contribution & Feedback : Users are encouraged to submit pull requests or issues on the GitHub repository https://github.com/wuba/LPA-Detector .
Author : Huang Jia, Senior Development Engineer at 58 Financial, focusing on anti‑fraud detection.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.