
LPA-Detector: Distributed Label Propagation with Confidence Weights for Large‑Scale Graph Risk Detection

The article introduces LPA-Detector, an open‑source project that redesigns the Label Propagation Algorithm using Spark GraphX to add node confidence weights and relationship influence, achieving significant improvements in execution efficiency and detection accuracy for massive graph data in risk‑control scenarios.

58 Tech

Mar 26, 2020


This document presents LPA-Detector, an open‑source project released in March 2020 that refactors the classic Label Propagation Algorithm (LPA) on the Spark GraphX framework, adding a confidence‑weight evaluation method to improve both runtime efficiency and algorithmic effectiveness for large‑scale graph risk detection.

Background: In risk‑control applications, relational networks model entities and their connections as graphs. The traditional LPA has two major problems in this setting: (1) its randomized label updates make the results unstable from run to run, and (2) the graphs are massive (billions of edges), which makes single‑machine execution impractical.
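
The instability in point (1) is easy to reproduce. The sketch below is not taken from the project; it shows a single update step of textbook LPA in plain Python, where a vertex adopts the most frequent label among its neighbors and breaks ties at random. The function name and the toy graph are illustrative assumptions.

```python
import random
from collections import Counter

def classic_lpa_step(node, neighbors, labels, rng):
    """Textbook LPA update: take the most frequent neighbor label,
    breaking ties uniformly at random."""
    counts = Counter(labels[n] for n in neighbors[node])
    best = max(counts.values())
    candidates = sorted(label for label, c in counts.items() if c == best)
    return rng.choice(candidates)

# Toy graph (hypothetical): vertex "x" has two neighbors with different labels,
# so every update of "x" is a tie.
neighbors = {"x": ["a", "b"]}
labels = {"a": "A", "b": "B", "x": "X"}

# The same update, repeated with different random seeds.
outcomes = {
    classic_lpa_step("x", neighbors, labels, random.Random(seed))
    for seed in range(20)
}
```

Across the seeds, `outcomes` contains both labels, which means the cluster that "x" ends up in depends on the random draw rather than on the data.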

Algorithm Improvements:

Distributed LPA implementation using Spark/Hadoop and GraphX for parallel processing.

Introduction of a label‑confidence weight that combines node confidence and edge influence, serving as a selection criterion to reduce randomness and improve stability.

Optimized message passing and merging to filter unnecessary messages, cutting iteration time to roughly one‑quarter on graphs with over a billion edges.
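
To make the three points above concrete, here is a minimal single‑machine sketch in plain Python. It is not the project's GraphX code. The `merge` function plays a role similar to the message‑merge function in GraphX's Pregel API, and label selection uses summed weights with a deterministic tie‑break instead of a random one. The article does not say which messages the project filters, so dropping non‑positive weights below is only a placeholder assumption, as are all function names.

```python
def merge(a, b):
    """Merge two {label: weight} messages by summing weights per label."""
    out = dict(a)
    for label, w in b.items():
        out[label] = out.get(label, 0.0) + w
    return out

def propagate(labels, edges, max_iter=10):
    """labels: {vertex: label}; edges: [(src, dst, weight)], treated as undirected.
    Runs synchronous rounds until labels stop changing or max_iter is reached."""
    labels = dict(labels)
    for _ in range(max_iter):
        inbox = {}
        for src, dst, w in edges:
            if w <= 0:  # placeholder filter (assumption): skip messages that cannot matter
                continue
            for sender, receiver in ((src, dst), (dst, src)):
                msg = {labels[sender]: w}
                inbox[receiver] = merge(inbox.get(receiver, {}), msg)
        new_labels = dict(labels)
        for v, msg in inbox.items():
            # Highest total weight wins; ties go to the smallest label (deterministic).
            new_labels[v] = min(msg, key=lambda l: (-msg[l], l))
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Two tightly connected triangles joined by one weak edge (hypothetical weights).
edges = [
    (1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
    (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0),
    (3, 4, 0.1),
]
result = propagate({v: v for v in range(1, 7)}, edges)
# result == {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 4}
```

The weak edge between vertices 3 and 4 never outweighs the strong edges inside each triangle, so the two groups keep separate labels. Synchronous updates of this kind can oscillate on some graphs (a single isolated edge is the simplest case), which is why the loop is capped by `max_iter`.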

Weight Calculation: The confidence weight evaluates the importance of a label based on the node’s risk relevance and the strength of its relationships, allowing high‑impact nodes and edges to dominate the propagation process.
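
The article does not give the weight formula, so the following Python sketch assumes the simplest plausible form: a label's weight is the sender's node confidence multiplied by the edge's influence, and weights are summed per label. The type names and numeric values are hypothetical and chosen only to show how one high‑impact vote can outweigh several weak ones.

```python
from collections import defaultdict

# Hypothetical confidence and influence tables. The article gives neither
# concrete values nor the exact formula, so everything here is an assumption.
NODE_CONFIDENCE = {"blacklisted": 0.9, "unknown": 0.1}
EDGE_INFLUENCE = {"shared_device": 0.8, "same_ip": 0.3}

def label_weight(node_type, edge_type):
    """Assumed form: node confidence multiplied by edge influence."""
    return NODE_CONFIDENCE[node_type] * EDGE_INFLUENCE[edge_type]

def weighted_vote(messages):
    """messages: iterable of (label, sender_node_type, edge_type)."""
    totals = defaultdict(float)
    for label, node_type, edge_type in messages:
        totals[label] += label_weight(node_type, edge_type)
    # Highest total weight wins; ties fall back to the smallest label.
    return min(totals, key=lambda l: (-totals[l], l))

# One strong "risky" vote against three weak "benign" votes.
messages = [
    ("risky", "blacklisted", "shared_device"),
    ("benign", "unknown", "same_ip"),
    ("benign", "unknown", "same_ip"),
    ("benign", "unknown", "same_ip"),
]
winner = weighted_vote(messages)
```

A plain majority count would pick "benign" here (three votes to one). Under the assumed weighting, the single vote from a high‑confidence node over a strong relationship carries more total weight, so `winner` is "risky".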

Implementation Details:

The input is a graph G = {V, E}, where V is the set of vertices and E is the set of edges. The algorithm outputs the set of risk clusters Set_c = {C_1, C_2, …, C_K}.
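
As an illustration of the output shape, the Python sketch below turns a final vertex‑to‑label assignment into a list of clusters by grouping vertices that share a label. The function name and the `min_size` parameter are hypothetical; the article does not describe how the project materializes its clusters.

```python
from collections import defaultdict

def clusters_from_labels(labels, min_size=1):
    """Group vertices that ended up with the same label into clusters.
    labels: {vertex: label}. Returns a list of vertex sets, ordered by label.
    min_size is a hypothetical knob for dropping very small groups."""
    groups = defaultdict(set)
    for vertex, label in labels.items():
        groups[label].add(vertex)
    return [members for _, members in sorted(groups.items())
            if len(members) >= min_size]

# Example assignment after propagation: two labels survive, so K = 2.
final_labels = {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 4}
clusters = clusters_from_labels(final_labels)
# clusters == [{1, 2, 3}, {4, 5, 6}]
```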

Data preparation examples (the first two records are vertices; the third is an edge from vertex 0 to vertex 2):

{"attr":{"type":"v0"},"id":0}

{"attr":{"type":"v2"},"id":2}

{"dstId":2,"prop":{"type":"E0-2"},"srcId":0}
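
The records above are one JSON object per line. A short Python sketch of how such lines could be read into vertex and edge collections follows. It assumes a record is an edge when it carries a `srcId` field and a vertex otherwise, and it reads both kinds from one list for brevity; how the project itself loads them is not described in the article.

```python
import json

# The three sample records from the article, one JSON object per line.
lines = [
    '{"attr":{"type":"v0"},"id":0}',
    '{"attr":{"type":"v2"},"id":2}',
    '{"dstId":2,"prop":{"type":"E0-2"},"srcId":0}',
]

def parse_records(lines):
    """Split JSON lines into {vertex_id: type} and [(src, dst, type)].
    Assumption: a record with a "srcId" field is an edge, anything else a vertex."""
    vertices, edges = {}, []
    for line in lines:
        rec = json.loads(line)
        if "srcId" in rec:
            edges.append((rec["srcId"], rec["dstId"], rec["prop"]["type"]))
        else:
            vertices[rec["id"]] = rec["attr"]["type"]
    return vertices, edges

vertices, edges = parse_records(lines)
# vertices == {0: "v0", 2: "v2"}
# edges == [(0, 2, "E0-2")]
```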

Confidence weights are set in a configuration file, and the number of iterations is adjustable. The job is launched by running run.sh.

Usage: Prepare node and edge data, configure confidence weights, set the iteration count, and execute the script.

Future Plans: Extend the framework to support custom confidence weights for various business scenarios, and release additional graph tools such as community detection, graph embedding, and feature extraction.

Contribution & Feedback: Users are encouraged to submit pull requests or issues on the GitHub repository https://github.com/wuba/LPA-Detector.

Author: Huang Jia, Senior Development Engineer at 58 Financial, focusing on anti‑fraud detection.

