Graph Mining Algorithms for Advertising Traffic Fraud Detection and Platform Engineering
This article presents graph‑based fraud detection techniques for advertising traffic, detailing dense subgraph algorithms such as Fraudar and D‑Cube, their engineering optimizations, real‑world case studies, and the design of a scalable graph‑mining platform for large‑scale security applications.
The advertising traffic fraud scenario at JD.com includes malicious clicks, impressions, and orders driven by various motives such as competitor attacks, media channel brushing, and data scraping. Fraudsters employ high‑frequency or low‑frequency cheating modes, the latter requiring modeling of associations among many resource entities.
Graph mining, which captures relationships between entities via nodes and edges, is introduced as an effective tool to detect such coordinated fraud. Two unsupervised dense‑subgraph algorithms are highlighted: Fraudar, a bipartite dense‑subgraph detector originally designed for fake followers and reviews, and D‑Cube, a dense‑subtensor detector for k‑uniform hypergraphs that can incorporate additional dimensions such as time.
Fraudar defines edge suspiciousness based on the target node’s in‑degree decay, aggregates node suspiciousness, and uses a greedy removal process to find the subgraph with maximal average suspiciousness. D‑Cube extends this to higher‑order tensors, enabling detection of fraud groups that exhibit temporal clustering.
Practical case studies demonstrate the application of these algorithms: (1) detecting malicious external ad traffic by constructing a user‑ad click bipartite graph and extracting dense subgraphs with Fraudar, identifying tens of thousands of fraudulent users daily; (2) detecting malicious internal search‑ad clicks by building a four‑mode tensor (user, ad, keyword, time) and applying D‑Cube to uncover coordinated black‑market attacks.
A graph‑mining algorithm platform has been engineered to integrate these methods, providing a six‑layer architecture (data, format conversion, core algorithms, scheduling/output, applications) capable of handling billions of nodes and edges. Engineering optimizations for Fraudar include a shared‑leaf‑node priority tree to reduce memory and accelerate minimum‑suspiciousness node retrieval, and neighbor compression storing adjacency lists in 8‑, 16‑, 24‑, or 32‑bit formats, achieving 5‑8× speedup and ten‑fold memory reduction on a 12.5‑billion‑node graph.
The work concludes that graph models are powerful for structuring relational data, and future directions involve designing distributed, scalable unsupervised graph mining algorithms to meet the challenges of big‑data fraud detection.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.