Artificial Intelligence 18 min read

How Graph Algorithms Power Anti‑Fraud in Marketing and E‑Commerce

This article explores how black‑market cheating in marketing campaigns and e‑commerce is detected using graph‑based techniques such as same‑person mining, label propagation, Fraudar, and GCN models, and discusses future directions like multimodal data fusion and real‑time dynamic graph computation.

Baidu Geek Talk
Baidu Geek Talk
Baidu Geek Talk
How Graph Algorithms Power Anti‑Fraud in Marketing and E‑Commerce

Business Background

In marketing campaigns, black‑market groups automate large‑scale reward harvesting, waste marketing funds, and pollute data metrics, leading to inaccurate operational decisions.

Typical cheating stages include resource preparation (mass‑purchasing accounts), task execution (automated scripts simulating user actions), and fund cash‑out (using dispersed real WeChat accounts).

Same‑Person Mining

We define a "same‑person" group as accounts and devices sharing the same underlying entity. By constructing a graph where user accounts are nodes and device IDs, phone numbers, withdrawal IDs, and encrypted ID cards are edges, we extract maximal connected subgraphs to identify same‑person relationships and apply stability processing across multiple days, achieving 96.8% overall ID stability and 99.3% after filtering unstable merges.

Same‑person mining stability results
Same‑person mining stability results

Label Propagation Algorithm and Application

The Label Propagation Algorithm (LPA) is a semi‑supervised graph‑based method for community detection and node classification. It iteratively spreads the most frequent neighbor label to each unlabeled node until convergence.

Advantages: computationally efficient (O(|E|) per iteration), no parameter tuning, naturally parallelizable.

Limitations: possible instability due to random initialization, ignores node features, performs poorly on sparse graphs.

Label propagation process
Label propagation process

Fraudar Algorithm and Application

Fraudar targets bipartite graphs (e.g., user‑shop). It defines a global suspiciousness score g(S)=f(S)/|S|, where f(S) sums node and edge suspiciousness. The algorithm iteratively removes the node with the lowest contribution, tracking g(S) to select the subgraph with maximal suspiciousness.

Key properties: higher node/edge suspiciousness increases g(S); larger dense subgraphs are more suspicious; smaller subgraphs with the same total suspiciousness are more suspicious.

Optimized with a priority heap, the total complexity becomes O(|E| log|V|). Multiple suspicious subgraphs are obtained by repeatedly removing the identified subgraph and re‑running Fraudar.

Fraudar global suspiciousness
Fraudar global suspiciousness

GCN Model and Application

GCN (Graph Convolutional Network) aggregates neighbor features to update node representations, enabling end‑to‑end learning on graph data. The core propagation formula is H^{(l+1)} = σ( H^{(l)} W^{(l)}), where  is the normalized adjacency with self‑loops.

Advantages: integrates graph structure with node features, efficient sparse matrix computation, suitable for node classification, graph classification, and link prediction.

Limitations: over‑smoothing in deep layers, assumes static graphs.

GCN aggregation
GCN aggregation

Summary and Outlook

We have deployed LPA, Fraudar, and GCN for risk control and anti‑fraud, achieving solid results. Future work includes multimodal data fusion (graph, time‑series, text), real‑time dynamic graph computation to counter rapid black‑market evolution, and enhancing interpretability and adversarial robustness.

Graph algorithms are evolving from auxiliary tools to the core engine of risk control systems, with ongoing challenges in balancing technical performance, compliance, and user experience.

anti-fraudGCNrisk detectionlabel propagationgraph algorithmsFraudar
Baidu Geek Talk
Written by

Baidu Geek Talk

Follow us to discover more Baidu tech insights.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.