
Applying Graph Algorithms and Graph Convolutional Networks to Advertising Anti‑Fraud

This article describes how graph theory and graph convolutional neural networks are leveraged to model user‑IP relationships, detect fraudulent advertising clusters, and improve detection accuracy and recall through a combination of unsupervised graph algorithms and supervised GCN training in a large‑scale ad‑anti‑fraud system.

58 Tech

Graphs model relationships between entities. Graph theory originated with Euler's Königsberg bridge problem, and graphs now represent both physical networks (roads, power grids) and virtual ones (social media, advertising). In advertising anti-fraud, accounts, devices, IPs, and similar entities become nodes, and their interactions become edges.

Classic graph algorithms such as connected subgraph detection, label propagation, and Louvain community detection capture structural information, while Graph Convolutional Networks (GCN) also learn from node attributes, which gives them higher detection precision.
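To make the structural algorithms concrete, here is a minimal pure-Python sketch of two of them: asynchronous label propagation, and the Newman modularity score that Louvain greedily maximizes (Louvain's node-moving and aggregation phases are not shown). It is not the team's implementation; the adjacency-dict input and the deterministic smallest-label tie-break are assumptions chosen to keep the example short and reproducible.

```python
from collections import Counter, defaultdict


def label_propagation(adj, max_iter=20):
    """Asynchronous label propagation over an undirected adjacency dict.

    Each node adopts the most frequent label among its neighbours.
    Assumption: ties go to the smallest label so runs are deterministic;
    production LPA usually breaks ties randomly and shuffles the visit order.
    """
    labels = {node: node for node in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):  # fixed visiting order for reproducibility
            if not adj[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in adj[node])
            best = max(counts.values())
            new_label = min(lbl for lbl, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged: no node changed its label in this pass
    return labels


def modularity(adj, labels):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ].

    L_c is the number of edges inside community c, d_c the sum of degrees of
    its nodes, and m the total number of edges (undirected, no self-loops).
    """
    m = sum(len(nbs) for nbs in adj.values()) / 2
    internal = defaultdict(float)
    degree = defaultdict(float)
    for node, nbs in adj.items():
        degree[labels[node]] += len(nbs)
        for nb in nbs:
            if labels[nb] == labels[node]:
                internal[labels[node]] += 0.5  # each internal edge is seen from both ends
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

On two separate triangles the labels settle into one community per triangle. For two triangles joined by a single bridge edge, the two-triangle partition scores a modularity of 5/14 (about 0.36), versus 0 for lumping all six nodes together, which is why Louvain prefers the split.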

The 58.com anti-fraud team applied connected subgraph, label propagation, Louvain, and GCN algorithms to identify fraudulent advertising behavior. The workflow first builds a graph from external behavior data and runs the unsupervised algorithms to generate clusters. The clusters are then filtered and manually validated to produce high-quality black/white (fraudulent/legitimate) labels, and these labels, together with node features, are used to train a supervised GCN model.
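The step from unsupervised clusters to supervised training labels can be sketched as follows. Everything here is hypothetical: the article does not say which filters were applied, so `min_size` stands in for them, and the `verdicts` dict stands in for the manual validation, with 1 meaning a confirmed fraudulent (black) cluster and 0 a confirmed legitimate (white) one.

```python
def labels_from_clusters(clusters, verdicts, min_size=3):
    """Turn unsupervised clusters into per-node labels for GCN training.

    clusters: dict cluster_id -> list of user ids (output of an unsupervised algorithm)
    verdicts: hypothetical stand-in for manual review,
              dict cluster_id -> 1 (confirmed fraud) or 0 (confirmed legitimate)
    min_size: hypothetical filter; smaller clusters are not sent for review
    """
    labels = {}
    for cid, members in clusters.items():
        if len(members) < min_size:
            continue  # filtered out before review
        if cid not in verdicts:
            continue  # not validated yet, so no label is emitted
        for user in members:
            labels[user] = verdicts[cid]
    return labels
```

Only reviewed clusters contribute labels, so the supervised stage trains on a smaller but cleaner set than the raw cluster output.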

Graph construction starts from a bipartite user-IP graph. After public IPs are filtered out, it is projected into a user-user graph in which two users are connected if they share an IP, so the GCN receives a single, homogeneous node type as input.
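A minimal sketch of the bipartite-to-user-user projection, assuming public IPs are detected by a simple cap on how many users share an IP. `max_users_per_ip` is a hypothetical parameter; the article does not say how public IPs are identified.

```python
from collections import defaultdict
from itertools import combinations


def project_user_graph(user_ip_edges, max_users_per_ip=50):
    """Project a bipartite user-IP graph onto users.

    user_ip_edges: iterable of (user, ip) pairs.
    max_users_per_ip: hypothetical threshold; IPs shared by more users are
    treated as public (e.g. carrier NAT or public Wi-Fi) and dropped,
    because they would link unrelated users.
    Returns a set of undirected user-user edges as sorted tuples.
    """
    users_by_ip = defaultdict(set)
    for user, ip in user_ip_edges:
        users_by_ip[ip].add(user)

    edges = set()
    for users in users_by_ip.values():
        if len(users) > max_users_per_ip:
            continue  # assumed public IP: skip it entirely
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))  # two users sharing this IP become neighbours
    return edges
```

The cap also bounds the cost of the projection: an IP shared by k users contributes k(k-1)/2 user-user edges, so a single public IP with thousands of users would otherwise add millions of edges.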

Because the graph contains billions of edges, the unsupervised algorithms are executed on Spark GraphX, a distributed graph‑processing framework, and results are visualized with Neo4j.
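On a single machine, connected subgraph detection reduces to union-find. The sketch below labels each node with the smallest node id in its component, which matches the output convention of GraphX's `connectedComponents()`. GraphX itself computes components with iterative Pregel message passing rather than union-find, so treat this as an equivalent illustration, not a port.

```python
def connected_components(nodes, edges):
    """Label each node with the smallest node id in its connected component."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving: point each visited node at its grandparent to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller root, so every root stays the minimum id of its component.
            if ru < rv:
                parent[rv] = ru
            else:
                parent[ru] = rv
    return {n: find(n) for n in nodes}
```

In Spark itself the built-in entry points are `graph.connectedComponents()` and `org.apache.spark.graphx.lib.LabelPropagation.run(graph, maxSteps)`; Louvain is not part of GraphX's bundled algorithm library.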

The GCN model has two graph convolution layers with ReLU activation and dropout, and is trained with the Adam optimizer and a cross-entropy loss. The dataset contains 403,000 vertices and 3.12 million edges, with 53-dimensional node features and a 2:1 ratio of positive to negative samples; 300k vertices are used for training and 103k for validation. On the training set the model reaches 98.9% accuracy and 96% recall; on the validation set, 98.4% accuracy and 93% recall. t-SNE visualizations show clear separation between the classes.
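The two-layer architecture can be sketched as a forward pass in plain Python. This assumes the standard Kipf and Welling propagation rule, H' = ReLU(Â H W) with Â = D^-1/2 (A + I) D^-1/2, which the article does not spell out. Dropout is applied to the hidden layer during training only, and backpropagation with Adam is omitted; the loss is averaged over labelled nodes, as in a train/validation split.

```python
import math
import random


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Assumed propagation matrix: A_hat = D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_self = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_self]  # degrees including the self-loop
    return [[a_self[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def relu(m):
    return [[max(0.0, x) for x in row] for row in m]


def dropout(m, p, rng):
    """Inverted dropout: zero each value with probability p, rescale survivors by 1/(1-p)."""
    return [[0.0 if rng.random() < p else x / (1 - p) for x in row] for row in m]


def softmax_rows(m):
    out = []
    for row in m:
        mx = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - mx) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def gcn_forward(adj, features, w1, w2, p_drop=0.5, training=False, rng=None):
    """Two graph convolutions: softmax(A_hat . dropout(ReLU(A_hat X W1)) . W2)."""
    a_norm = normalize_adjacency(adj)
    h = relu(matmul(a_norm, matmul(features, w1)))  # layer 1
    if training:
        h = dropout(h, p_drop, rng or random.Random(0))
    return softmax_rows(matmul(a_norm, matmul(h, w2)))  # layer 2 -> class probabilities


def cross_entropy(probs, labels, mask):
    """Mean negative log-likelihood over the labelled nodes listed in mask."""
    losses = [-math.log(probs[i][labels[i]]) for i in mask]
    return sum(losses) / len(losses)
```

At the article's scale (403,000 vertices) Â would be stored as a sparse matrix; the dense lists here only suit toy graphs.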

Compared with a baseline XGBoost model (95.3% accuracy), the GCN raises accuracy to 98.2% and improves recall by 20.3%, while the unsupervised algorithms also raise recall by more than 10%.
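Accuracy and recall, the two metrics reported here, can be computed as below. The article does not say which class recall refers to; this sketch assumes it is the fraudulent (positive) class, the usual choice in anti-fraud. For reference, the accuracy gain over XGBoost, from 95.3% to 98.2%, is 2.9 percentage points.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall), with recall = TP / (TP + FN) for the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0  # no positives at all -> report 0
    return correct / len(y_true), recall
```

Recall matters more than accuracy under class imbalance: a model that misses many fraudulent nodes can still post high accuracy if most nodes are legitimate.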

In summary, graph algorithms—especially GCN—significantly boost advertising anti‑fraud detection. Future work includes incorporating more behavioral dimensions, adding edge weights, and further refining the connected subgraph, label propagation, and Louvain methods.

Written by 58 Tech, the official tech channel of 58, a platform for tech innovation, sharing, and communication.