
How to Detect and Prevent Advertising Fraud with Advanced AI Techniques

This article explains the scale of online ad fraud, outlines common advertising billing models, describes how fake traffic generates revenue, and defines invalid clicks. It then presents a comprehensive anti-fraud system that combines rule-based methods, feature engineering, and AI models such as TextCNN, BiLSTM, BERT, Wide&Deep and GraphSAGE to identify and block fraudulent ad clicks.

Alibaba Cloud Developer

Dec 16, 2020


Background

The World Federation of Advertisers (WFA) predicts that by 2025 ad fraud could reach $50 billion a year, making it the second-largest source of criminal income after drug trafficking. Advertising is the fastest and most direct way for internet companies to monetize traffic, but it also feeds an end-to-end underground fraud industry that threatens platform trust.

Common Advertising Billing Models

Online ads use several pricing schemes:

CPM (Cost Per Mille): payment per thousand impressions, suitable for brand-awareness placements such as splash ads.

CPC (Cost Per Click): payment per click, widely used in search-engine keyword ads and e-commerce promotion.

CPA (Cost Per Action): payment only when a specific action (e.g., registration, add-to-cart) occurs.

CPT (Cost Per Time): a fixed-price contract for a period of time (e.g., a monthly banner slot).

CPS (Cost Per Sale): payment per completed sale, common on shopping sites.

CPI (Cost Per Install): payment per app installation, typical for mobile app promotion.
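To make the link between billing model and fraud incentive concrete, here is a minimal sketch. The `CampaignEvents` record and `charge` helper are hypothetical illustrations, not part of the article, and CPS is simplified to a fixed fee per sale (in practice it is often a commission). The sketch computes the advertiser's bill under each model, which shows that injected clicks inflate a CPC bill but leave a CPA bill untouched.

```python
from dataclasses import dataclass


@dataclass
class CampaignEvents:
    """Billable event counts for one campaign (hypothetical structure)."""
    impressions: int
    clicks: int
    actions: int    # e.g. registrations or add-to-cart events
    sales: int
    installs: int


def charge(model: str, price: float, ev: CampaignEvents) -> float:
    """Amount billed to the advertiser under one billing model.

    `price` is per 1,000 impressions for CPM, per event for CPC/CPA/CPS/CPI,
    and a flat fee for the booked period for CPT.
    """
    if model == "CPM":
        return price * ev.impressions / 1000
    per_event = {"CPC": ev.clicks, "CPA": ev.actions,
                 "CPS": ev.sales, "CPI": ev.installs}
    if model in per_event:
        return price * per_event[model]
    if model == "CPT":
        return price  # flat fee: faked traffic does not change the bill
    raise ValueError(f"unknown billing model: {model}")


real = CampaignEvents(impressions=100_000, clicks=2_000, actions=50, sales=10, installs=0)
faked = CampaignEvents(impressions=100_000, clicks=2_500, actions=50, sales=10, installs=0)
print(charge("CPC", 0.5, real), charge("CPC", 0.5, faked))  # 1000.0 1250.0
print(charge("CPA", 3.0, real), charge("CPA", 3.0, faked))  # 150.0 150.0
```

This is why fraud pressure concentrates on whichever event the advertiser pays for: under CPC, 500 fake clicks add 250 to the bill, while under CPA the fraudster would have to fake the downstream action too.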

Fake Traffic Profit Forms and Mechanisms

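Ad fraud generates revenue by faking whichever event the advertiser pays for along the ad-placement workflow: impressions under CPM, clicks under CPC, qualifying actions or sales under CPA and CPS, and installs under CPI. Every faked billable event is charged to the advertiser and credited to whoever supplied the traffic.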

Profit mechanisms include machine‑generated clicks (low cost, high volume) and human‑generated clicks (higher cost, harder to detect).

Core Issues of Click Anti‑Fraud

Invalid clicks are clicks, generated by scripts or by humans, that incur charges without genuine user intent. Wikipedia's definition of click fraud captures the idea:

Click fraud occurs in pay‑per‑click online advertising when a person, automated script or computer program imitates a legitimate user of a web browser clicking on an ad, for the purpose of generating an improper charge per click.

The advertising ecosystem involves advertisers, agencies, ad exchanges, media sites, and end users, each with distinct incentives that can motivate fraud.

Significance of Anti‑Fraud

Effective anti-fraud protects advertisers from wasted spend, preserves brand safety, and maintains platform credibility. If business growth is the engine, anti-fraud is the brake: it lets the platform grow quickly without losing control of risk.

Challenges

Business perspective: fraud must be filtered out without harming legitimate traffic.

Technical perspective: fraud tactics evolve rapidly, so models must adapt to new attack patterns while keeping false positives under control.
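One common way to keep false positives under control, offered here as an assumption rather than the article's stated method, is to calibrate the decision threshold on traffic known to be legitimate. The hypothetical helpers below pick the lowest threshold whose false-positive rate on legitimate scores stays within a budget, then measure how much known fraud that threshold still catches.

```python
import math


def threshold_for_fpr(legit_scores, max_fpr):
    """Return t such that flagging `score > t` marks at most `max_fpr`
    of the known-legitimate clicks as fraud."""
    if not legit_scores:
        raise ValueError("need at least one legitimate score")
    ranked = sorted(legit_scores, reverse=True)
    allowed = math.floor(max_fpr * len(ranked))  # false positives we tolerate
    if allowed >= len(ranked):
        return -math.inf  # any threshold is acceptable; flag everything
    # Only the `allowed` highest legitimate scores can lie strictly above this.
    return ranked[allowed]


def recall_at(fraud_scores, t):
    """Fraction of known-fraud clicks caught by the rule `score > t`."""
    return sum(s > t for s in fraud_scores) / len(fraud_scores)


legit = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
t = threshold_for_fpr(legit, 0.1)
print(t, recall_at([0.85, 0.95, 0.5, 0.99], t))  # 0.8 0.75
```

Tied scores can only lower the realized false-positive rate, never raise it. Re-running this calibration as traffic shifts is one simple way to keep the false-positive budget stable while the model itself is retrained against new attack patterns.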

Anti‑Fraud Technical System

Data Layer

The data layer collects multi-day user behavior, exposure (impression), and click logs. These signals feed sequence models such as TextCNN, BiLSTM and BERT, as well as the graph model GraphSAGE.
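Sequence models need each user's multi-day logs turned into an ordered list of tokens. The sketch below shows one assumed way to do that; the `LogEvent` fields, the gap buckets, and `max_len` are illustrative choices, not the article's schema. Encoding the time since the previous event into each token lets a model notice the unnaturally rapid or regular rhythm that scripted clicking often shows.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEvent(NamedTuple):
    user_id: str
    ts: int      # Unix timestamp in seconds
    event: str   # e.g. "expose" or "click"


def gap_bucket(seconds):
    """Coarse bucket for the time since the previous event."""
    for limit, name in ((1, "lt1s"), (10, "lt10s"), (60, "lt1m")):
        if seconds < limit:
            return name
    return "ge1m"


def build_sequences(events, max_len=50):
    """Group raw log events into time-ordered token sequences per user."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e.user_id].append(e)
    sequences = {}
    for user, evs in by_user.items():
        evs.sort(key=lambda e: e.ts)  # stable sort: ties keep log order
        tokens, prev_ts = [], None
        for e in evs:
            gap = "start" if prev_ts is None else gap_bucket(e.ts - prev_ts)
            tokens.append(f"{e.event}:{gap}")
            prev_ts = e.ts
        sequences[user] = tokens[-max_len:]  # keep the most recent events
    return sequences


events = [LogEvent("u1", 100, "expose"), LogEvent("u1", 100, "click"),
          LogEvent("u2", 5, "click"), LogEvent("u1", 130, "click")]
print(build_sequences(events))
```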

Algorithm & Application Layer

Combines expert rules, statistical strategies, machine‑learning, deep‑learning, and graph models. The system evolves from simple rule‑based detection to sophisticated supervised models.

Architecture Layer

Real‑time detection handles immediate clicks; offline hourly models use richer data to improve accuracy and recall.
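For the real-time path, a typical building block is a per-key sliding-window counter, for example "more than N clicks from one IP in the last minute". The `SlidingWindowCounter` class below is a hypothetical sketch that assumes timestamps arrive in non-decreasing order for each key; a production system would usually keep this state in a stream processor or an in-memory store rather than a Python dict.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per key (e.g. IP) within the last `window` seconds."""

    def __init__(self, window: float, limit: int):
        self.window = window
        self.limit = limit
        self._events = defaultdict(deque)  # key -> timestamps inside the window

    def record(self, key: str, ts: float) -> bool:
        """Record one click; return True if the key now exceeds `limit`."""
        q = self._events[key]
        q.append(ts)
        # Evict timestamps that have fallen out of the window.
        while q and q[0] <= ts - self.window:
            q.popleft()
        return len(q) > self.limit


c = SlidingWindowCounter(window=60, limit=3)
flags = [c.record("1.2.3.4", t) for t in (0, 10, 20, 30)]
print(flags)  # [False, False, False, True]
```

Checks this cheap can run inline on every click, while the heavier hourly models revisit the same traffic offline with richer features.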

Operation Platform

Complaint feedback channel for advertisers.

Active fraud discovery runs offline strategies to find fraud proactively, before advertisers notice problems.

Data accumulation stores identified invalid clicks for model training and downstream data cleaning.

Rule vs. Model Comparison

Rules are fast, interpretable, and effective against large‑scale attacks but struggle with evolving fraud. Models (GBDT, Wide&Deep) leverage rule‑generated features and supervised learning to achieve higher precision.
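The hand-off from rules to models can be sketched as follows. The three rules, their thresholds, and the record fields are hypothetical, and a small hand-rolled logistic regression stands in for GBDT or Wide&Deep so the example needs no external libraries. Rules keep their speed and interpretability as features, while the model learns from labeled data how much weight each rule hit deserves.

```python
import math

# Hypothetical expert rules; each maps a click record to True/False.
RULES = {
    "ip_burst": lambda c: c["clicks_same_ip_1m"] > 20,
    "no_dwell": lambda c: c["dwell_seconds"] < 1,
    "headless_ua": lambda c: "headless" in c["user_agent"].lower(),
}


def rule_features(click):
    """Turn rule hits into a 0/1 feature vector (order follows RULES)."""
    return [1.0 if rule(click) else 0.0 for rule in RULES.values()]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logreg(X, y, lr=0.5, epochs=500):
    """Plain SGD logistic regression on the rule features."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b
```

For example, a click record with 35 same-IP clicks in a minute, 0.2 s dwell time and a headless user agent yields `[1.0, 1.0, 1.0]`, and a model trained on labeled rule vectors turns that into a fraud probability instead of a hard block.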

Sample Engineering

Rule‑filtered low‑conversion samples.

SMOTE synthetic oversampling.

GAN‑generated fraud samples.
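SMOTE addresses the scarcity of labeled fraud by interpolating between real minority samples. Below is a minimal, dependency-free sketch of the standard algorithm (the `smote` function name and its defaults are illustrative); libraries such as imbalanced-learn provide a production implementation. Features should be numeric and comparably scaled, because neighbors are found by Euclidean distance.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE: create `n_new` synthetic minority samples, each a random
    point on the segment between a minority sample and one of its k nearest
    minority neighbors (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbors = sorted(others, key=lambda s: dist2(base, s))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([x + gap * (y - x) for x, y in zip(base, nb)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(smote(minority, 3, k=2))
```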

Feature Engineering

Features span dimensions (time, region, device, IP, etc.) and metrics such as visit depth, duration, behavior path, and click position. Types include count, ratio, distribution, distinct, concentration, and hierarchical distinct counts.
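The feature types above can be sketched for a single dimension (IP) and a handful of metrics. The record fields and feature names are hypothetical; in practice the same aggregations would be repeated over several time windows and the other dimensions (region, device, and so on). Traffic from a bot farm behind one IP may show, for instance, an extreme CTR, low entropy over click hours, or clicks concentrated on very few cookies.

```python
import math
from collections import Counter


def ip_features(records):
    """Aggregate one IP's log records into example features.

    Each record is a dict with hypothetical keys:
    'event' ('expose' or 'click'), 'cookie', and 'hour' (0-23).
    """
    clicks = [r for r in records if r["event"] == "click"]
    exposes = sum(r["event"] == "expose" for r in records)
    n = len(clicks)
    cookies = Counter(r["cookie"] for r in clicks)
    hours = Counter(r["hour"] for r in clicks)
    entropy = -sum(c / n * math.log2(c / n) for c in hours.values()) if n else 0.0
    return {
        "click_count": n,                                   # count
        "ctr": n / exposes if exposes else 0.0,             # ratio
        "hour_entropy": entropy,                            # distribution
        "distinct_cookies": len(cookies),                   # distinct
        "top_cookie_share": max(cookies.values()) / n if n else 0.0,  # concentration
    }


recs = [{"event": "expose", "cookie": "a", "hour": 3}] * 8
recs += [{"event": "click", "cookie": "a", "hour": 3}] * 3
recs += [{"event": "click", "cookie": "b", "hour": 4}]
print(ip_features(recs))
```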

Active Fraud Detection

Anomaly detection across time‑series, statistical, distance, tree, graph, and deep‑learning methods.

Manual market research on fraud tools.

Simulation of fraud attacks and honeypots.
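As one example of the statistical family of anomaly detectors, the sketch below applies the modified z-score of Iglewicz and Hoaglin (0.6745 times the distance from the median, divided by the median absolute deviation) with the conventional 3.5 cutoff to an hourly click series. The function name and data are illustrative. Because both the center and the spread are medians, a single burst of fake clicks does not inflate the baseline it is measured against.

```python
import statistics


def mad_anomalies(series, threshold=3.5):
    """Indices whose modified z-score 0.6745 * |x - median| / MAD exceeds
    `threshold` (Iglewicz-Hoaglin rule)."""
    med = statistics.median(series)
    mad = statistics.median([abs(x - med) for x in series])
    if mad == 0:
        # Degenerate case: most points equal the median; flag any deviation.
        return [i for i, x in enumerate(series) if x != med]
    return [i for i, x in enumerate(series)
            if 0.6745 * abs(x - med) / mad > threshold]


hourly_clicks = [100, 98, 103, 97, 101, 99, 102, 950, 100, 96]
print(mad_anomalies(hourly_clicks))  # [7]
```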

Core Algorithms

Machine‑Learning Models

TextCNN: a convolutional network over embedded click-sequence tokens, treating each behavior event like a word, for sequence classification.

BiLSTM+Attention: a bidirectional LSTM with an attention layer that captures contextual patterns from both directions of the sequence.

BERT: a pre-trained Transformer encoder fine-tuned for click-fraud classification.
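To illustrate the TextCNN idea on click sequences, here is a toy forward pass in plain Python: embed each token id, slide filters of several widths along the sequence, apply ReLU, max-pool over time, and feed the pooled values to a logistic output. The helper names are hypothetical, the parameters are random and untrained, and a real model would be built and trained in a deep-learning framework. BiLSTM+Attention and BERT consume the same token input but replace the convolution with recurrent or self-attention encoders.

```python
import math
import random


def init_params(vocab, dim, widths=(2, 3), per_width=2, seed=0):
    """Random (untrained) parameters for a tiny TextCNN."""
    rng = random.Random(seed)
    emb = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab)]
    filters = [(w, [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(w)], 0.0)
               for w in widths for _ in range(per_width)]
    out_w = [rng.gauss(0, 0.1) for _ in filters]
    return emb, filters, out_w, 0.0


def textcnn_forward(token_ids, emb, filters, out_w, out_b):
    """Return P(fraud) for one sequence of token ids."""
    x = [emb[t] for t in token_ids]            # (seq_len, dim)
    pooled = []
    for width, kernel, bias in filters:
        feats = []
        for start in range(len(x) - width + 1):  # 1-D convolution over time
            s = bias + sum(kernel[k][d] * x[start + k][d]
                           for k in range(width) for d in range(len(x[0])))
            feats.append(max(0.0, s))            # ReLU
        pooled.append(max(feats) if feats else 0.0)  # max-over-time pooling
    z = out_b + sum(w * p for w, p in zip(out_w, pooled))
    return 1.0 / (1.0 + math.exp(-z))


emb, filters, out_w, out_b = init_params(vocab=10, dim=4)
print(textcnn_forward([1, 2, 3, 4, 5], emb, filters, out_w, out_b))
```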

Statistical Models

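Relative entropy (KL divergence) compares the distribution of a feature, such as device model or click hour, between normal and suspicious clicks; a large divergence signals anomalous traffic.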
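Relative entropy is D_KL(P || Q) = sum over x of P(x) * log2(P(x) / Q(x)), where P is the suspicious traffic's distribution and Q the normal baseline. A minimal sketch follows; the function name and the add-alpha smoothing (needed so that a value unseen in the baseline does not make Q(x) zero) are assumptions, not details from the article.

```python
import math
from collections import Counter


def kl_divergence(p_values, q_values, alpha=1.0):
    """D_KL(P || Q) in bits between the empirical distributions of two
    categorical samples, with add-alpha smoothing so Q(x) is never zero."""
    support = set(p_values) | set(q_values)
    p_counts, q_counts = Counter(p_values), Counter(q_values)
    p_total = len(p_values) + alpha * len(support)
    q_total = len(q_values) + alpha * len(support)
    kl = 0.0
    for v in support:
        p = (p_counts[v] + alpha) / p_total
        q = (q_counts[v] + alpha) / q_total
        kl += p * math.log2(p / q)
    return kl


normal = ["android"] * 50 + ["ios"] * 50
suspicious = ["android"] * 95 + ["ios"] * 5
print(kl_divergence(suspicious, normal), kl_divergence(normal, normal))
```

Note that KL divergence is asymmetric, so the direction (suspicious against baseline) should be fixed consistently across features before comparing scores.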

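Wide&Deep Model

Wide&Deep combines a wide component (logistic regression over high-dimensional sparse and crossed features) with a deep component (a neural network over embeddings) to balance memorization and generalization.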
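The prediction step of Wide&Deep can be sketched as the sigmoid of the wide logit plus the deep logit. In the original design both parts are trained jointly; this sketch shows only the forward pass with hand-set weights, and the function names and layer layout are hypothetical.

```python
import math


def dense(v, W, b):
    """Fully connected layer: W is a list of output rows, b the biases."""
    return [bj + sum(wij * vi for wij, vi in zip(row, v)) for row, bj in zip(W, b)]


def wide_and_deep(wide_x, deep_x, wide_w, deep_layers, out_w, bias):
    """P(fraud) = sigmoid(wide logit + deep logit + bias).

    wide_x: sparse 0/1 features (including crossed features) for memorization.
    deep_x: dense inputs, e.g. embeddings, passed through ReLU layers for generalization.
    """
    wide_logit = sum(w * x for w, x in zip(wide_w, wide_x))
    h = deep_x
    for W, b in deep_layers:
        h = [max(0.0, a) for a in dense(h, W, b)]  # ReLU hidden layer
    deep_logit = sum(w * a for w, a in zip(out_w, h))
    return 1.0 / (1.0 + math.exp(-(wide_logit + deep_logit + bias)))


p = wide_and_deep(wide_x=[1, 0, 1], deep_x=[1.0, 2.0], wide_w=[0.5, 2.0, 0.5],
                  deep_layers=[([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.0])],
                  out_w=[1.0, 1.0], bias=-2.0)
print(p)  # 0.5
```

Here the wide part contributes a logit of 1.0, the deep part another 1.0, and the bias of -2.0 cancels them, giving exactly 0.5.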

Graph‑Based Detection

GraphSAGE aggregates features from neighboring nodes of different types (IP, cookie, UTID) to uncover coordinated fraud rings.
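A single GraphSAGE layer with the mean aggregator can be sketched as follows. The node names and features are hypothetical, and all node types share one feature space for simplicity; a truly heterogeneous setup would use type-specific weights or projections. `W_self` and `W_neigh` are the two halves of the weight matrix that GraphSAGE applies to the concatenation of a node's own vector and its averaged neighbor vector.

```python
import math


def sage_layer(features, neighbors, W_self, W_neigh):
    """One GraphSAGE layer, mean aggregator:
    h_v' = normalize(ReLU(W_self @ h_v + W_neigh @ mean_{u in N(v)} h_u)).
    Equivalent to applying one matrix to concat(h_v, mean of neighbors)."""
    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        dim = len(h_v)
        if nbrs:
            agg = [sum(features[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]
        else:
            agg = [0.0] * dim
        z = [sum(a * b for a, b in zip(rs, h_v)) + sum(a * b for a, b in zip(rn, agg))
             for rs, rn in zip(W_self, W_neigh)]
        h = [max(0.0, x) for x in z]                      # ReLU
        norm = math.sqrt(sum(x * x for x in h)) or 1.0    # L2 normalization
        out[v] = [x / norm for x in h]
    return out


features = {"ip:1": [1.0, 0.0], "cookie:a": [0.0, 1.0],
            "cookie:b": [0.0, 1.0], "utid:x": [2.0, 0.0]}
neighbors = {"ip:1": ["cookie:a", "cookie:b"],
             "cookie:a": ["ip:1"], "cookie:b": ["ip:1"]}
I = [[1.0, 0.0], [0.0, 1.0]]
print(sage_layer(features, neighbors, I, I))
```

Cookies attached to the same IP with similar features end up with identical or near-identical embeddings, which is what lets a downstream classifier or clustering step surface them as one coordinated group.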


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
