DSP Advertising System Architecture and Key Technologies
This article gives a comprehensive overview of DSP advertising system architecture: the real-time bidding infrastructure, audience targeting, data-processing pipelines built on Hadoop, Spark, and Storm, click-through-rate prediction models, and anti-fraud mechanisms. It offers practical insights for engineers building high-performance ad platforms.
Introduction
The speaker, a technical director experienced in mobile ad platforms and large‑scale data processing, shares practical experiences from building a Demand‑Side Platform (DSP) over several years, aiming to provide a reference for engineers interested in ad‑system development.
Requirements of a DSP
A robust DSP must support high‑throughput Real‑Time Bidding (RTB) capabilities and advanced audience‑targeting techniques, handling millions of page views and billions of users within strict latency constraints (typically under 100 ms).
System Architecture
The overall architecture includes advertisers, media sites, ad exchanges, and the DSP core modules such as the RTB engine, business platform, logging system, DMP, campaign management, and anti‑fraud components. The workflow spans from pre‑campaign setup, user behavior collection, request routing, bidding decision, price encryption, to impression and click billing.
RTB Engine
The RTB engine, implemented in C++ on Linux, exposes an HTTP interface and processes protobuf‑encoded requests. It uses a modular adapter layer to normalize traffic from various SSPs/ADXs, a pluggable algorithm pool with hot‑swap dynamic libraries (dlopen/dlsym/dlclose) and inotify‑driven configuration reloads, enabling A/B testing and real‑time algorithm updates.
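The dlopen/dlsym/dlclose hot-swap pattern mentioned above can be sketched as follows. This is a minimal, runnable illustration, not the engine's actual code: libm and its `cos()` symbol stand in for an algorithm shared library and its exported entry point, and `load_algorithm` is a hypothetical helper name.

```cpp
#include <dlfcn.h>

// An algorithm library exports a scoring function with this signature.
typedef double (*score_fn)(double);

// Load one exported function from a shared library; returns nullptr on
// failure. RTLD_NOW resolves all symbols at load time, so a broken
// library fails here rather than in the middle of a bid request.
score_fn load_algorithm(const char* lib_path, const char* symbol,
                        void** handle_out) {
    void* handle = dlopen(lib_path, RTLD_NOW);
    if (!handle) return nullptr;
    *handle_out = handle;
    return (score_fn)dlsym(handle, symbol);
}

// On an inotify-reported update, the engine would load the new library
// version first, swap the function pointer that request threads read,
// then dlclose() the old handle once in-flight requests have drained.
```

On older glibc versions the program must be linked with `-ldl`; the inotify watch itself (inotify_init/inotify_add_watch on the algorithm directory) is omitted here for brevity.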
DMP Data Processing
Data processing combines Hadoop offline jobs, Spark batch jobs, and Storm stream processing. Results are stored in HBase/MySQL for query, Redis for real‑time lookup, and Elasticsearch for near‑real‑time indexing. Machine‑learning libraries (MLlib) are used for classification, clustering, collaborative filtering, decision trees, and logistic regression.
User Profiling and Targeting
User profiles are built as {user_id: {tag: weight, …}} using three tag types: user‑behavior tags (t(u)), context tags (t(c)), and advertiser‑custom tags (t(a,u)). These tags support demographic, behavioral, contextual, and look‑alike targeting, with time‑decay mechanisms to keep profiles fresh.
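The {user_id: {tag: weight}} structure with time decay might look like the sketch below. The exponential decay form and the 7-day half-life are assumptions for illustration; the talk does not specify the decay function or its parameters.

```cpp
#include <cmath>
#include <string>
#include <unordered_map>

// One user's profile: tag -> weight, as in {user_id: {tag: weight, ...}}.
using TagWeights = std::unordered_map<std::string, double>;

// Exponential time decay: w(t) = w0 * exp(-lambda * days), where lambda
// is derived from an assumed half-life so that a weight halves every
// half_life_days if the tag sees no new activity.
double decayed(double w0, double days, double half_life_days = 7.0) {
    const double lambda = std::log(2.0) / half_life_days;
    return w0 * std::exp(-lambda * days);
}

// Merge a new behavior signal into the profile: decay the old weight by
// the elapsed time, then add the increment contributed by the new event.
void update_tag(TagWeights& profile, const std::string& tag,
                double days_since_update, double increment) {
    profile[tag] = decayed(profile[tag], days_since_update) + increment;
}
```

The same decay applies uniformly to behavior tags t(u), context tags t(c), and advertiser-custom tags t(a,u), which is what keeps stale interests from dominating targeting decisions.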
CTR Prediction
Click‑through‑rate (CTR) prediction is critical for ranking ads. Simple historical statistics are supplemented with logistic‑regression models and can be extended with richer feature engineering, collaborative filtering, and learning‑to‑rank approaches (point‑wise, pair‑wise, list‑wise).
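A minimal sketch of the logistic-regression scorer: pCTR = sigmoid(w·x + b), with one stochastic-gradient step on the log loss to hint at how the weights are fit. Feature encoding and the learning rate are illustrative assumptions, not details from the talk.

```cpp
#include <cmath>
#include <vector>

// Standard logistic function, mapping any real score into (0, 1).
double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Predicted CTR for one impression: sigmoid of the weighted feature sum.
double predict_ctr(const std::vector<double>& w,
                   const std::vector<double>& x, double b) {
    double z = b;
    for (std::size_t i = 0; i < w.size(); ++i) z += w[i] * x[i];
    return sigmoid(z);
}

// One stochastic-gradient-descent step on the log loss for a single
// labeled impression (clicked = 0 or 1). The gradient of the log loss
// with respect to z is simply (prediction - label).
void sgd_step(std::vector<double>& w, double& b,
              const std::vector<double>& x, int clicked, double lr) {
    double err = predict_ctr(w, x, b) - clicked;
    for (std::size_t i = 0; i < w.size(); ++i) w[i] -= lr * err * x[i];
    b -= lr * err;
}
```

In a point-wise ranking setup, ads are then sorted by predicted CTR (often multiplied by bid); pair-wise and list-wise approaches instead optimize the ordering directly.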
Anti‑Fraud Measures
Anti‑fraud strategies include detecting P2P traffic‑swapping tools, CPS traffic‑steering fraud, and domain‑level analysis using WHOIS information. Honey‑pot servers capture malicious behavior, while logs are aggregated via Logstash, indexed in Elasticsearch, and visualized in Kibana to identify suspicious IPs, domains, and patterns.
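One of the simplest signals in such a pipeline can be sketched as a per-IP click-frequency check: IPs whose click count in a window exceeds a threshold get flagged for review. This is an illustrative sketch only; the function name and threshold are hypothetical, and in practice the aggregation would run over Logstash/Elasticsearch data rather than in-memory.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Flag IPs with more than `threshold` clicks in the given window.
// Abnormally high frequency from one IP is a basic fraud indicator;
// real systems combine it with domain, device, and timing features.
std::vector<std::string> suspicious_ips(
        const std::vector<std::string>& click_ips, std::size_t threshold) {
    std::unordered_map<std::string, std::size_t> counts;
    for (const std::string& ip : click_ips) ++counts[ip];

    std::vector<std::string> flagged;
    for (const auto& entry : counts)
        if (entry.second > threshold) flagged.push_back(entry.first);
    return flagged;
}
```

Flagged IPs would then be cross-checked against the honey-pot captures and WHOIS-based domain analysis described above before being blacklisted.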
Q&A Highlights
Answers cover DMP storage (HDFS, Hive, HBase, Redis, MySQL), hot‑swap implementation of RTB algorithms (dlopen, inotify), online vs. offline cookie mapping, and user identification methods across PC (cookies, Flash, evercookie) and mobile (Android ID, IMEI, IDFA) platforms.
Published via the High Availability Architecture official account.