
Online Monitoring Practices for DSP Advertising: Shifting Testing Right

This article explains how a DSP advertising team shifted testing right by building a four-layer online monitoring system (interface, UI, revenue, and daily key-metric monitoring) to detect and resolve production incidents quickly, reduce false alarms, and improve overall reliability.

360 Quality & Efficiency

The article first introduces the concepts of "testing left" and "testing right". Traditional testing happens after development. Shifting left moves testing activities earlier (for example BDD, unit tests, and code review), while shifting right extends testing into production, using monitoring to capture problems and real-time user feedback.

A real incident motivated the work: an ad-service fault at 7:30 pm caused a large number of channels to be banned, but overall revenue did not fall past the 20% alarm threshold, so no alert fired. The problem went unnoticed until the next day, resulting in a 16-hour outage and significant losses.
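To see why an aggregate threshold can miss this kind of fault, consider a purely hypothetical revenue split. The channel names and figures below are illustrative assumptions, not data from the incident; only the 20% threshold comes from the article. If the affected channels contribute less than 20% of revenue, the overall drop never reaches the threshold, whereas a finer-grained check (shown here per channel, purely for illustration) would flag it.

```python
THRESHOLD = 0.20  # overall revenue alarm threshold cited in the article

# Hypothetical 5-minute revenue per channel, before and during the fault.
baseline = {"channel_a": 500.0, "channel_b": 350.0, "channel_c": 150.0}
current = {"channel_a": 500.0, "channel_b": 350.0, "channel_c": 0.0}


def drop(before: float, after: float) -> float:
    """Fractional revenue drop from `before` to `after` (0.0 if no baseline)."""
    return (before - after) / before if before > 0 else 0.0


# Aggregate view: total revenue across all channels.
overall_drop = drop(sum(baseline.values()), sum(current.values()))
overall_alarm = overall_drop >= THRESHOLD

# Finer-grained view: channels whose own revenue fell past the same threshold.
channel_alarms = sorted(
    name for name in baseline if drop(baseline[name], current[name]) >= THRESHOLD
)

print(f"overall drop: {overall_drop:.0%}, alarm: {overall_alarm}")
print(f"channels over threshold: {channel_alarms}")
```

With these made-up numbers the overall drop is 15%, so the aggregate alarm stays silent even though one channel has lost all of its revenue.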

Given the high‑stakes nature of DSP advertising—processing billions of requests daily—effective online monitoring is essential. The team built a four‑layer monitoring framework:

Interface-level monitoring: Using the existing Ialert system, each server’s APIs are monitored with business-logic assertions; three consecutive assertion failures trigger an SMS alert.

UI-level monitoring: PhantomJS renders the ad pages for specific ad slots, verifies the DOM structure, simulates clicks, and checks that each click redirects to the correct landing page.

Revenue monitoring: The BA system provides revenue data at 5-minute granularity; the monitor compares current revenue with daily and weekly baselines and sends an alert on an abnormal drop.

Daily key-metric monitoring: Critical metrics such as request count, bid success, clicks, CPM, and CPC are fetched from the business monitoring platform and emailed with week-over-week and day-over-day comparisons.
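The alerting rules behind the interface and revenue layers can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the names `ConsecutiveFailureAlert`, `drop_ratio`, and `revenue_drop_alert`, the 20% default threshold, and the policy of requiring a drop against both baselines are all assumptions. The source only states that three consecutive assertion failures trigger an SMS and that current revenue is compared with daily and weekly baselines.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureAlert:
    """Hypothetical helper: signals when `limit` assertion failures occur in a row."""
    limit: int = 3
    failures: int = 0

    def record(self, passed: bool) -> bool:
        """Record one probe result; return True when an alert should be sent."""
        if passed:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        # Signal exactly on the limit-th consecutive failure to avoid repeat SMS.
        return self.failures == self.limit


def drop_ratio(current: float, baseline: float) -> float:
    """Fractional drop of `current` relative to `baseline` (0.0 if no usable baseline)."""
    if baseline <= 0:
        return 0.0
    return max(0.0, (baseline - current) / baseline)


def revenue_drop_alert(current: float, day_ago: float, week_ago: float,
                       threshold: float = 0.20) -> bool:
    """Alert only if 5-minute revenue dropped against BOTH baselines.

    Requiring both is an assumed policy intended to reduce false alarms.
    """
    return (drop_ratio(current, day_ago) >= threshold
            and drop_ratio(current, week_ago) >= threshold)


if __name__ == "__main__":
    probe = ConsecutiveFailureAlert()
    # One success followed by four failures: only the third failure signals.
    print([probe.record(ok) for ok in (True, False, False, False, False)])
    # Revenue is down 30% against both the daily and weekly baselines.
    print(revenue_drop_alert(current=70.0, day_ago=100.0, week_ago=100.0))
```

Comparing against two baselines is one way to trade sensitivity for fewer false alarms: a drop that matches last week's pattern (for example a regular weekend dip) does not page anyone, while a drop against both baselines does.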

Two concrete examples illustrate the impact: (1) a misconfigured deployment triggered server-status alerts, and the issue was resolved within 15 minutes; (2) abnormal traffic on a Saturday exhausted resources, and the alerts allowed the team to remediate quickly.

The article concludes that monitoring will generate some false positives, and that the goal is to minimize them while keeping sensitivity as high as possible. After shifting testing right, detection time was shortened and online reliability improved significantly.

Written by

360 Quality & Efficiency

360 Quality & Efficiency focuses on seamlessly integrating quality and efficiency in R&D, sharing 360’s internal best practices with industry peers to foster collaboration among Chinese enterprises and drive greater efficiency value.
