Online Monitoring Practices for DSP Advertising: Shifting Testing Right

The article explains how a DSP advertising team moved testing right by building a four‑layer online monitoring system—including interface, UI, revenue, and daily key‑metric monitoring—to quickly detect and resolve production incidents, reduce false alarms, and improve overall reliability.

360 Quality & Efficiency

Jun 27, 2018

The article distinguishes "testing left" from "testing right." Traditional testing happens after development. Testing left moves testing activities earlier (for example BDD, unit tests, and code review), while testing right extends testing into production, using monitoring to capture real-time user feedback.

A real incident motivated the work: an ad-service fault at 7:30 pm caused widespread channel bans, but the overall revenue drop stayed below the 20% alarm threshold, so no alert fired. The problem went unnoticed until the next day, resulting in a 16-hour outage and significant losses.
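To see how a fault confined to some channels can stay under an aggregate threshold, consider the following minimal sketch. The channel names and revenue figures are hypothetical, chosen only for illustration, and checking each channel separately is one possible mitigation rather than a description of the team's actual system.

```python
def drop_ratio(baseline: float, current: float) -> float:
    """Fractional drop of `current` relative to `baseline` (0.0 if no baseline)."""
    if baseline <= 0:
        return 0.0
    return (baseline - current) / baseline


THRESHOLD = 0.20  # the 20% alarm threshold mentioned in the incident

# Hypothetical per-channel revenue for one time window (illustration only).
baseline = {"channel_a": 500.0, "channel_b": 400.0, "channel_c": 100.0}
current = {"channel_a": 500.0, "channel_b": 400.0, "channel_c": 0.0}

# Aggregate check: one channel going to zero barely moves the total.
overall_drop = drop_ratio(sum(baseline.values()), sum(current.values()))
overall_alarm = overall_drop >= THRESHOLD

# Per-channel check: the same threshold applied to each channel on its own.
channel_alarms = sorted(
    name
    for name in baseline
    if drop_ratio(baseline[name], current.get(name, 0.0)) >= THRESHOLD
)

print(f"overall drop: {overall_drop:.0%}, alarm: {overall_alarm}")
print("channels over threshold:", channel_alarms)
```

In this made-up data the total falls by a tenth, so the aggregate alarm stays silent even though one channel has lost all of its revenue.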

Given the high‑stakes nature of DSP advertising—processing billions of requests daily—effective online monitoring is essential. The team built a four‑layer monitoring framework:

Interface-level monitoring: Using the existing Ialert system, each server's APIs are monitored with business-logic assertions; three consecutive assertion failures trigger SMS alerts.

UI-level monitoring: PhantomJS is used to render ad pages, verify DOM structure, simulate clicks, and ensure correct landing-page redirects, focusing on specific ad slots.

Revenue monitoring: The BA system provides 5-minute revenue data; the system compares current revenue with weekly and daily baselines and sends alerts on abnormal drops.

Daily key-metric monitoring: Critical metrics such as request count, bid success, clicks, CPM, and CPC are fetched from the business monitoring platform and emailed with week-over-week and day-over-day comparisons.
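The baseline comparison behind the revenue and daily key-metric layers can be sketched roughly as follows. This is an illustration under stated assumptions, not the team's implementation: the function names, the 20% default, and the rule that both baselines must agree before a drop counts as abnormal are all hypothetical choices made here.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    metric: str
    current: float
    day_over_day: float    # fractional change vs. the same window yesterday
    week_over_week: float  # fractional change vs. the same window a week ago


def pct_change(current: float, baseline: float) -> float:
    """Fractional change of `current` relative to `baseline` (0.0 if no baseline)."""
    if baseline == 0:
        return 0.0
    return (current - baseline) / baseline


def compare(metric: str, current: float, yesterday: float, last_week: float) -> Comparison:
    """Compare one window's value with its daily and weekly baselines."""
    return Comparison(
        metric=metric,
        current=current,
        day_over_day=pct_change(current, yesterday),
        week_over_week=pct_change(current, last_week),
    )


def is_abnormal_drop(c: Comparison, max_drop: float = 0.20) -> bool:
    """Assumed rule: flag only when BOTH baselines show a drop of at least max_drop."""
    return c.day_over_day <= -max_drop and c.week_over_week <= -max_drop


def report_line(c: Comparison) -> str:
    """One line of a daily email report with both comparisons."""
    return (
        f"{c.metric}: {c.current:.0f} "
        f"(DoD {c.day_over_day:+.1%}, WoW {c.week_over_week:+.1%})"
    )


# Hypothetical values for one 5-minute revenue window.
sample = compare("revenue", current=70.0, yesterday=100.0, last_week=100.0)
print(report_line(sample), "abnormal:", is_abnormal_drop(sample))
```

Requiring agreement between the daily and weekly baselines trades some sensitivity for fewer false alarms on days that are unusual for benign reasons; a stricter or looser rule may suit other traffic patterns.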

Two concrete examples illustrate the impact: (1) a misconfigured deployment triggered server-status alerts, and the issue was resolved within 15 minutes; (2) abnormal traffic on a Saturday exhausted resources, and the resulting alerts prompted rapid remediation.
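The interface-level rule described earlier, where three consecutive assertion failures trigger an alert, can be sketched as a small counter. This is a hypothetical illustration and does not use the Ialert API: the class name, the response shape in `check_ad_response`, and the `/bid` endpoint are all assumptions, and `notify` stands in for whatever sends the SMS.

```python
from typing import Callable, Dict, List


def check_ad_response(resp: dict) -> bool:
    """Hypothetical business-logic assertion: success status and at least one ad."""
    return resp.get("status") == 200 and len(resp.get("ads", [])) > 0


class ConsecutiveFailureAlert:
    """Alert once when an API fails `threshold` checks in a row."""

    def __init__(self, threshold: int = 3, notify: Callable[[str], None] = print):
        self.threshold = threshold
        self.notify = notify  # stand-in for an SMS sender
        self._failures: Dict[str, int] = {}

    def record(self, api: str, passed: bool) -> bool:
        """Record one check result; return True if an alert was sent."""
        if passed:
            self._failures[api] = 0  # any success resets the streak
            return False
        self._failures[api] = self._failures.get(api, 0) + 1
        if self._failures[api] == self.threshold:
            self.notify(f"{api}: {self.threshold} consecutive assertion failures")
            return True
        return False


# Usage with made-up responses: two failures, a success, then three failures.
ok = {"status": 200, "ads": [{"id": 1}]}
bad = {"status": 200, "ads": []}
sent: List[str] = []
alert = ConsecutiveFailureAlert(threshold=3, notify=sent.append)
results = [alert.record("/bid", check_ad_response(r)) for r in [bad, bad, ok, bad, bad, bad]]
print(results, sent)
```

Waiting for a run of failures rather than alerting on the first one filters out isolated blips, at the cost of a slightly later alert.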

The article concludes that, while monitoring can generate false positives, the goal is to minimize them while maximizing sensitivity, thereby shortening detection time and significantly improving online reliability after shifting testing right.

