Unified Solution to Constrained Bidding in Online Display Advertising (USCB)
The paper proposes a unified solution for real-time bidding in online display advertising. It formulates advertiser budget and KPI limits as a constrained linear program, derives a closed-form optimal bidding function with m+1 parameters, and uses model-free reinforcement learning to adjust those parameters dynamically. Deployed at scale on Alibaba's Taobao platform, it captures more traffic value than the baseline methods it is compared against.
This article addresses the classic problem of real-time bidding (RTB) in online display advertising, where advertisers aim to maximize the value of acquired traffic under budget and KPI constraints. The authors formalize various advertiser demands as a constrained bidding problem and derive a universal optimal bidding formula. The formula contains m+1 core parameters (one for the budget constraint and one for each of the m KPI constraints), which can be adjusted in real time.
Because the advertising environment fluctuates, the optimal parameters cannot be fixed offline. The paper proposes a reinforcement‑learning (RL) based parameter‑adjustment module that treats the adjustment process as a Markov Decision Process (MDP). Using a model‑free RL algorithm (DDPG) with a specially designed critic loss, the system dynamically tunes the parameters to approximate the optimal solution.
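To make the MDP framing concrete, here is a minimal sketch of one bidding episode as an environment with a state, an action, and a reward. Everything in it is an assumption made for illustration: the class name BiddingEpisode, the two state features, the multiplicative update w0 <- w0 * (1 + action), the single-parameter bid w0 * value, and the rule that a bid wins when it is at least the cost. The hand-written placeholder_policy only marks where a trained DDPG actor would sit; it is not DDPG and not the paper's policy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BiddingEpisode:
    """Toy one-parameter environment (hypothetical): bid_i = w0 * value_i."""
    steps: List[List[Tuple[float, float]]]  # per time step: (value, cost) pairs
    budget: float
    w0: float
    t: int = 0
    spent: float = 0.0

    def state(self) -> Tuple[float, float]:
        # Assumed state features: fraction of time left, fraction of budget left.
        time_left = 1.0 - self.t / len(self.steps)
        budget_left = 1.0 - self.spent / self.budget
        return (time_left, budget_left)

    def step(self, action: float):
        # Assumed action semantics: relative adjustment of the bid parameter.
        self.w0 *= 1.0 + action
        reward = 0.0
        for value, cost in self.steps[self.t]:
            # Assumed win rule: the bid wins if it covers the cost and budget remains.
            if self.w0 * value >= cost and self.spent + cost <= self.budget:
                self.spent += cost
                reward += value
        self.t += 1
        done = self.t == len(self.steps)
        return self.state(), reward, done


def placeholder_policy(state: Tuple[float, float]) -> float:
    """Stand-in for a learned actor: bid up when budget is ahead of time, else down."""
    time_left, budget_left = state
    return 0.1 if budget_left >= time_left else -0.1


env = BiddingEpisode(
    steps=[[(4.0, 2.0), (1.0, 2.0)], [(3.0, 1.0), (2.0, 2.0)]],
    budget=3.0,
    w0=0.5,
)
total = 0.0
done = False
state = env.state()
while not done:
    action = placeholder_policy(state)
    state, reward, done = env.step(action)
    total += reward
print(total, env.spent)
```

In an RL setup, the loop above would collect (state, action, reward, next state) transitions for training, and the actor network would replace placeholder_policy.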
The constrained bidding problem is expressed as a linear program (LP1) that maximizes total traffic value subject to a budget constraint and multiple KPI constraints (both cost‑related and non‑cost‑related). When the full traffic set is known, the LP can be solved directly; otherwise, the derived optimal bidding formula reduces the problem to finding the optimal parameters.
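The link between the LP and a parametrized bid is easiest to see in the simplest special case: a budget constraint only, with no KPI constraints. The sketch below is an illustration under stated assumptions, not the paper's LP1 or its general formula. The function names, the toy data, and the rule that a bid wins when it is at least the impression's cost are all hypothetical. With the full traffic set known, this LP (maximize the sum of v_i * x_i subject to the sum of c_i * x_i <= B, 0 <= x_i <= 1) is a fractional knapsack, so a greedy pass by value-to-cost ratio solves it. The same allocation can then be reproduced by a single-parameter bid w0 * v_i.

```python
from typing import List, Tuple


def solve_budget_lp(values: List[float], costs: List[float],
                    budget: float) -> Tuple[List[float], float]:
    """Greedy optimum of: max sum(v_i x_i) s.t. sum(c_i x_i) <= budget, 0 <= x_i <= 1.

    Assumes all costs are strictly positive.
    """
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / costs[i], reverse=True)
    x = [0.0] * len(values)
    remaining = budget
    for i in order:
        if remaining <= 0:
            break
        take = min(1.0, remaining / costs[i])  # last item may be fractional
        x[i] = take
        remaining -= take * costs[i]
    total = sum(v * xi for v, xi in zip(values, x))
    return x, total


def wins_with_bid(values: List[float], costs: List[float], w0: float) -> List[bool]:
    """Bid w0 * v_i on every impression; assumed win rule: bid >= cost."""
    return [w0 * v >= c for v, c in zip(values, costs)]


values = [4.0, 3.0, 2.0, 1.0]
costs = [2.0, 1.0, 2.0, 2.0]
budget = 3.0

x, total = solve_budget_lp(values, costs, budget)
print(x, total)
print(wins_with_bid(values, costs, w0=0.5))
```

On this toy data the greedy solution buys the first two impressions, and bidding with w0 = 0.5 wins exactly the same two. The search over allocations has collapsed to a search over one number, which is the role the m+1 parameters play in the general case.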
Experimental results on Alibaba’s Taobao advertising platform compare three baseline control methods—fixed historical parameters (FB), model‑based PID (M‑PID), and a model‑free RL method for budget constraints (DRLB)—against the proposed USCB approach across three product types (budget‑constrained click, click‑CPC, conversion‑CPC). USCB consistently achieves the highest R/R* ratio, indicating superior traffic‑value capture.
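As a rough illustration of how an R/R* style score can be read, the sketch below treats R* as the hindsight optimum of a budget-only toy LP and R as the value won when a fixed parameter is replayed over the same impressions in arrival order. These definitions, the function names, and the data are assumptions for illustration. The paper's exact metric (for example, how constraint violations are handled) may differ.

```python
from typing import List, Tuple

Impression = Tuple[float, float]  # (value, cost), cost assumed > 0


def hindsight_optimum(impressions: List[Impression], budget: float) -> float:
    """R* for the toy problem: fractional-knapsack optimum with full knowledge."""
    remaining = budget
    total = 0.0
    for value, cost in sorted(impressions, key=lambda vc: vc[0] / vc[1], reverse=True):
        if remaining <= 0:
            break
        take = min(1.0, remaining / cost)
        total += take * value
        remaining -= take * cost
    return total


def replay_fixed_parameter(impressions: List[Impression], budget: float,
                           w0: float) -> float:
    """R for a fixed parameter: bid w0 * value in arrival order until budget runs out."""
    spent = 0.0
    won = 0.0
    for value, cost in impressions:
        # Assumed win rule: bid >= cost, and the purchase must fit the budget.
        if w0 * value >= cost and spent + cost <= budget:
            spent += cost
            won += value
    return won


today = [(1.0, 1.0), (4.0, 2.0), (3.0, 1.0), (2.0, 2.0)]
budget = 3.0

r_star = hindsight_optimum(today, budget)
ratio_stale = replay_fixed_parameter(today, budget, w0=1.0) / r_star
ratio_tuned = replay_fixed_parameter(today, budget, w0=0.5) / r_star
print(ratio_stale, ratio_tuned)
```

Here the stale parameter w0 = 1.0 bids too aggressively, spends the budget on early low-ratio traffic, and reaches 5/7 of the optimum, while w0 = 0.5 reaches the optimum. A fixed-parameter baseline is exposed to the first outcome whenever traffic shifts away from the data used to choose the parameter.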
The solution has been deployed in the live Taobao display‑ad system, serving millions of advertisers and generating substantial daily revenue. The deployment architecture separates model training and online inference, enabling scalable and continuous updates.
In conclusion, the paper presents a unified, theoretically grounded solution for constrained RTB, combines a closed‑form optimal bidding function with an RL‑based parameter tuner, and validates its effectiveness through extensive offline and online experiments.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.