Curriculum-Guided Bayesian Reinforcement Learning for ROI-Constrained Real-Time Bidding

The paper presents a Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework that models ROI‑constrained real‑time bidding as a partially observable constrained MDP. It uses hard‑margin indicator rewards and a curriculum of relaxed proxy problems to learn fast‑converging, constraint‑satisfying, Bayes‑optimal policies that outperform existing methods on large‑scale industrial data.

Alimama Tech

Sep 7, 2022


Real‑Time Bidding (RTB) is a core mechanism of online advertising, in which advertisers often need to satisfy a Return‑on‑Investment (ROI) constraint while maximizing revenue. Existing bidding strategies assume a stationary market and cannot balance the non‑monotonic ROI constraint, whose running value can rise or fall as an episode unfolds, against the optimization objective in dynamic, partially observable environments.

This work models the ROI‑constrained bidding problem as a Partially Observable Constrained Markov Decision Process (POCMDP) and introduces a hard‑margin treatment of the non‑monotonic constraint using indicator functions. A Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework is proposed, which combines curriculum learning to provide dense reward signals and Bayesian inference to adaptively estimate the hidden market state.
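To make the hard‑margin idea concrete, here is a minimal Python sketch under stated assumptions: an episode is a list of steps, each with a cost and a revenue; ROI is total revenue divided by total cost; and the terminal reward is total revenue gated by the indicator 1[ROI ≥ target]. The names `Step`, `episode_roi`, and `hard_margin_reward` are hypothetical, and the exact reward form used in the paper may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One decision step of a bidding episode (hypothetical structure)."""
    cost: float     # ad spend incurred in this step
    revenue: float  # advertiser revenue attributed to this step


def episode_roi(steps: List[Step]) -> float:
    """Episode-level ROI = total revenue / total cost.

    An episode with no spend is assigned ROI 0.0, so it counts as
    violating any positive ROI target (an assumption of this sketch).
    """
    total_cost = sum(s.cost for s in steps)
    total_revenue = sum(s.revenue for s in steps)
    return total_revenue / total_cost if total_cost > 0 else 0.0


def hard_margin_reward(steps: List[Step], roi_target: float) -> float:
    """Sparse terminal reward: revenue counts only if the ROI constraint holds.

    The indicator 1[ROI >= roi_target] gates the whole objective, so an
    episode that misses the target earns zero reward regardless of revenue.
    """
    total_revenue = sum(s.revenue for s in steps)
    satisfied = episode_roi(steps) >= roi_target
    return total_revenue * float(satisfied)
```

For example, two steps costing 10 each with revenues 25 and 15 give an ROI of 2.0, so the reward is 40 under a target of 1.8 and 0 under a target of 2.5. Because a missed target wipes out the entire reward, many episodes earn nothing, which is the reward sparsity the curriculum is meant to counter.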

In the curriculum stage, a sequence of proxy problems with relaxed constraints is constructed and solved in order, from the most relaxed toward the original constraint; the relaxed problems yield dense rewards that guide exploration. The Bayesian component treats the unobserved market dynamics as latent variables, approximates their posterior with variational Bayes, and integrates posterior sampling into the policy to achieve a Bayes‑optimal exploration‑exploitation trade‑off.
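Both mechanisms in the paragraph above can be sketched in Python under simplifying assumptions. `curriculum_targets` builds proxy ROI thresholds that move linearly from a relaxed value back to the original target (the linear schedule is an assumption), and `reward_density` shows that looser thresholds let more episodes earn a non‑zero indicator reward. For the Bayesian part, the latent market state is reduced to a single scalar `theta` with Gaussian observations, and an exact conjugate Gaussian update stands in for the paper's variational approximation purely to keep the example short; `posterior_sample` then draws `theta` for the policy to condition on, in the style of Thompson sampling. All function names and the Gaussian model are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def curriculum_targets(roi_target: float, num_stages: int, max_relax: float) -> List[float]:
    """Hypothetical proxy ROI thresholds, ordered from most relaxed to original.

    Stage k uses roi_target * (1 - relax_k), where relax_k shrinks linearly
    from max_relax to 0, so the final stage is the original constraint.
    """
    if num_stages < 2:
        return [roi_target]
    return [
        roi_target * (1.0 - max_relax * (1.0 - k / (num_stages - 1)))
        for k in range(num_stages)
    ]


def reward_density(episode_rois: Sequence[float], target: float) -> float:
    """Fraction of episodes whose indicator reward is non-zero under `target`."""
    return sum(r >= target for r in episode_rois) / len(episode_rois)


def gaussian_posterior(
    observations: Sequence[float], prior_mean: float, prior_var: float, noise_var: float
) -> Tuple[float, float]:
    """Exact conjugate update for a scalar latent market state theta.

    Stand-in for variational Bayes, assuming theta ~ N(prior_mean, prior_var)
    and each observation y_t ~ N(theta, noise_var).
    """
    n = len(observations)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(observations) / noise_var)
    return post_mean, post_var


def posterior_sample(mean: float, var: float, rng: random.Random) -> float:
    """Thompson-style draw of the latent state that the policy conditions on."""
    return rng.gauss(mean, math.sqrt(var))
```

With a target ROI of 2.0, three stages, and 50% maximum relaxation, the thresholds are 1.0, 1.5, and 2.0. On episodes with ROIs of 0.9, 1.2, 1.6, and 2.1, the fraction of rewarded episodes falls from 0.75 to 0.5 to 0.25 as the curriculum tightens, which illustrates why the relaxed stages supply denser learning signals.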

Extensive experiments on large‑scale industrial datasets demonstrate that CBRL outperforms prior dual, approximation, and hyper‑parameter‑based methods in constraint satisfaction, learning efficiency (more than a five‑fold speedup), and generalization to out‑of‑distribution scenarios.

The proposed hard‑margin indicator reward and curriculum‑guided Bayesian learning provide a scalable solution for ROI‑constrained bidding in non‑stationary ad markets, while highlighting remaining challenges such as automated curriculum design and handling data feedback loops.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
