
Curriculum-Guided Bayesian Reinforcement Learning for ROI-Constrained Real-Time Bidding

The paper presents a Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework that models ROI‑constrained real‑time bidding as a partially observable constrained MDP, using hard‑margin indicator rewards and a curriculum of relaxed proxy problems to achieve fast, constraint‑satisfying, Bayes‑optimal policies that outperform existing methods on large‑scale industrial data.

Alimama Tech

Real‑Time Bidding (RTB) is a core mechanism for online advertising, where advertisers often need to satisfy a Return‑on‑Investment (ROI) constraint while maximizing revenue. Existing bidding strategies assume a stationary market and cannot balance the non‑monotonic ROI constraint with the optimization objective in dynamic, partially observable environments.

This work models the ROI‑constrained bidding problem as a Partially Observable Constrained Markov Decision Process (POCMDP) and introduces a hard‑margin treatment of the non‑monotonic constraint using indicator functions. A Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework is proposed, which combines curriculum learning to provide dense reward signals and Bayesian inference to adaptively estimate the hidden market state.
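The hard-margin idea can be illustrated with a small sketch. It assumes ROI is the ratio of cumulative revenue to cumulative cost, and that the episode objective counts only when the final ROI meets the target. The function names, the zero-spend convention, and the toy numbers are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def cumulative_roi(revenues: Sequence[float], costs: Sequence[float]) -> List[float]:
    """Running ROI (total revenue / total cost) after each auction step."""
    roi: List[float] = []
    rev_sum, cost_sum = 0.0, 0.0
    for rev, cost in zip(revenues, costs):
        rev_sum += rev
        cost_sum += cost
        # ROI is undefined before any spend; this sketch reports 0.0 in that case.
        roi.append(rev_sum / cost_sum if cost_sum > 0 else 0.0)
    return roi


def hard_margin_return(
    revenues: Sequence[float], costs: Sequence[float], roi_target: float
) -> float:
    """Indicator-style return: total revenue if the final ROI meets the target, else 0."""
    total_rev, total_cost = sum(revenues), sum(costs)
    # Assumption of this sketch: an episode with zero spend counts as feasible.
    feasible = total_cost == 0 or total_rev / total_cost >= roi_target
    return total_rev if feasible else 0.0


if __name__ == "__main__":
    revenues = [4.0, 1.0, 6.0]  # hypothetical per-step revenue
    costs = [2.0, 2.0, 2.0]     # hypothetical per-step spend
    # The running ROI falls and then rises again, so it is non-monotonic.
    print(cumulative_roi(revenues, costs))
    print(hard_margin_return(revenues, costs, roi_target=1.5))
    print(hard_margin_return(revenues, costs, roi_target=2.0))
```

Because the indicator is evaluated on the whole episode, the agent receives no signal about feasibility until the end, which is the sparsity that the curriculum stage is meant to address.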

In the curriculum stage, a sequence of proxy problems with progressively relaxed constraints is constructed, yielding dense rewards that guide exploration. The Bayesian component treats the unobserved market dynamics as latent variables, approximates their posterior with variational Bayes, and integrates posterior sampling into the policy to pursue a Bayes‑optimal exploration‑exploitation trade‑off.
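One way to picture the curriculum is as a schedule of relaxed ROI thresholds, each defining a proxy problem whose constraint is checked at every step rather than only at the end of the episode. The sketch below is a simplified illustration under assumptions that are not from the paper: the linear schedule, the ordering from loosest to strictest, the `start_fraction` parameter, and the per-step reward rule are all hypothetical.

```python
from typing import List, Sequence


def relaxed_thresholds(
    roi_target: float, num_stages: int, start_fraction: float = 0.5
) -> List[float]:
    """ROI thresholds rising linearly from start_fraction * roi_target to roi_target."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if num_stages == 1:
        return [roi_target]
    start = start_fraction * roi_target
    step = (roi_target - start) / (num_stages - 1)
    thresholds = [start + i * step for i in range(num_stages - 1)]
    thresholds.append(roi_target)  # final stage is the original constraint, exactly
    return thresholds


def dense_proxy_rewards(
    revenues: Sequence[float], costs: Sequence[float], threshold: float
) -> List[float]:
    """Per-step reward: the step's revenue if the running ROI meets the threshold, else 0."""
    rewards: List[float] = []
    rev_sum, cost_sum = 0.0, 0.0
    for rev, cost in zip(revenues, costs):
        rev_sum += rev
        cost_sum += cost
        # Assumption of this sketch: zero cumulative spend counts as feasible.
        feasible = cost_sum == 0 or rev_sum / cost_sum >= threshold
        rewards.append(rev if feasible else 0.0)
    return rewards


if __name__ == "__main__":
    revenues = [4.0, 1.0, 6.0]  # hypothetical per-step revenue
    costs = [2.0, 2.0, 2.0]     # hypothetical per-step spend
    for threshold in relaxed_thresholds(roi_target=2.0, num_stages=3):
        print(threshold, dense_proxy_rewards(revenues, costs, threshold))
```

Looser stages reward most steps, so the signal is dense early on; as the threshold approaches the target, the reward pattern converges toward the original hard constraint.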

Extensive experiments on large‑scale industrial datasets demonstrate that CBRL outperforms prior dual, approximation, and hyper‑parameter‑based methods in terms of constraint satisfaction, learning efficiency (over five‑fold speedup), and generalization to out‑of‑distribution scenarios.

The proposed hard‑margin indicator reward and curriculum‑guided Bayesian learning provide a scalable solution for ROI‑constrained bidding in non‑stationary ad markets, while highlighting remaining challenges such as automated curriculum design and handling data feedback loops.

Reinforcement Learning, MDP, Bayesian RL, curriculum learning, real-time bidding, ROI constraint
Written by

Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.
