Tag

offline inconsistency


Alimama Tech
Dec 28, 2022 · Artificial Intelligence

Sustainable Online Reinforcement Learning for Auto-bidding (SORL)

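The Sustainable Online Reinforcement Learning (SORL) framework tackles offline inconsistency in auto‑bidding, the mismatch between the offline environment a bidding policy is trained in and the real advertising system it is deployed to. SORL iterates between two steps: it collects data directly from the real ad system using a Lipschitz‑based safe exploration method, then trains the policy on that data with variance‑suppressed conservative Q‑learning. On Alibaba’s platform, this yields bidding that is safer, more stable, and higher‑performing.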

Variance Reduction · auto-bidding · offline inconsistency
18 min read