Alimama Tech
Dec 28, 2022 · Artificial Intelligence
Sustainable Online Reinforcement Learning for Auto-bidding (SORL)
The Sustainable Online Reinforcement Learning (SORL) framework addresses offline inconsistency in auto-bidding by iterating between two steps: collecting safe online data from the real ad system with a Lipschitz-based exploration method, and training a policy on that data with variance-suppressed conservative Q-learning. On Alibaba's platform, the authors report safer, more stable, and higher-performing bidding.
Variance Reduction · auto-bidding · offline inconsistency