Automatic Hyperparameter Tuning in Tencent Recommendation System (TRS): Techniques, Evolution, and Practice
This article presents an in‑depth overview of Tencent's TRS automatic hyperparameter tuning, covering background, challenges, the evolution from Bayesian optimization to evolution strategies and reinforcement learning, a systematic platform solution, real‑world deployment results, and a Q&A session.
The talk introduces Tencent Recommendation System (TRS) and its automatic hyperparameter tuning component, which optimizes online parameters such as multi‑objective fusion weights, quota allocation, and diversity controls across recommendation, search, and advertising pipelines.
It first explains the concept of online hyperparameters, distinguishing them from offline model parameters, and provides three concrete examples: multi‑objective ranking fusion, quota distribution in recall‑to‑ranking funnels, and dynamic diversity constraints.
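As a concrete illustration of the first example, multi-objective fusion can be sketched as a log-linear combination of per-objective predictions, where the exponents are the online hyperparameters being tuned. The prediction names (pctr, plike, pshare) and the multiplicative form here are illustrative assumptions, not TRS's production formula:

```python
import math

def fused_score(preds: dict, weights: dict) -> float:
    """Combine per-objective predictions with online-tunable exponents.

    Multiplicative (log-linear) fusion: score = prod(p_k ** w_k).
    Computed in log space for numerical stability.
    """
    return math.exp(sum(w * math.log(max(preds[k], 1e-9))
                        for k, w in weights.items()))

# The weights are the "online hyperparameters" the tuner adjusts.
weights = {"pctr": 1.0, "plike": 0.5, "pshare": 0.3}
preds = {"pctr": 0.12, "plike": 0.04, "pshare": 0.01}
score = fused_score(preds, weights)
```

Raising a weight amplifies that objective's influence on the final ranking, which is why small changes to these few scalars can shift the whole business-metric trade-off.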
The technical challenges of online tuning are highlighted: high‑dimensional parameter space, costly feedback signal computation, complex data distribution, and real‑time noisy environments.
The evolution of optimization methods is then described:
Bayesian Optimization (BO): Uses a surrogate model to approximate the objective, iteratively proposes candidate parameter groups to explore, and updates the posterior with the collected rewards; early-stop strategies keep the evaluation cost manageable under limited experiment traffic.
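The BO loop can be sketched in a few lines. This toy version replaces the usual Gaussian-process surrogate with a crude nearest-neighbor estimate (mean from the closest observation, uncertainty from its distance) and a UCB-style acquisition, so it illustrates the propose-evaluate-update cycle rather than a production optimizer; the objective stands in for a noisy online metric:

```python
import random
random.seed(0)

def objective(x):
    # Stand-in for the (noisy) online reward signal.
    return -(x - 0.3) ** 2 + random.gauss(0, 1e-3)

def surrogate(x, history):
    # Crude posterior: mean = reward at nearest observed point,
    # uncertainty = distance to that point.
    d, r = min((abs(x - xo), ro) for xo, ro in history)
    return r, d

def propose(history, candidates, kappa=0.5):
    # UCB-style acquisition: exploit the surrogate mean, reward uncertainty.
    def acq(x):
        mu, sigma = surrogate(x, history)
        return mu + kappa * sigma
    return max(candidates, key=acq)

candidates = [i / 100 for i in range(101)]
history = [(0.0, objective(0.0)), (1.0, objective(1.0))]  # initial probes
for _ in range(20):
    x = propose(history, candidates)
    history.append((x, objective(x)))          # evaluate, update posterior

best_x, best_r = max(history, key=lambda t: t[1])
```

Each round spends one "traffic slot" on the candidate with the best acquisition value, which is the same exploration/exploitation budget trade-off the early-stop strategies manage at scale.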
Evolution Strategies (ES): Generates a population of parameter samples, evaluates fitness in the live system, selects top performers, and updates the sampling distribution; variants such as OpenAI‑ES, NES, PEPG, and CMA‑ES are mentioned.
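A minimal OpenAI-ES-style update in the spirit of the variants listed above: sample a population of Gaussian perturbations, evaluate each perturbed parameter vector, and move the mean along the reward-weighted perturbation direction. The quadratic stand-in reward and all constants are illustrative:

```python
import random
random.seed(1)

GOAL = [0.5, -0.2]                      # unknown optimum of the online metric

def reward(theta):
    # Stand-in for the fitness measured in the live system.
    return -sum((t - g) ** 2 for t, g in zip(theta, GOAL))

theta = [0.0, 0.0]                      # current hyperparameter vector
sigma, lr, pop = 0.1, 0.05, 40          # noise scale, step size, population
for _ in range(200):
    eps = [[random.gauss(0, 1) for _ in theta] for _ in range(pop)]
    rs = [reward([t + sigma * e for t, e in zip(theta, ei)]) for ei in eps]
    base = sum(rs) / pop                # baseline reduces gradient variance
    for j in range(len(theta)):
        grad = sum((rs[i] - base) * eps[i][j]
                   for i in range(pop)) / (pop * sigma)
        theta[j] += lr * grad           # ascend the estimated reward gradient
```

Only reward values are needed, no gradients from the system itself, which is what makes ES cheap to bolt onto a black-box online environment; PEPG and CMA-ES additionally adapt the sampling distribution's variance.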
Reinforcement Learning (RL): Treats tuning as an agent‑environment interaction, where the agent outputs actions (hyperparameters) based on state features, and rewards are defined by business metrics; both online RL (e.g., TRPO, PPO, PPG) and offline RL (e.g., BCQ, CQL, IQL) are discussed, with a focus on BCQ's VAE‑based action generation to mitigate out‑of‑distribution errors.
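BCQ's core idea, restricting the Q-argmax to actions near the logged data, can be shown in miniature. Here the VAE generator is replaced by jittered resampling of logged actions, and the critic is a toy function whose unconstrained argmax would be far out of distribution; everything is a stand-in to illustrate the mechanism:

```python
import random
random.seed(2)

logged_actions = [0.2, 0.25, 0.3, 0.35]  # actions the behavior policy took

def q_value(a):
    # Toy learned critic: naively prefers a = 0.9, far outside the data,
    # exactly the kind of extrapolation error offline RL must avoid.
    return -(a - 0.9) ** 2

def bcq_select(logged, n=20, jitter=0.02):
    # Generate candidates near the data distribution (real BCQ uses a VAE
    # plus a small perturbation network), then argmax Q over only those.
    cands = [random.choice(logged) + random.gauss(0, jitter)
             for _ in range(n)]
    return max(cands, key=q_value)

a = bcq_select(logged_actions)
```

The selected action lands near the top of the data support (around 0.35) instead of the critic's unconstrained favorite 0.9, which is how BCQ mitigates out-of-distribution value errors.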
A hybrid approach combining ES and RL is proposed to leverage ES's low-cost exploration and RL's higher sample efficiency: half of the traffic is allocated to each method, and the best-performing half of the candidates is selected for updates.
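The half-and-half scheme can be sketched as follows. The proposal rules, the reward, and the update (both learners re-centering on the elite mean) are toy stand-ins for the real ES and RL updaters; only the traffic-split-then-select-elite structure reflects the description above:

```python
import random
random.seed(3)

def reward(x):
    # Stand-in for the online business metric.
    return -(x - 0.6) ** 2

def es_propose(mean, n):
    # ES arm: broad, cheap Gaussian exploration around the current mean.
    return [mean + random.gauss(0, 0.1) for _ in range(n)]

def rl_propose(policy, n):
    # RL arm: tighter samples from the current policy (higher sample efficiency).
    return [policy + random.gauss(0, 0.05) for _ in range(n)]

mean = policy = 0.0
for _ in range(60):
    cands = es_propose(mean, 10) + rl_propose(policy, 10)  # 50/50 traffic
    scored = sorted(cands, key=reward, reverse=True)
    elite = scored[: len(scored) // 2]      # keep the best-performing half
    mean = policy = sum(elite) / len(elite)  # both learners update on it
```

Because the elite set is chosen across both arms, whichever method happens to explore better in a given round contributes more to the shared update.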
The systematic solution includes an SDK that business services integrate, a unified AutoTuning platform managing BO, ES, and RL components, and monitoring dashboards. This platform has been deployed in dozens of Tencent products, delivering significant commercial and user‑experience gains.
Two concrete application cases are highlighted: (1) multi‑objective fusion parameter tuning using ES‑PEPG, and (2) quota allocation in recall‑to‑ranking using OpenAI‑ES, both achieving notable improvements in key metrics.
The Q&A section addresses practical concerns such as required traffic volume for reliable training (≈10 k samples per agent) and reward design for balancing multiple business objectives.
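One common way to turn "primary metric up, guardrail metrics not down" into a scalar reward, in the spirit of the Q&A discussion, is an asymmetric penalty on guardrail regressions. The metric names, weights, and tolerance here are illustrative, not production values:

```python
def scalar_reward(metrics, baseline, penalty=5.0, tol=0.005):
    # Relative lift on the primary objective (here: watch time).
    r = metrics["watch_time"] / baseline["watch_time"] - 1.0
    # Asymmetric penalty: guardrail metrics may not drop beyond `tol`.
    for k in ("ctr", "retention"):
        lift = metrics[k] / baseline[k] - 1.0
        if lift < -tol:
            r += penalty * lift
    return r

baseline = {"watch_time": 100.0, "ctr": 0.10, "retention": 0.40}
# +2% watch time with guardrails flat -> positive reward.
good = scalar_reward({"watch_time": 102.0, "ctr": 0.10, "retention": 0.40},
                     baseline)
# Same lift but CTR down 10% -> the penalty dominates, reward goes negative.
bad = scalar_reward({"watch_time": 102.0, "ctr": 0.09, "retention": 0.40},
                    baseline)
```

Making the penalty coefficient large relative to the primary-lift term encodes the business rule that a guardrail regression is never worth a modest primary gain.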
References to original papers and articles on Bayesian optimization, ES variants, and RL algorithms are provided for further reading.
Source: DataFunSummit, the official account of the DataFun community, which shares big data and AI industry summit news and speaker talks.