Automatic Hyperparameter Tuning in Tencent Recommendation System (TRS): Techniques, Evolution, and Practice
This article presents an in‑depth overview of Tencent's TRS automatic hyperparameter tuning, covering background, challenges, the evolution from Bayesian optimization to evolution strategies and reinforcement learning, a systematic platform solution, real‑world deployment results, and a Q&A session.
The talk introduces Tencent Recommendation System (TRS) and its automatic hyperparameter tuning component, which optimizes online parameters such as multi‑objective fusion weights, quota allocation, and diversity controls across recommendation, search, and advertising pipelines.
It first explains the concept of online hyperparameters, distinguishing them from offline model parameters, and provides three concrete examples: multi‑objective ranking fusion, quota distribution in recall‑to‑ranking funnels, and dynamic diversity constraints.
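As a concrete illustration of the first example, multi-objective fusion can be sketched as a log-linear combination of per-objective predictions, where the exponents are the online hyperparameters being tuned. The prediction names (pctr, plike, pshare) and the multiplicative form here are illustrative assumptions, not TRS's production formula:

```python
import math

def fused_score(preds: dict, weights: dict) -> float:
    """Combine per-objective predictions with online-tunable exponents.

    Multiplicative (log-linear) fusion: score = prod(p_k ** w_k).
    Computed in log space for numerical stability.
    """
    return math.exp(sum(w * math.log(max(preds[k], 1e-9))
                        for k, w in weights.items()))

# The weights are the "online hyperparameters" the tuner adjusts.
weights = {"pctr": 1.0, "plike": 0.5, "pshare": 0.3}
preds = {"pctr": 0.12, "plike": 0.04, "pshare": 0.01}
score = fused_score(preds, weights)
```

Raising a weight amplifies that objective's influence on the final ranking, which is why small changes to these few scalars can shift the whole business-metric trade-off.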
The technical challenges of online tuning are highlighted: high‑dimensional parameter space, costly feedback signal computation, complex data distribution, and real‑time noisy environments.
The evolution of optimization methods is then described:
Bayesian Optimization (BO): Uses a surrogate model to approximate the objective, iteratively proposes candidate parameter groups to explore, and updates the posterior with the collected rewards; early-stop strategies keep the evaluation cost manageable under limited experiment traffic.
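The BO loop can be sketched in a few lines. This toy version replaces the usual Gaussian-process surrogate with a crude nearest-neighbor estimate (mean from the closest observation, uncertainty from its distance) and a UCB-style acquisition, so it illustrates the propose-evaluate-update cycle rather than a production optimizer; the objective stands in for a noisy online metric:

```python
import random
random.seed(0)

def objective(x):
    # Stand-in for the (noisy) online reward signal.
    return -(x - 0.3) ** 2 + random.gauss(0, 1e-3)

def surrogate(x, history):
    # Crude posterior: mean = reward at nearest observed point,
    # uncertainty = distance to that point.
    d, r = min((abs(x - xo), ro) for xo, ro in history)
    return r, d

def propose(history, candidates, kappa=0.5):
    # UCB-style acquisition: exploit the surrogate mean, reward uncertainty.
    def acq(x):
        mu, sigma = surrogate(x, history)
        return mu + kappa * sigma
    return max(candidates, key=acq)

candidates = [i / 100 for i in range(101)]
history = [(0.0, objective(0.0)), (1.0, objective(1.0))]  # initial probes
for _ in range(20):
    x = propose(history, candidates)
    history.append((x, objective(x)))          # evaluate, update posterior

best_x, best_r = max(history, key=lambda t: t[1])
```

Each round spends one "traffic slot" on the candidate with the best acquisition value, which is the same exploration/exploitation budget trade-off the early-stop strategies manage at scale.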
Evolution Strategies (ES): Generates a population of parameter samples, evaluates fitness in the live system, selects top performers, and updates the sampling distribution; variants such as OpenAI‑ES, NES, PEPG, and CMA‑ES are mentioned.
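A minimal OpenAI-ES-style update in the spirit of the variants listed above: sample a population of Gaussian perturbations, evaluate each perturbed parameter vector, and move the mean along the reward-weighted perturbation direction. The quadratic stand-in reward and all constants are illustrative:

```python
import random
random.seed(1)

GOAL = [0.5, -0.2]                      # unknown optimum of the online metric

def reward(theta):
    # Stand-in for the fitness measured in the live system.
    return -sum((t - g) ** 2 for t, g in zip(theta, GOAL))

theta = [0.0, 0.0]                      # current hyperparameter vector
sigma, lr, pop = 0.1, 0.05, 40          # noise scale, step size, population
for _ in range(200):
    eps = [[random.gauss(0, 1) for _ in theta] for _ in range(pop)]
    rs = [reward([t + sigma * e for t, e in zip(theta, ei)]) for ei in eps]
    base = sum(rs) / pop                # baseline reduces gradient variance
    for j in range(len(theta)):
        grad = sum((rs[i] - base) * eps[i][j]
                   for i in range(pop)) / (pop * sigma)
        theta[j] += lr * grad           # ascend the estimated reward gradient
```

Only reward values are needed, no gradients from the system itself, which is what makes ES cheap to bolt onto a black-box online environment; PEPG and CMA-ES additionally adapt the sampling distribution's variance.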
Reinforcement Learning (RL): Treats tuning as an agent‑environment interaction, where the agent outputs actions (hyperparameters) based on state features, and rewards are defined by business metrics; both online RL (e.g., TRPO, PPO, PPG) and offline RL (e.g., BCQ, CQL, IQL) are discussed, with a focus on BCQ's VAE‑based action generation to mitigate out‑of‑distribution errors.
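BCQ's core idea, restricting the Q-argmax to actions near the logged data, can be shown in miniature. Here the VAE generator is replaced by jittered resampling of logged actions, and the critic is a toy function whose unconstrained argmax would be far out of distribution; everything is a stand-in to illustrate the mechanism:

```python
import random
random.seed(2)

logged_actions = [0.2, 0.25, 0.3, 0.35]  # actions the behavior policy took

def q_value(a):
    # Toy learned critic: naively prefers a = 0.9, far outside the data,
    # exactly the kind of extrapolation error offline RL must avoid.
    return -(a - 0.9) ** 2

def bcq_select(logged, n=20, jitter=0.02):
    # Generate candidates near the data distribution (real BCQ uses a VAE
    # plus a small perturbation network), then argmax Q over only those.
    cands = [random.choice(logged) + random.gauss(0, jitter)
             for _ in range(n)]
    return max(cands, key=q_value)

a = bcq_select(logged_actions)
```

The selected action lands near the top of the data support (around 0.35) instead of the critic's unconstrained favorite 0.9, which is how BCQ mitigates out-of-distribution value errors.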
A hybrid approach combining ES and RL is proposed to leverage ES's low-cost exploration and RL's higher sample efficiency: half of the traffic is allocated to each method, and the best-performing half of the candidates is selected for updates.
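The half-and-half scheme can be sketched as follows. The proposal rules, the reward, and the update (both learners re-centering on the elite mean) are toy stand-ins for the real ES and RL updaters; only the traffic-split-then-select-elite structure reflects the description above:

```python
import random
random.seed(3)

def reward(x):
    # Stand-in for the online business metric.
    return -(x - 0.6) ** 2

def es_propose(mean, n):
    # ES arm: broad, cheap Gaussian exploration around the current mean.
    return [mean + random.gauss(0, 0.1) for _ in range(n)]

def rl_propose(policy, n):
    # RL arm: tighter samples from the current policy (higher sample efficiency).
    return [policy + random.gauss(0, 0.05) for _ in range(n)]

mean = policy = 0.0
for _ in range(60):
    cands = es_propose(mean, 10) + rl_propose(policy, 10)  # 50/50 traffic
    scored = sorted(cands, key=reward, reverse=True)
    elite = scored[: len(scored) // 2]      # keep the best-performing half
    mean = policy = sum(elite) / len(elite)  # both learners update on it
```

Because the elite set is chosen across both arms, whichever method happens to explore better in a given round contributes more to the shared update.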
The systematic solution includes an SDK that business services integrate, a unified AutoTuning platform managing BO, ES, and RL components, and monitoring dashboards. This platform has been deployed in dozens of Tencent products, delivering significant commercial and user‑experience gains.
Two concrete application cases are highlighted: (1) multi‑objective fusion parameter tuning using ES‑PEPG, and (2) quota allocation in recall‑to‑ranking using OpenAI‑ES, both achieving notable improvements in key metrics.
The Q&A section addresses practical concerns such as required traffic volume for reliable training (≈10 k samples per agent) and reward design for balancing multiple business objectives.
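One common way to turn "primary metric up, guardrail metrics not down" into a scalar reward, in the spirit of the Q&A discussion, is an asymmetric penalty on guardrail regressions. The metric names, weights, and tolerance here are illustrative, not production values:

```python
def scalar_reward(metrics, baseline, penalty=5.0, tol=0.005):
    # Relative lift on the primary objective (here: watch time).
    r = metrics["watch_time"] / baseline["watch_time"] - 1.0
    # Asymmetric penalty: guardrail metrics may not drop beyond `tol`.
    for k in ("ctr", "retention"):
        lift = metrics[k] / baseline[k] - 1.0
        if lift < -tol:
            r += penalty * lift
    return r

baseline = {"watch_time": 100.0, "ctr": 0.10, "retention": 0.40}
# +2% watch time with guardrails flat -> positive reward.
good = scalar_reward({"watch_time": 102.0, "ctr": 0.10, "retention": 0.40},
                     baseline)
# Same lift but CTR down 10% -> the penalty dominates, reward goes negative.
bad = scalar_reward({"watch_time": 102.0, "ctr": 0.09, "retention": 0.40},
                    baseline)
```

Making the penalty coefficient large relative to the primary-lift term encodes the business rule that a guardrail regression is never worth a modest primary gain.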
References to original papers and articles on Bayesian optimization, ES variants, and RL algorithms are provided for further reading.
Source: DataFunSummit, the official account of the DataFun community, which shares big data and AI industry summit news and speaker talks.