GuideBoot: A Guided Bootstrap Method for Solving Exploration‑Exploitation in Online Advertising

The article explains the exploration‑exploitation dilemma in recommendation systems and introduces GuideBoot, a guided bootstrap algorithm for contextual bandits. It describes the Bayesian and non‑Bayesian approaches the method builds on, presents experimental results on synthetic and real advertising data, and discusses an online learning extension.

Tencent Advertising Technology

Using a slot‑machine analogy, the article introduces the exploration‑exploitation (E&E) problem: balancing the benefit of gathering more information by trying new options against the immediate reward of exploiting known good options, especially when resources are limited.

It explains that modern recommendation tasks, such as online advertising, face complex E&E challenges because new items constantly appear, requiring algorithms to decide between using existing strategies (exploitation) and testing new ones (exploration).

The article reviews two broad families of solutions: Bayesian methods, which model uncertainty but become intractable at large scale, and non‑Bayesian methods like resampling‑plus‑ensemble, which lack theoretical guarantees and struggle with cold‑start problems.

It then presents the GuideBoot algorithm, a guided bootstrap technique proposed by Tencent Advertising and the Chinese Academy of Sciences. GuideBoot generates a small set of randomly labeled pseudo‑samples based on model uncertainty, adds them to multiple bootstrapped models, and randomly selects a model at prediction time, thereby providing explicit guidance for exploration while keeping inference fast.
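
The bootstrap-and-random-selection part of this idea can be illustrated with a minimal sketch. It is not the authors' implementation: the per-ad smoothed click rate stands in for a real contextual CTR model, and the names `bootstrap_resample`, `fit_ctr_model`, `train_ensemble`, and `choose_arm` are hypothetical. The guided pseudo-samples are left out here so that only the ensemble mechanics are shown.

```python
import random

def bootstrap_resample(data, rng):
    """Draw len(data) samples from data with replacement."""
    return [rng.choice(data) for _ in data]

def fit_ctr_model(samples, arms, prior_clicks=1.0, prior_views=2.0):
    """Toy 'model' (assumption): a smoothed click rate per ad.

    A production system would train a contextual CTR model here instead.
    """
    clicks = {a: prior_clicks for a in arms}
    views = {a: prior_views for a in arms}
    for arm, label in samples:
        clicks[arm] += label
        views[arm] += 1
    return {a: clicks[a] / views[a] for a in arms}

def train_ensemble(data, arms, n_models, rng):
    """Train each model on its own bootstrap resample of the logged data."""
    return [fit_ctr_model(bootstrap_resample(data, rng), arms)
            for _ in range(n_models)]

def choose_arm(ensemble, rng):
    """Pick one model at random, then act greedily under that model."""
    model = rng.choice(ensemble)
    return max(model, key=model.get)

rng = random.Random(0)
arms = ["ad_a", "ad_b", "ad_c"]
# (ad, clicked) pairs; purely illustrative data.
logged = [("ad_a", 1), ("ad_a", 0), ("ad_b", 0), ("ad_b", 0), ("ad_c", 1)]
ensemble = train_ensemble(logged, arms, n_models=5, rng=rng)
picked = choose_arm(ensemble, rng)
```

Because prediction only requires picking one already-trained model, the serving cost is that of a single model regardless of ensemble size, which matches the article's point about fast inference.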

Guided pseudo‑samples are created by flipping the labels of real data. Higher uncertainty leads to a higher proportion of such samples, which encourages exploration, whereas low uncertainty reduces their proportion, which favors exploitation. This design approximates Bayesian reasoning, but the uncertainty is computed only during training and not at serving time, making it suitable for high‑throughput, low‑latency ad serving.
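
A rough sketch of the label-flipping step, under stated assumptions: the article does not give the uncertainty estimate, so `uncertainty` below is a hypothetical count-based proxy, and `scale` is an illustrative knob controlling how many pseudo-samples are generated. Neither is taken from the GuideBoot paper.

```python
import math
import random

def uncertainty(n_obs):
    """Hypothetical proxy (assumption): shrinks as an ad gathers observations."""
    return 1.0 / math.sqrt(1.0 + n_obs)

def add_guided_pseudo_samples(data, rng, scale=1.0):
    """Return data plus label-flipped copies, more of them for uncertain ads.

    data is a list of (ad, label) pairs with label in {0, 1}.
    """
    counts = {}
    for arm, _ in data:
        counts[arm] = counts.get(arm, 0) + 1

    augmented = list(data)
    for arm, label in data:
        # Probability of emitting a flipped copy grows with uncertainty.
        p = min(1.0, scale * uncertainty(counts[arm]))
        if rng.random() < p:
            augmented.append((arm, 1 - label))
    return augmented

data = [("ad_new", 0)] + [("ad_old", 1)] * 24
augmented = add_guided_pseudo_samples(data, random.Random(0))
```

In this toy setup the rarely shown `ad_new` gets a flipped copy with probability of about 0.71, while each `ad_old` sample gets one with probability 0.2, so a model trained on the augmented data is less certain about the new ad.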

Experimental evaluation shows that on synthetic data GuideBoot achieves the lowest average regret among compared methods, and on real Tencent advertising data it yields the highest average revenue with stable performance across repetitions.
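
For reference, the article does not spell out its regret metric. A common definition in the bandit literature, given here as an assumption about what "average regret" means, compares the expected reward of the best action in each round with that of the action actually chosen:

```latex
% Assumed standard definition; the symbols are not taken from the article.
% a_t   : action (ad) chosen in round t
% a_t^* : best action in round t given the context
% r(a)  : reward of action a
R_T = \sum_{t=1}^{T} \Bigl( \mathbb{E}\bigl[r(a_t^{*})\bigr] - \mathbb{E}\bigl[r(a_t)\bigr] \Bigr),
\qquad
\bar{R}_T = \frac{R_T}{T}
```

Under this definition, lower average regret means the policy's choices are closer to the best available action over the run.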

An online variant, Online GuideBoot, adapts the method for continuous data streams by buffering incoming data, shuffling, and training models on these real‑time batches. Although its synthetic‑data performance is slightly below the offline version, it outperforms other baselines and demonstrates superior results on live advertising traffic.
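
The buffer, shuffle, and train loop described above might look roughly like the following. This is a sketch under assumptions: `OnlineBuffer`, `update_model`, and the fixed `capacity` are hypothetical, and the running click and view counts stand in for an incremental update of a real model.

```python
import random

class OnlineBuffer:
    """Collects streamed samples and releases them as shuffled batches."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []

    def push(self, sample):
        """Add a sample; return a shuffled batch once full, else None."""
        self.items.append(sample)
        if len(self.items) < self.capacity:
            return None
        batch, self.items = self.items, []
        self.rng.shuffle(batch)
        return batch

def update_model(model, batch):
    """Toy incremental update (assumption): running (clicks, views) per ad."""
    for arm, label in batch:
        clicks, views = model.get(arm, (0, 0))
        model[arm] = (clicks + label, views + 1)
    return model

rng = random.Random(0)
buffer = OnlineBuffer(capacity=3, rng=rng)
model = {}
# Illustrative stream of (ad, clicked) events.
stream = [("ad_a", 1), ("ad_b", 0), ("ad_a", 0), ("ad_c", 1), ("ad_b", 1)]
batches_trained = 0
for sample in stream:
    batch = buffer.push(sample)
    if batch is not None:
        update_model(model, batch)
        batches_trained += 1
```

With a capacity of 3, the first three events form one training batch and the last two stay in the buffer until more traffic arrives.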

The article concludes that GuideBoot effectively combines the strengths of Bayesian and ensemble approaches, can be extended to various domains beyond advertising, and highlights ongoing work to integrate it into large‑scale online learning pipelines.

Tags: recommendation systems, online advertising, exploration-exploitation, contextual bandits, GuideBoot