GuideBoot: A Guided Bootstrap Method for Solving Exploration‑Exploitation in Online Advertising

The article explains the exploration‑exploitation dilemma in recommendation systems, introduces the GuideBoot algorithm—an innovative guided bootstrap approach for contextual bandits—describes its Bayesian and non‑Bayesian foundations, presents experimental results on synthetic and real advertising data, and discusses an online learning extension.

Tencent Advertising Technology

Using a slot‑machine analogy, the article introduces the exploration‑exploitation (E&E) problem: balancing the benefit of gathering more information by trying new options against the immediate reward of exploiting known good options, especially when resources are limited.
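To make the dilemma concrete, here is a minimal Python sketch of the classic slot-machine setting using epsilon-greedy. This is a standard textbook baseline, not GuideBoot, and it is not taken from the article. The reward rates, `epsilon`, and the function name `epsilon_greedy` are illustrative assumptions. With probability `epsilon` the player explores a random machine; otherwise it exploits the machine with the best observed payout.

```python
import random

def epsilon_greedy(true_ctrs, steps, epsilon, seed=0):
    """Simulate Bernoulli 'slot machines' played with epsilon-greedy.

    true_ctrs: hidden reward probability of each machine (unknown to the player).
    Returns (total reward, number of pulls per machine).
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    pulls = [0] * n_arms   # times each machine was played
    wins = [0] * n_arms    # rewards each machine paid out
    total = 0
    for _ in range(steps):
        if 0 in pulls:
            arm = pulls.index(0)              # try every machine once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)       # explore: gather information
        else:
            # exploit: play the machine with the best observed payout rate so far
            arm = max(range(n_arms), key=lambda a: wins[a] / pulls[a])
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total, pulls
```

With toy rates, `epsilon_greedy([0.1, 0.5, 0.2], steps=10_000, epsilon=0.1)` should spend most of its pulls on the second machine; the roughly 10% exploration budget is the price paid for discovering it.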

It explains that modern recommendation tasks, such as online advertising, face complex E&E challenges because new items constantly appear, requiring algorithms to decide between using existing strategies (exploitation) and testing new ones (exploration).

The article reviews two broad families of solutions: Bayesian methods, which model uncertainty explicitly but become intractable at large scale, and non‑Bayesian methods such as resampling plus ensembling (training several models on bootstrap resamples of the data), which lack theoretical guarantees and struggle with cold‑start problems.
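A minimal sketch of the resampling-plus-ensemble idea, under simplifying assumptions: each "model" is only a per-arm mean reward fitted on a bootstrap resample of that arm's history, and the names `bootstrap_estimates` and `choose_arm` are hypothetical. Disagreement between ensemble members is what drives exploration. The sketch also exposes the cold-start weakness: an arm with no history has nothing to resample, so it falls back to a fixed default and receives no exploration signal of its own.

```python
import random

def bootstrap_estimates(history, n_models, rng):
    """Fit n_models per-arm mean-reward 'models' on bootstrap resamples.

    history maps arm -> list of observed 0/1 rewards.
    Arms with no data get None: resampling an empty list gives nothing to learn from.
    """
    models = []
    for _ in range(n_models):
        model = {}
        for arm, rewards in history.items():
            if not rewards:
                model[arm] = None  # cold start: no samples to draw
                continue
            resample = [rng.choice(rewards) for _ in rewards]  # draw with replacement
            model[arm] = sum(resample) / len(resample)
        models.append(model)
    return models


def choose_arm(models, rng, default=0.0):
    """Pick one ensemble member at random and act greedily on its estimates.

    Cold-start arms are scored with `default`; ties go to the first arm in dict order.
    """
    model = rng.choice(models)
    return max(model, key=lambda arm: default if model[arm] is None else model[arm])
```

For example, with `history = {"a": [1, 0, 1, 1], "b": [0, 0, 1, 0], "new": []}`, the arms `"a"` and `"b"` are explored in proportion to how much the resampled estimates disagree, while `"new"` is never chosen under the pessimistic default.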

It then presents the GuideBoot algorithm, a guided bootstrap technique proposed by Tencent Advertising and the Chinese Academy of Sciences. GuideBoot generates a small set of randomly labeled pseudo‑samples based on model uncertainty, adds them to the training data of multiple bootstrapped models, and at prediction time randomly selects one of these models to score the candidates. This gives exploration explicit guidance while keeping inference as cheap as evaluating a single model.
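The training and serving flow described above can be sketched in plain Python. This is an illustrative reconstruction, not the published implementation: each ensemble member is a small logistic regression trained with SGD, pseudo-samples are copies of real contexts with random 0/1 labels, and a fixed `ratio` stands in for the uncertainty-dependent proportion. All function names are hypothetical. Serving calls `guideboot_select`, which scores candidates with one randomly chosen member.

```python
import math
import random

def train_logreg(samples, dim, rng, epochs=20, lr=0.1):
    """Fit a logistic regression with plain SGD; samples are (features, label) pairs."""
    w = [0.0] * dim
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = max(-30.0, min(30.0, z))          # guard math.exp against overflow
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                              # d(log loss)/d(logit)
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w


def make_pseudo_samples(data, ratio, rng):
    """Hypothetical guide: copy a fraction `ratio` of real contexts with random 0/1 labels."""
    k = min(len(data), int(round(ratio * len(data))))
    return [(x, rng.randint(0, 1)) for x, _ in rng.sample(data, k)]


def guideboot_train(data, dim, n_models, ratio, seed=0):
    """Train n_models members, each on a bootstrap resample plus its own pseudo-samples."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in data]           # resample with replacement
        boot += make_pseudo_samples(data, ratio, rng)     # add guided pseudo-samples
        models.append(train_logreg(boot, dim, rng))
    return models


def guideboot_select(models, candidates, rng):
    """Score candidates with ONE randomly chosen member: one model evaluation per request."""
    w = rng.choice(models)
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in candidates]
    return scores.index(max(scores))
```

For example, with `data = [([1.0, i / 199], int(i / 199 > 0.5)) for i in range(200)]`, training with `guideboot_train(data, dim=2, n_models=5, ratio=0.0)` and calling `guideboot_select(models, [[1.0, 0.9], [1.0, 0.1]], random.Random(1))` picks candidate 0. Raising `ratio` injects more label noise, which makes the members disagree more and therefore explore more.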

Guided pseudo‑samples are created by flipping the labels of real samples. Higher uncertainty leads to a larger proportion of such samples, which encourages exploration; lower uncertainty reduces their proportion, which favors exploitation. This design approximates Bayesian reasoning, but the cost of estimating uncertainty is paid only during training, which keeps the method suitable for high‑throughput, low‑latency ad serving.
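The inverse relationship between confidence and pseudo-sample volume can be sketched as follows. Two points are assumptions rather than details from the article: the uncertainty proxy `1 / sqrt(1 + n_item)`, based on how often an item has been observed, and the hypothetical helpers `pseudo_rate` and `guided_pseudo_samples`. Each copy receives a random label, matching the "randomly labeled" description and flipping the original label about half the time.

```python
import math
import random
from collections import Counter

def pseudo_rate(n_item, scale):
    """ASSUMED uncertainty proxy: scale / sqrt(1 + n_item), capped at 1.

    Items with few real observations (e.g. new ads) are treated as more uncertain.
    This proxy is illustrative only; it is not taken from the article.
    """
    return min(1.0, scale / math.sqrt(1 + n_item))


def guided_pseudo_samples(data, scale, rng):
    """Spawn pseudo-samples from real (item_id, features, label) rows.

    Each real row is copied with probability pseudo_rate(...) and given a random
    0/1 label, so about half of the copies carry a flipped label.
    """
    counts = Counter(item for item, _, _ in data)
    pseudo = []
    for item, x, _label in data:
        if rng.random() < pseudo_rate(counts[item], scale):
            pseudo.append((item, x, rng.randint(0, 1)))
    return pseudo
```

Because the rate shrinks as an item accumulates data, a brand-new ad receives a large share of noisy copies (strong exploration), while a well-established ad receives almost none (near-pure exploitation). All of this happens while building training sets, so serving cost is unaffected.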

Experimental evaluation shows that on synthetic data GuideBoot achieves the lowest average regret among compared methods, and on real Tencent advertising data it yields the highest average revenue with stable performance across repetitions.
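Regret measures how much reward a policy gives up relative to always choosing the best action. A minimal sketch, assuming "average regret" means cumulative regret divided by the number of rounds, and that the expected reward of both the best and the chosen action is known, as it is in a synthetic simulation. The function name `average_regret` is hypothetical.

```python
def average_regret(best_expected, chosen_expected):
    """Mean per-round regret: best action's expected reward minus the chosen action's."""
    if len(best_expected) != len(chosen_expected) or not best_expected:
        raise ValueError("need two equal-length, non-empty sequences")
    total = sum(b - c for b, c in zip(best_expected, chosen_expected))
    return total / len(best_expected)
```

For example, `average_regret([0.5] * 4, [0.5, 0.25, 0.5, 0.25])` returns `0.125`: the policy lost 0.25 in two of the four rounds. On real traffic the best action's expected reward is not observable, so regret cannot be computed this way; the real-data comparison reports revenue instead.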

An online variant, Online GuideBoot, adapts the method to continuous data streams: incoming data is buffered, shuffled, and used to train the models in real‑time batches. Although its performance on synthetic data is slightly below that of the offline version, it still outperforms the other baselines and shows superior results on live advertising traffic.
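The buffer-and-shuffle step of the online variant can be sketched with a Python generator. The buffer size, the seed, and the name `shuffled_batches` are illustrative assumptions. The summary does not specify how each model is updated from a batch, so the update step is deliberately left out.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def shuffled_batches(stream: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[List[T]]:
    """Buffer streaming events, shuffle each full buffer, and yield it as a training batch."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    rng = random.Random(seed)
    buffer: List[T] = []
    for event in stream:
        buffer.append(event)
        if len(buffer) == buffer_size:
            rng.shuffle(buffer)     # break up ordering inside the batch
            yield buffer
            buffer = []             # start a fresh list; the yielded one is not reused
    if buffer:                      # flush the final partial batch
        rng.shuffle(buffer)
        yield buffer
```

Shuffling within a buffer breaks up the short-term correlation of consecutive events without waiting for a complete offline dataset. The trade-off is that a larger buffer mixes data more thoroughly but delays each model update.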

The article concludes that GuideBoot effectively combines the strengths of Bayesian and ensemble approaches, can be extended to various domains beyond advertising, and highlights ongoing work to integrate it into large‑scale online learning pipelines.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
