Tagged articles
6 articles
Baobao Algorithm Notes
Oct 31, 2025 · Artificial Intelligence

How Risk‑Sensitive Reinforcement Learning Improves LLM Pass@K Performance

This article analyzes why standard reinforcement learning (RL) fine-tuning of large language models can degrade Pass@K, introduces a risk-sensitive RL objective that reshapes the advantage estimator, and shows through bandit and mathematical-reasoning experiments that the resulting RS-GRPO method consistently increases output diversity and improves Pass@K across multiple LLMs.

Exploration-Exploitation · LLM fine-tuning · Policy Gradient
12 min read
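For reference, Pass@K is commonly computed with the unbiased estimator below: generate n samples per problem, count the c correct ones, and estimate the probability that at least one of k samples drawn without replacement is correct. This is a minimal sketch assuming that standard estimator; the article may compute Pass@K differently, and `pass_at_k` is a hypothetical helper name.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Equals 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random size-k subset of the n samples contains no correct sample.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 16 samples per problem, 4 of them correct.
print(pass_at_k(16, 4, 1))  # 0.25
print(pass_at_k(16, 4, 8))  # ≈ 0.962 (exactly 25/26)
```

A policy that collapses onto a single answer has c equal to 0 or n, so its pass@k equals its pass@1; this is one way to see why sharpening the output distribution can leave Pass@K flat or lower, while objectives that preserve diversity can raise it.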
Model Perspective
Jul 25, 2024 · Artificial Intelligence

How Harris Hawks Optimization Mimics Hawk Hunting to Solve Complex Problems

The article presents Harris Hawks Optimization (HHO), a metaheuristic inspired by the cooperative hunting tactics of Harris hawks that alternates between exploration and exploitation phases to solve complex optimization problems. A traffic-signal timing case study illustrates both its effectiveness and its limitations.

AI Algorithms · Exploration-Exploitation · Harris Hawks Optimization
6 min read
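In the original HHO formulation, the switch between phases is driven by the prey's escaping energy E = 2·E0·(1 - t/T), where E0 is drawn uniformly between -1 and 1 at each iteration t of T: |E| ≥ 1 triggers exploration, and |E| < 1 triggers one of four besiege strategies, selected by |E| and a random number r. The sketch below shows only this phase-selection logic, assuming that standard formulation; the position-update equations are omitted, `escaping_energy` and `choose_phase` are hypothetical helper names, and the article's traffic-signal case study may configure these details differently.

```python
import random

def escaping_energy(t: int, T: int, rng: random.Random) -> float:
    """Prey escaping energy E = 2 * E0 * (1 - t / T) at iteration t of T.

    E0 is redrawn uniformly between -1 and 1 on every call, so the
    magnitude of E is bounded by 2 * (1 - t / T) and shrinks over time.
    """
    e0 = rng.uniform(-1.0, 1.0)
    return 2.0 * e0 * (1.0 - t / T)

def choose_phase(energy: float, r: float) -> str:
    """Map |E| and a random number r in [0, 1] to an HHO behaviour."""
    if abs(energy) >= 1.0:
        return "exploration"  # hawks search widely for prey
    if r >= 0.5:
        # Prey's escape attempt fails: besiege the best solution so far.
        return "soft besiege" if abs(energy) >= 0.5 else "hard besiege"
    # Prey's escape attempt succeeds: hawks add progressive rapid dives.
    return ("soft besiege with rapid dives" if abs(energy) >= 0.5
            else "hard besiege with rapid dives")

if __name__ == "__main__":
    rng = random.Random(42)
    T = 100
    for t in (0, 25, 50, 75, 99):
        e = escaping_energy(t, T, rng)
        print(t, round(e, 3), choose_phase(e, rng.random()))
```

Because |E| ≤ 2(1 - t/T), exploration becomes impossible once t > T/2, which is how HHO shifts from global search to local refinement as iterations run out.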
Zhuanzhuan Tech
Oct 14, 2022 · Artificial Intelligence

Exploitation and Exploration in Recommendation Systems: Bias Types, Mitigation Strategies, and Diversity Optimization

The article explains how recommendation systems balance exploitation and exploration. It details common sources of bias (selection, exposure, conformity, and position bias), presents mitigation techniques such as feeding bias-related features into the model, adding a dedicated bias tower, and greedy algorithms, and discusses diversity-focused exploration with determinantal point process (DPP) methods.

Diversity · Exploration-Exploitation · bias mitigation
7 min read
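A common way to use a DPP for diversity-aware exploration is to build a kernel L = diag(q)·S·diag(q) from relevance scores q and item similarities S, then greedily pick the items that maximize det(L_S) for the selected set S. The sketch below is a naive greedy MAP implementation under those assumptions; production systems typically use faster incremental Cholesky updates, the article's exact method may differ, and `build_kernel` and `greedy_dpp` are hypothetical names.

```python
import numpy as np

def build_kernel(quality: np.ndarray, embeddings: np.ndarray) -> np.ndarray:
    """L = diag(q) @ S @ diag(q), with S the cosine similarity of item embeddings."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarity = unit @ unit.T
    return quality[:, None] * similarity * quality[None, :]

def greedy_dpp(L: np.ndarray, k: int) -> list:
    """Greedily add the item that maximizes log det of the selected submatrix."""
    selected = []
    for _ in range(min(k, L.shape[0])):
        best_item, best_logdet = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:  # every remaining item makes L_S singular
            break
        selected.append(best_item)
    return selected

# Items 0 and 1 are near-duplicates; item 2 is less relevant but different.
quality = np.array([1.0, 0.95, 0.8])
embeddings = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]])
L = build_kernel(quality, embeddings)
print(greedy_dpp(L, 2))  # [0, 2]
```

Ranking by relevance alone would return items 0 and 1; the determinant penalizes their nearly identical embeddings, so the greedy step picks item 2 instead.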
Alimama Tech
Aug 24, 2022 · Artificial Intelligence

Adversarial Gradient Driven Exploration for Deep Click-Through Rate Prediction

The authors introduce AGE, an adversarial-gradient-driven exploration framework for CTR prediction. AGE injects uncertainty-scaled perturbations into ad embeddings to approximate the effect that learning from the explored ads would have on the model, estimates uncertainty with Monte Carlo dropout, and adds a dynamic gating unit. It reports up to 15% offline gains and a 6% online CTR improvement over strong baselines.

Exploration-Exploitation · Online Learning · Recommendation Systems
14 min read
Tencent Advertising Technology
Apr 12, 2021 · Artificial Intelligence

GuideBoot: A Guided Bootstrap Method for Solving Exploration‑Exploitation in Online Advertising

The article explains the exploration-exploitation dilemma in recommendation systems and introduces GuideBoot, a guided bootstrap approach for contextual bandits. It describes the method's Bayesian and non-Bayesian foundations, presents experimental results on synthetic and real advertising data, and discusses an extension to online learning.

Exploration-Exploitation · GuideBoot · contextual bandits
11 min read
DataFunTalk
Apr 24, 2020 · Artificial Intelligence

Common Pitfalls in Recommendation Systems: Metrics, Exploration‑Exploitation, and Offline‑Online Discrepancies

The article surveys typical challenges in recommendation systems: ambiguous evaluation metrics, the trade-off between algorithmic precision and user experience, the exploration-exploitation dilemma, and why offline AUC gains often fail to carry over online, sometimes even coinciding with CTR/CPM drops, because of data leakage, feature inconsistency between offline and online pipelines, and distribution shift.

AUC · CTR · Exploration-Exploitation
14 min read