Tagged articles

exploration

8 articles · Page 1 of 1

May 14, 2026 · Artificial Intelligence

Breaking Homogeneous Reasoning: I²B‑LPO Guides RLVR from Repeated Sampling to Effective Exploration

I²B‑LPO is an exploration‑enhancement framework for RLVR that branches rollouts at high‑entropy nodes, injects latent variables via pseudo self‑attention, and filters paths with an information‑bottleneck self‑reward, with gains of up to 5.3% in accuracy and 7.4% in diversity on multiple math reasoning benchmarks.

RLVR · entropy · exploration

0 likes · 14 min read

AI Algorithm Path

May 21, 2025 · Artificial Intelligence

Understanding Monte Carlo Algorithms for Reinforcement Learning with a Blackjack Case Study

This article explains Monte Carlo methods for reinforcement learning, compares model‑free and model‑based approaches, details V‑ and Q‑function estimation using a Blackjack example, and discusses exploration‑exploitation trade‑offs and practical advantages of MC algorithms.

Blackjack · Model-free · Monte Carlo

0 likes · 13 min read

Bitu Technology

May 18, 2022 · Artificial Intelligence

Mitigating Exposure Bias in Tubi’s Recommendation System

This article explains how Tubi’s machine‑learning team reduces exposure bias in its video recommendation pipeline by normalizing popularity features, incorporating additional signals such as search behavior, and applying exploration techniques like bandit algorithms to diversify content exposure.

bandits · exploration · exposure bias

0 likes · 10 min read

HomeTech

Jun 10, 2020 · Artificial Intelligence

Exploitation & Exploration Algorithms in Recommender Systems: ε‑Greedy, UCB, and Thompson Sampling Applications

This article introduces recommender systems and the exploitation‑exploration dilemma, explains common E&E algorithms such as ε‑greedy, Upper‑Confidence‑Bound, and Thompson Sampling, and details their practical deployment for interest‑point eviction, selection, and adaptive recall count optimization in an automotive recommendation platform.

Bandit Algorithms · Epsilon-Greedy · Exploitation

0 likes · 10 min read

Ctrip Technology

May 28, 2020 · Mobile Development

Intelligent Android Exploration Tool (IAET): UI‑Driven Automated Testing, Algorithms, Implementation, and Evaluation

This article presents IAET, an intelligent Android exploration tool that detects UI elements, applies graph‑based traversal algorithms with similarity optimizations, implements a bridge using UiAutomator and app_process, and demonstrates superior crash‑detection and activity‑coverage performance compared with the APE benchmark across major Chinese apps.

Android · UI automation · exploration

0 likes · 15 min read

21CTO

Apr 24, 2020 · Artificial Intelligence

Why Your Recommendation System’s Offline Gains Fail Online: Common Pitfalls

This article examines the frequent pitfalls of recommendation systems—misleading metrics, over‑optimizing precision, data leakage, feature inconsistencies, and distribution bias—that cause offline AUC improvements to translate into lower online CTR and CPM, and offers practical mitigation strategies.

AI · Exploitation · data leakage

0 likes · 15 min read

DataFunTalk

Mar 20, 2019 · Artificial Intelligence

Addressing Sparse Reward Problems in Model-Free Reinforcement Learning

This article reviews the challenges of model‑free reinforcement learning, especially sparse reward issues exemplified by Montezuma’s Revenge, and surveys recent approaches such as expert demonstrations, curriculum learning, self‑play, hierarchical reinforcement learning, and count‑based exploration to mitigate these problems.

Model-free · curriculum-learning · exploration

0 likes · 12 min read

Qunar Tech Salon

May 16, 2016 · Artificial Intelligence

Improving A/B Testing with a 20‑Line Multi‑Armed Bandit Algorithm

This article explains how a simple 20‑line multi‑armed bandit implementation can replace traditional A/B testing by continuously balancing exploration and exploitation to automatically discover the most effective UI variant, reducing manual analysis and improving conversion rates.

A/B testing · Exploitation · exploration

0 likes · 8 min read
