Tagged articles
8 articles
Machine Heart
May 14, 2026 · Artificial Intelligence

Breaking Homogeneous Reasoning: I²B‑LPO Guides RLVR from Repeated Sampling to Effective Exploration

I²B‑LPO is an exploration‑enhancement framework for RLVR (reinforcement learning with verifiable rewards) that branches rollouts at high‑entropy nodes, injects latent variables via pseudo self‑attention, and filters paths with an information‑bottleneck self‑reward. It improves accuracy by up to 5.3% and diversity by up to 7.4% on multiple math reasoning benchmarks.

RLVR · Reinforcement Learning · entropy
0 likes · 14 min read
Bitu Technology
May 18, 2022 · Artificial Intelligence

Mitigating Exposure Bias in Tubi’s Recommendation System

This article explains how Tubi’s machine‑learning team reduces exposure bias in its video recommendation pipeline by normalizing popularity features, incorporating additional signals such as search behavior, and applying exploration techniques like bandit algorithms to diversify content exposure.

bandits · exploration · exposure bias
0 likes · 10 min read
HomeTech
Jun 10, 2020 · Artificial Intelligence

Exploitation & Exploration Algorithms in Recommender Systems: ε‑Greedy, UCB, and Thompson Sampling Applications

This article introduces recommender systems and the exploitation‑exploration dilemma, explains common E&E algorithms such as ε‑greedy, Upper‑Confidence‑Bound, and Thompson Sampling, and details their practical deployment for interest‑point eviction, selection, and adaptive recall count optimization in an automotive recommendation platform.

Bandit Algorithms · Epsilon-Greedy · Exploitation
0 likes · 10 min read
Ctrip Technology
May 28, 2020 · Mobile Development

Intelligent Android Exploration Tool (IAET): UI‑Driven Automated Testing, Algorithms, Implementation, and Evaluation

This article presents IAET, an intelligent Android exploration tool that detects UI elements, applies graph‑based traversal algorithms with similarity optimizations, implements a bridge using UiAutomator and app_process, and demonstrates superior crash‑detection and activity‑coverage performance compared with the APE benchmark across major Chinese apps.

Android · UI automation · exploration
0 likes · 15 min read
21CTO
Apr 24, 2020 · Artificial Intelligence

Why Your Recommendation System’s Offline Gains Fail Online: Common Pitfalls

This article examines the frequent pitfalls of recommendation systems—misleading metrics, over‑optimizing precision, data leakage, feature inconsistencies, and distribution bias—that cause offline AUC improvements to translate into lower online CTR and CPM, and offers practical mitigation strategies.

AI · Exploitation · Metrics
0 likes · 15 min read
DataFunTalk
Mar 20, 2019 · Artificial Intelligence

Addressing Sparse Reward Problems in Model-Free Reinforcement Learning

This article reviews the challenges of model‑free reinforcement learning, especially sparse reward issues exemplified by Montezuma’s Revenge, and surveys recent approaches such as expert demonstrations, curriculum learning, self‑play, hierarchical reinforcement learning, and count‑based exploration to mitigate these problems.

Model-free · curriculum learning · exploration
0 likes · 12 min read
Qunar Tech Salon
May 16, 2016 · Artificial Intelligence

Improving A/B Testing with a 20‑Line Multi‑Armed Bandit Algorithm

This article explains how a simple 20‑line multi‑armed bandit implementation can replace traditional A/B testing by continuously balancing exploration and exploitation to automatically discover the most effective UI variant, reducing manual analysis and improving conversion rates.

A/B testing · Exploitation · exploration
0 likes · 8 min read