Tagged articles
2 articles
Machine Heart
May 14, 2026 · Artificial Intelligence

Breaking Homogeneous Reasoning: I²B‑LPO Guides RLVR from Repeated Sampling to Effective Exploration

I²B‑LPO is an exploration‑enhancement framework for reinforcement learning with verifiable rewards (RLVR). It branches rollouts at high‑entropy nodes, injects latent variables via pseudo self‑attention, and filters paths with an information‑bottleneck self‑reward, improving accuracy by up to 5.3% and diversity by up to 7.4% on multiple math reasoning benchmarks.

RLVR · entropy · exploration
14 min read
Didi Tech
Jun 12, 2023 · Artificial Intelligence

Laser: Latent Surrogate Representation Learning for Long-Term Effect Estimation in Ride-Hailing Markets

Laser (Latent Surrogate Representation learning) estimates long‑term ride‑hailing market effects by inferring hidden surrogate variables from short‑term outcomes using an iVAE and inverse‑probability weighting, thereby reducing experiment cost and latency while achieving more accurate causal effect predictions than existing baselines.

IPW · Ride Hailing · Uplift Modeling
9 min read