Tagged articles
3 articles
AI Algorithm Path
May 22, 2025 · Artificial Intelligence

Monte Carlo Policy Improvement in RL: Epsilon‑Greedy, On‑Policy vs Off‑Policy, and Incremental Updates

This tutorial explains how to improve Monte Carlo methods in reinforcement learning. It covers epsilon‑greedy and epsilon‑soft policies, Monte Carlo control, a Blackjack Q‑function example, on‑policy versus off‑policy learning, importance sampling, and efficient incremental updates.

Epsilon-Greedy · Importance Sampling · Monte Carlo
14 min read
Didi Tech
May 19, 2021 · Artificial Intelligence

Applying Epsilon‑Greedy Bandit Algorithm for Content Delivery Optimization at DiDi

DiDi integrated an epsilon‑greedy bandit algorithm with its CMS to optimize ad placement across 600 slots, using quality scores, traffic sampling, and a drag‑and‑drop UI. CTR rose from 1.35% to 13.43% and unique visitors grew by 686%, an example of data‑driven growth beyond simple A/B testing.

Content Optimization · Data-driven · Epsilon-Greedy
10 min read
HomeTech
Jun 10, 2020 · Artificial Intelligence

Exploitation & Exploration Algorithms in Recommender Systems: ε‑Greedy, UCB, and Thompson Sampling Applications

This article introduces recommender systems and the exploitation‑exploration (E&E) dilemma, explains common E&E algorithms such as ε‑greedy, Upper Confidence Bound (UCB), and Thompson Sampling, and details how they are deployed for interest‑point eviction, interest‑point selection, and adaptive recall‑count optimization in an automotive recommendation platform.

Bandit Algorithms · Epsilon-Greedy · Exploitation
10 min read