Tagged articles
2 articles
Data Party THU
Aug 30, 2025 · Artificial Intelligence

Understanding Multi‑Armed Bandits: Balancing Exploration and Exploitation in Reinforcement Learning

This article uses multi‑armed bandit models to illustrate the core exploration‑exploitation dilemma in reinforcement learning. It covers greedy, ε‑greedy, and optimistic‑initial‑value action selection, as well as sample‑average and incremental Q‑value estimation, with practical examples and visual illustrations.

Q-value estimation · Reinforcement Learning · exploration vs exploitation
15 min read
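A minimal Python sketch of the techniques listed above follows. It is an illustration, not the article's own code: it assumes Gaussian rewards with standard deviation 1 around hypothetical arm means, and `incremental_update`, `select_arm`, and `run_bandit` are illustrative names. Setting `epsilon=0` gives the purely greedy strategy, and a large `initial_q` gives optimistic initial values.

```python
import random


def incremental_update(q: float, n: int, reward: float) -> float:
    """Incremental sample average: Q_n = Q_{n-1} + (R_n - Q_{n-1}) / n.

    n is the number of rewards observed for this arm, including the new one,
    so the result equals the plain mean of all rewards without storing them.
    """
    return q + (reward - q) / n


def select_arm(q_values: list[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy selection; epsilon = 0 reduces to the greedy strategy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: any arm, uniformly
    best = max(q_values)
    # exploit: break ties at random so equally valued arms all get tried
    return rng.choice([i for i, q in enumerate(q_values) if q == best])


def run_bandit(true_means, steps, epsilon, initial_q=0.0, seed=0):
    """Simulate one bandit run (hypothetical helper; Gaussian rewards assumed)."""
    rng = random.Random(seed)
    k = len(true_means)
    # A large initial_q is "optimistic". With the 1/n step size below, the first
    # pull overwrites it, so it mainly forces every arm to be tried once.
    q = [initial_q] * k
    counts = [0] * k
    total = 0.0
    for _ in range(steps):
        a = select_arm(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)  # assumption: noise sd = 1
        counts[a] += 1
        q[a] = incremental_update(q[a], counts[a], reward)
        total += reward
    return q, counts, total


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 1.0]  # hypothetical true arm values
    for eps in (0.0, 0.1):
        q, counts, total = run_bandit(arm_means, steps=1000, epsilon=eps)
        print(f"epsilon={eps}: pulls per arm={counts}, total reward={total:.1f}")
```

Because the update uses the sample-average step size 1/n, an optimistic initial value is replaced on an arm's first pull; with a constant step size the optimism would instead fade gradually over several pulls.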
Model Perspective
Jan 22, 2024 · Artificial Intelligence

How A/B Testing and the ε‑Greedy Multi‑Armed Bandit Can Boost Decisions

This article explains the principles of A/B testing and the ε‑greedy multi‑armed bandit algorithm, shows how they are applied to optimize e‑commerce recommendations, and draws broader lessons about balancing exploration and exploitation in personal and professional decisions.

A/B testing · Recommendation Systems · exploration vs exploitation
6 min read
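To make the A/B-versus-bandit contrast concrete, here is a small simulation sketch in Python. It is an illustration under stated assumptions, not the article's experiment: the click-through rates are hypothetical, each visit yields a Bernoulli click, and the "A/B" policy is simplified to a uniform random split held for the whole period (a real A/B test would typically end with a significance test and then ship the winner). `simulate` is a hypothetical helper name.

```python
import random


def simulate(ctrs, users, policy, epsilon=0.1, seed=42):
    """Serve `users` visitors and return (total clicks, visitors shown each variant).

    policy="ab":      simplified A/B test, a uniform random split for the whole period.
    policy="egreedy": epsilon-greedy on each variant's observed click-through rate.
    The ctrs values are hypothetical true click probabilities per variant.
    """
    rng = random.Random(seed)
    k = len(ctrs)
    shown = [0] * k
    clicks = [0] * k
    for _ in range(users):
        if policy == "ab" or rng.random() < epsilon:
            arm = rng.randrange(k)  # random assignment (A/B split or exploration)
        else:
            # exploit: an unseen variant scores inf so it gets an estimate first
            rates = [clicks[i] / shown[i] if shown[i] else float("inf") for i in range(k)]
            arm = rates.index(max(rates))
        shown[arm] += 1
        if rng.random() < ctrs[arm]:  # Bernoulli click with the variant's true CTR
            clicks[arm] += 1
    return sum(clicks), shown


if __name__ == "__main__":
    ctrs = [0.03, 0.06]  # hypothetical CTRs for recommendation variants A and B
    for policy in ("ab", "egreedy"):
        total, shown = simulate(ctrs, users=20_000, policy=policy)
        print(f"{policy}: clicks={total}, visitors per variant={shown}")
```

The design difference is where traffic goes: the fixed split keeps sending half of visitors to the weaker variant for the whole period, while ε‑greedy spends roughly an ε share of traffic on exploration and routes the rest to the variant with the best observed click rate.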