Tagged articles

multi-armed bandit

10 articles

Aug 30, 2025 · Artificial Intelligence

Understanding Multi‑Armed Bandits: Balancing Exploration and Exploitation in Reinforcement Learning

Multi‑armed bandit models illustrate the core exploration‑exploitation dilemma in reinforcement learning, covering greedy, ε‑greedy, and optimistic‑initial‑value strategies, as well as sample‑average and incremental Q‑value estimation methods with practical examples and visual illustrations.

Q-value estimation · exploration vs exploitation · greedy

0 likes · 15 min read

Huolala Tech

Jun 27, 2024 · Artificial Intelligence

How Adaptive Genetic Algorithms Revolutionize Freight Dynamic Pricing

This article presents a self‑adaptive genetic algorithm framework enhanced with multi‑armed bandit techniques to tackle pricing volatility in the freight industry, detailing its design, business challenges, experimental validation, and future integration with large language models for smarter dynamic pricing.

adaptive optimization · dynamic pricing · freight logistics

0 likes · 22 min read

Model Perspective

Jan 22, 2024 · Artificial Intelligence

How A/B Testing and the ε‑Greedy Multi‑Armed Bandit Can Boost Decisions

This article explains the principles of A/B testing and the ε‑greedy multi‑armed bandit algorithm, illustrates their practical use in e‑commerce recommendation optimization, and draws broader life lessons about balancing exploration and exploitation for better personal and professional decisions.

A/B testing · Recommendation Systems · exploration vs exploitation

0 likes · 6 min read

58 Tech

Dec 28, 2021 · Artificial Intelligence

Reinforcement Learning for Cold‑Start Job Recommendation in 58.com

This talk explains how 58.com tackles the cold‑start and interest‑divergence problems of its massive blue‑collar job recruitment platform by modeling the recommendation process as a reinforcement‑learning task, detailing the use of multi‑armed bandit, contextual bandit, and linear‑UCB algorithms, offline evaluation pipelines, online deployment, and observed performance gains.

Contextual Bandit · cold-start · job recommendation

0 likes · 25 min read

DataFunTalk

Dec 17, 2021 · Artificial Intelligence

Applying Reinforcement Learning to Solve Cold‑Start Problems in 58.com Job Recruitment

This talk explains how 58.com’s massive blue‑collar recruitment platform uses reinforcement‑learning techniques—including multi‑armed bandits, contextual MAB, and linear UCB—to address cold‑start and interest‑divergence challenges, describes the system architecture, offline evaluation, online deployment, and reports an 8% uplift in new‑user conversion.

cold-start · contextual MAB · job recruitment

0 likes · 26 min read

DataFunTalk

Nov 12, 2020 · Artificial Intelligence

Reinforcement Learning for Recommendation System Mixing: Concepts, Practice, and Evaluation

This article explains how reinforcement learning, with its focus on maximizing long‑term reward, can improve recommendation system mixing by covering basic RL concepts, differences from supervised learning, multi‑armed bandit approaches, practical OpenAI Gym experiments, new AUC metrics, online gains, and advanced model optimizations.

Artificial Intelligence · OpenAI Gym · Q-Learning

0 likes · 10 min read

DataFunTalk

Jan 7, 2020 · Artificial Intelligence

Personalized Poster Production and Distribution System for Video Recommendation

This article describes how iQIYI’s technical product team designed and implemented an AI‑driven personalized poster generation and distribution pipeline that automatically creates, ranks, and serves customized video posters, improving click‑through rates across TV and mobile platforms.

AI · Poster Generation · Video platform

0 likes · 11 min read

iQIYI Technical Product Team

Jan 3, 2020 · Industry Insights

How iQIYI Boosted Click‑Through Rates with AI‑Powered Personalized Poster Generation

This article examines iQIYI's end‑to‑end personalized poster production and distribution system, detailing AI‑driven image cropping, smart frame extraction, feature extraction, multi‑armed bandit ranking, and online experiments that together significantly increased poster click‑through rates on TV and mobile platforms.

AI poster generation · Video platform · feature extraction

0 likes · 12 min read

Alibaba Cloud Developer

Mar 8, 2017 · Artificial Intelligence

How Private History Can Supercharge E‑commerce Recommendations: The PH‑MAB Mechanism Explained

This article introduces the PH‑MAB mechanism that combines public and private transaction histories to improve multi‑armed bandit‑based recommendation systems, explains its truthful mechanism‑design foundation, and shows how it reduces regret and boosts platform revenue compared to traditional epsilon‑greedy approaches.

Recommendation Systems · e-commerce · mechanism design

0 likes · 6 min read

Qunar Tech Salon

May 16, 2016 · Artificial Intelligence

Improving A/B Testing with a 20‑Line Multi‑Armed Bandit Algorithm

This article explains how a simple 20‑line multi‑armed bandit implementation can replace traditional A/B testing by continuously balancing exploration and exploitation to automatically discover the most effective UI variant, reducing manual analysis and improving conversion rates.

A/B testing · Exploitation · exploration

0 likes · 8 min read
