Tagged articles
10 articles
Data Party THU
Aug 30, 2025 · Artificial Intelligence

Understanding Multi‑Armed Bandits: Balancing Exploration and Exploitation in Reinforcement Learning

This article uses multi‑armed bandit models to illustrate the core exploration‑exploitation dilemma in reinforcement learning. It covers the greedy, ε‑greedy, and optimistic‑initial‑value strategies, as well as sample‑average and incremental Q‑value estimation, with practical examples and visual illustrations.

Q-value estimation · exploration vs exploitation · greedy
15 min read
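As a rough illustration of the estimation methods this article covers: a sample average stores every reward observed for an arm, while the incremental rule `Q ← Q + (R − Q) / n` produces the same value in constant memory. The sketch below combines that update with greedy, ε‑greedy, and optimistic‑initial‑value action selection. It is not the article's code; the class `BanditArmEstimates` and its parameters are hypothetical names chosen for this example.

```python
import random


class BanditArmEstimates:
    """Hypothetical helper: action-value estimates for a k-armed bandit."""

    def __init__(self, k, initial_value=0.0, epsilon=0.0, seed=None):
        # initial_value above any achievable reward = optimistic initial values
        self.q = [float(initial_value)] * k
        self.n = [0] * k
        self.epsilon = epsilon  # epsilon=0.0 gives pure greedy selection
        self.rng = random.Random(seed)

    def select(self):
        # With probability epsilon, explore a uniformly random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        # Otherwise exploit, breaking ties at random among the best arms.
        best = max(self.q)
        return self.rng.choice([a for a, v in enumerate(self.q) if v == best])

    def update(self, arm, reward):
        # Incremental sample average: Q_{n+1} = Q_n + (R_n - Q_n) / n
        self.n[arm] += 1
        self.q[arm] += (reward - self.q[arm]) / self.n[arm]
```

With `initial_value` set above any achievable reward and `epsilon=0.0`, every arm is pulled once before any arm is repeated: an untried arm keeps its optimistic estimate, while a tried arm's estimate drops to its observed reward. Under this sample-average step size the optimism is erased after the first pull; a constant step size would keep it around longer.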
Huolala Tech
Jun 27, 2024 · Artificial Intelligence

How Adaptive Genetic Algorithms Revolutionize Freight Dynamic Pricing

This article presents a self‑adaptive genetic algorithm framework, enhanced with multi‑armed bandit techniques, for tackling pricing volatility in the freight industry. It details the framework's design, the business challenges it addresses, and its experimental validation, and outlines future integration with large language models for smarter dynamic pricing.

adaptive optimization · dynamic pricing · freight logistics
22 min read
Model Perspective
Jan 22, 2024 · Artificial Intelligence

How A/B Testing and the ε‑Greedy Multi‑Armed Bandit Can Boost Decisions

This article explains the principles of A/B testing and the ε‑greedy multi‑armed bandit algorithm, illustrates their practical use in e‑commerce recommendation optimization, and draws broader life lessons about balancing exploration and exploitation for better personal and professional decisions.

A/B testing · Recommendation Systems · exploration vs exploitation
6 min read
58 Tech
Dec 28, 2021 · Artificial Intelligence

Reinforcement Learning for Cold‑Start Job Recommendation in 58.com

This talk explains how 58.com tackles the cold‑start and interest‑divergence problems of its massive blue‑collar job recruitment platform by modeling recommendation as a reinforcement‑learning task. It details the multi‑armed bandit, contextual bandit, and linear‑UCB algorithms used, the offline evaluation pipeline, online deployment, and the observed performance gains.

Contextual Bandit · cold start · job recommendation
25 min read
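Linear UCB scores each candidate by a predicted reward plus an uncertainty bonus, both computed from a context feature vector `x`: `score = θᵀx + α·sqrt(xᵀA⁻¹x)`, with `θ = A⁻¹b`. The sketch below is a generic disjoint LinUCB in plain Python, one ridge-regression model per arm, that keeps `A⁻¹` current with a Sherman-Morrison rank-one update instead of re-inverting the matrix. It does not reflect 58.com's features, reward definition, or `alpha`; the class name and parameters are illustrative assumptions.

```python
import math


class DisjointLinUCB:
    """Illustrative disjoint LinUCB: a separate ridge model per arm."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # width of the exploration bonus
        # Ridge prior A = I, so A^{-1} starts as the identity; b starts at zero.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)]
                       for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(m, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def scores(self, x):
        out = []
        for A_inv, b in zip(self.A_inv, self.b):
            theta = self._matvec(A_inv, b)  # ridge estimate theta = A^{-1} b
            mean = sum(t * xi for t, xi in zip(theta, x))
            Ax = self._matvec(A_inv, x)
            width = math.sqrt(sum(xi * a for xi, a in zip(x, Ax)))  # sqrt(x^T A^{-1} x)
            out.append(mean + self.alpha * width)
        return out

    def select(self, x):
        s = self.scores(x)
        return max(range(len(s)), key=s.__getitem__)

    def update(self, arm, x, reward):
        A_inv = self.A_inv[arm]
        u = self._matvec(A_inv, x)  # u = A^{-1} x (A^{-1} stays symmetric)
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        d = len(x)
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - u u^T / (1 + x^T A^{-1} x)
        self.A_inv[arm] = [[A_inv[i][j] - u[i] * u[j] / denom for j in range(d)]
                           for i in range(d)]
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]
```

In a job-recommendation setting, each arm might stand for a candidate job category and `x` for a new user's context; that mapping is an assumption for illustration. A larger `alpha` explores more, and `alpha=0` reduces to greedy ridge regression.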
DataFunTalk
Dec 17, 2021 · Artificial Intelligence

Applying Reinforcement Learning to Solve Cold‑Start Problems in 58.com Job Recruitment

This talk explains how 58.com's massive blue‑collar recruitment platform uses reinforcement‑learning techniques, including multi‑armed bandits, contextual MAB, and linear UCB, to address cold‑start and interest‑divergence challenges. It also describes the system architecture, offline evaluation, and online deployment, and reports an 8% uplift in new‑user conversion.

Online Learning · cold start · contextual MAB
26 min read
DataFunTalk
Nov 12, 2020 · Artificial Intelligence

Reinforcement Learning for Recommendation System Mixing: Concepts, Practice, and Evaluation

This article explains how reinforcement learning, with its focus on maximizing long‑term reward, can improve mixing in recommendation systems. It covers basic RL concepts, how RL differs from supervised learning, multi‑armed bandit approaches, practical OpenAI Gym experiments, new AUC metrics, online gains, and advanced model optimizations.

OpenAI Gym · Q-Learning · Recommendation Systems
10 min read
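The focus on long‑term reward is where RL departs from a bandit: a bandit's value estimate tracks only an action's immediate reward, whereas Q-learning (one of this article's tags) also bootstraps from the value of the next state. A minimal tabular sketch of the one-step update `Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′,a′) − Q(s,a)]` follows. The function name, the dictionary-based table, the state and action labels, and the α and γ values are assumptions, not the article's model.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place; Q is keyed by (state, action)."""
    # Bootstrapped long-term value: best action value in the next state.
    future = 0.0 if terminal else max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * future
    # Move the current estimate a fraction alpha toward the target.
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


# Hypothetical states and actions, for illustration only.
Q = defaultdict(float)            # unseen (state, action) pairs start at 0.0
Q[("s1", "show_video")] = 2.0     # pretend the next state is already valued
q_learning_update(Q, "s0", "show_item", reward=1.0, next_state="s1",
                  actions=["show_item", "show_video"], alpha=0.5, gamma=0.9)
```

Here the target is `1.0 + 0.9 × 2.0 = 2.8`, and with `alpha=0.5` the estimate for `("s0", "show_item")` moves from 0.0 halfway to it.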
DataFunTalk
Jan 7, 2020 · Artificial Intelligence

Personalized Poster Production and Distribution System for Video Recommendation

This article describes how iQIYI’s technical product team designed and implemented an AI‑driven personalized poster generation and distribution pipeline that automatically creates, ranks, and serves customized video posters, improving click‑through rates across TV and mobile platforms.

AI · Video platform · content personalization
11 min read
iQIYI Technical Product Team
Jan 3, 2020 · Industry Insights

How iQIYI Boosted Click‑Through Rates with AI‑Powered Personalized Poster Generation

This article examines iQIYI's end‑to‑end personalized poster production and distribution system, detailing AI‑driven image cropping, smart frame extraction, feature extraction, multi‑armed bandit ranking, and online experiments that together significantly increased poster click‑through rates on TV and mobile platforms.

AI poster generation · Video platform · feature extraction
12 min read
Alibaba Cloud Developer
Mar 8, 2017 · Artificial Intelligence

How Private History Can Supercharge E‑commerce Recommendations: The PH‑MAB Mechanism Explained

This article introduces the PH‑MAB mechanism that combines public and private transaction histories to improve multi‑armed bandit‑based recommendation systems, explains its truthful mechanism‑design foundation, and shows how it reduces regret and boosts platform revenue compared to traditional epsilon‑greedy approaches.

Recommendation Systems · e‑commerce · mechanism design
6 min read
Qunar Tech Salon
May 16, 2016 · Artificial Intelligence

Improving A/B Testing with a 20‑Line Multi‑Armed Bandit Algorithm

This article explains how a simple 20‑line multi‑armed bandit implementation can replace traditional A/B testing by continuously balancing exploration and exploitation to automatically discover the most effective UI variant, reducing manual analysis and improving conversion rates.

A/B testing · Exploitation · exploration
8 min read
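The article's 20-line listing is not reproduced here. As a minimal sketch of the same idea, an ε‑greedy loop shows a random UI variant to a small fraction of visitors (10% here, an assumed value) and the best-performing variant to everyone else. The function name, the one-conversion-in-one-trial starting prior, and the simulated conversion rates are all assumptions for illustration.

```python
import random


def simulate_epsilon_greedy(conversion_rates, n_visitors=10_000, epsilon=0.1, seed=0):
    """Route simulated visitors to UI variants with an epsilon-greedy bandit.

    conversion_rates are hypothetical true rates, used only to simulate whether
    each visitor converts. Returns how many times each variant was shown.
    """
    rng = random.Random(seed)
    k = len(conversion_rates)
    # Assumed prior: 1 conversion in 1 trial, so untested variants look promising.
    conversions = [1] * k
    trials = [1] * k
    for _ in range(n_visitors):
        if rng.random() < epsilon:
            # Explore: show a uniformly random variant.
            arm = rng.randrange(k)
        else:
            # Exploit: show the variant with the best observed conversion rate.
            arm = max(range(k), key=lambda a: conversions[a] / trials[a])
        trials[arm] += 1
        if rng.random() < conversion_rates[arm]:
            conversions[arm] += 1
    # Drop the prior pseudo-trial so the counts reflect real impressions.
    return [t - 1 for t in trials]


shown = simulate_epsilon_greedy([0.02, 0.05, 0.15])
```

Unlike a fixed 50/50 A/B split, traffic drifts toward the stronger variant while the test is still running. The trade-off is that less traffic is spent on losing variants, but their conversion estimates stay noisier.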