Tagged articles

MDP

9 articles · Page 1 of 1

Apr 10, 2026 · Artificial Intelligence

AdaGen: Enabling Adaptive, Data‑Driven Strategies for Image Generation Models

AdaGen replaces the handcrafted, static schedules used in multi‑step image generators with a universal, learnable policy network trained via reinforcement learning. Built on an MDP formulation with adversarial rewards and action smoothing, it delivers consistent quality and efficiency gains across diffusion, autoregressive, masked, and flow models while adding negligible overhead.

MDP · action smoothing · adaptive policy

0 likes · 11 min read

Model Perspective

Nov 1, 2025 · Fundamentals

Should You Quit Your Job? A Scientific Decision Model for Resignation

This article presents a comprehensive, mathematically grounded framework that quantifies personal utility, costs, benefits, risk, and behavioral biases to help professionals evaluate whether resigning from their current position is the rational choice.

MDP · Monte Carlo · behavioral economics

0 likes · 22 min read

Data Party THU

Oct 22, 2025 · Artificial Intelligence

Demystifying Large‑Model Reinforcement Learning: From MDP Basics to Bellman and Advantage Functions

This article provides a comprehensive introduction to reinforcement learning for large language models, covering the Markov Decision Process formulation, the four core elements of RL, state‑value and action‑value functions, Bellman equations, and the advantage function that underpins modern policy‑gradient algorithms.

AI Fundamentals · Bellman equation · MDP

0 likes · 13 min read

Alimama Tech

Sep 7, 2022 · Artificial Intelligence

Curriculum-Guided Bayesian Reinforcement Learning for ROI-Constrained Real-Time Bidding

The paper presents a Curriculum‑Guided Bayesian Reinforcement Learning (CBRL) framework that models ROI‑constrained real‑time bidding as a partially observable constrained MDP, using hard‑margin indicator rewards and a curriculum of relaxed proxy problems to achieve fast, constraint‑satisfying, Bayes‑optimal policies that outperform existing methods on large‑scale industrial data.

Bayesian RL · MDP · ROI constraint

0 likes · 15 min read

DaTaobao Tech

Aug 18, 2022 · Artificial Intelligence

Introduction to Deep Reinforcement Learning: Theory, Algorithms, and Applications

This article introduces deep reinforcement learning by explaining its Markov decision process foundations, then categorizes the main algorithm families: value‑based methods like DQN, policy‑based approaches such as PG and DPG, and actor‑critic techniques including A3C, PPO, and DDPG. For each family it details the architectures, training procedures, and key advantages.

DQN · MDP · actor-critic

0 likes · 14 min read

GuanYuan Data Tech Team

Jul 28, 2022 · Artificial Intelligence

Unlocking Reinforcement Learning: Core Concepts, Algorithms, and Real‑World Applications

This article introduces reinforcement learning by defining agents, environments, rewards, and policies, explains key concepts such as Markov Decision Processes and Bellman equations, and surveys major algorithms—including dynamic programming, Monte‑Carlo, TD learning, policy gradients, Q‑learning, DQN, and evolution strategies—while highlighting practical challenges and notable case studies like AlphaGo Zero.

Deep Learning · Evolution Strategies · MDP

0 likes · 27 min read

360 Quality & Efficiency

Feb 14, 2020 · Artificial Intelligence

Applying Reinforcement Learning to UI Traversal for Automated Testing

The article explores how reinforcement learning can be used to create a test robot that performs UI traversal, discussing the challenges of full automation, defining the MDP components, feature extraction methods, reward design, and suitable RL algorithms to improve testing coverage and efficiency.

MDP · UI traversal · automated testing

0 likes · 8 min read

Hulu Beijing

Dec 5, 2017 · Artificial Intelligence

What Is Reinforcement Learning? Core Concepts Explained

This article introduces the fundamental concepts of reinforcement learning, describing its origins, key components such as agents, environments, states, actions, and rewards, explaining the Markov decision process framework, and highlighting common algorithms like Q‑learning, policy gradients, and actor‑critic methods.

AI · MDP · algorithms

0 likes · 4 min read

Alibaba Cloud Developer

Feb 16, 2017 · Artificial Intelligence

How Reinforcement Learning Transforms E‑Commerce Search and Recommendation at Scale

This article explores how Alibaba's Taobao leverages reinforcement learning, Markov decision processes, and reward shaping to improve large‑scale product search ranking and recommendation, detailing problem modeling, algorithm designs such as tabular Q‑learning and DDPG, experimental results, and advanced recommendation models like GBDT‑FTRL and Wide & Deep.

Deep Learning · MDP · Recommendation Systems

0 likes · 21 min read