Meituan Technology Team
Feb 20, 2025 · Artificial Intelligence
Offline Multi-Agent Reinforcement Learning via In‑Sample Sequential Policy Optimization (InSPO)
Offline multi-agent reinforcement learning (MARL) faces two main challenges: out-of-distribution joint actions and convergence to local optima. This article introduces the In-Sample Sequential Policy Optimization (InSPO) algorithm, which applies inverse KL regularization and maximum-entropy regularization in cooperative Markov games to achieve monotonic policy improvement and superior performance across benchmark tasks.
InSPO · Maximum Entropy · Cooperative Markov Game
18 min read
