Multi-Agent Communication: A Survey from MARL to Emergent Language and Large Language Models
This survey examines the evolution of multi‑agent communication—from early hand‑crafted protocols in MARL, through emergent discrete languages, to recent large‑language‑model‑driven approaches—using a unified "five W" framework to analyze who communicates, what, when, why, and how.
Introduction
Multi‑agent sequential decision making underlies autonomous driving, robotics, and collaborative AI assistants. In dynamic partially observable environments, communication reduces uncertainty and enables coordination. Reinforcement learning (RL) and deep RL (DRL) have been applied in simulated settings (Mnih et al., 2013; Lillicrap et al., 2015) and real‑world settings (Gu et al., 2016; Zhang et al., 2015; Qureshi et al., 2017; Meng et al., 2019). Most DRL algorithms assume fully observable MDPs, whereas multi‑agent systems are modeled as partially observable Markov decision processes (POMDPs) where each agent receives a local view. Communication mitigates partial observability by sharing observations, goals, intents, or policies (Chen et al., 2024b).
Evolution of Multi‑Agent Communication Learning
MARL communication: Early works introduced communication to improve coordination under partial observability, addressing “who should communicate” and “what to share” (Paulos et al., 2019; Lowe et al., 2017; Foerster et al., 2016a; Sukhbaatar et al., 2016; Jiang & Lu, 2018; Das et al., 2018; Rangwala & Williams, 2020). Many adopted the centralized training with decentralized execution (CTDE) paradigm; for example, CommNet (Sukhbaatar et al., 2016) and IC3Net (Singh et al., 2018) aggregate hidden states to form shared representations. Later methods added flexible mechanisms such as attention‑based routing in TarMAC (Das et al., 2018) and graph‑structured propagation in DICG (Li et al., 2020), allowing dynamic partner selection. However, the resulting continuous high‑dimensional messages are often opaque (Brown et al., 2020; LeCun et al., 2015), and most methods assume unrestricted bandwidth, ignoring realistic discrete constraints (Foerster et al., 2016a; Lowe et al., 2017; Mordatch & Abbeel, 2018; Freed et al., 2020b;a). These limitations motivated research on emergent discrete communication.
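The hidden‑state aggregation used by CommNet‑style methods can be sketched in a few lines: each agent receives the mean of the other agents' hidden states as its communication vector, and both are passed through shared weights. The weight matrices below are random placeholders standing in for learned parameters; this is a minimal single‑round illustration, not the full published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, hidden_dim = 4, 8
H = rng.standard_normal((n_agents, hidden_dim))  # per-agent hidden states

# CommNet-style step: each agent receives the mean of the OTHER agents' states.
totals = H.sum(axis=0, keepdims=True)            # sum over all agents
C = (totals - H) / (n_agents - 1)                # exclude self, then average

# Shared weights (illustrative random values standing in for learned parameters).
W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_c = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
H_next = np.tanh(H @ W_h + C @ W_c)              # one communication round

print(H_next.shape)  # (4, 8)
```

Because the same weights are shared across agents and the mean is permutation‑invariant, this scheme scales to a variable number of agents; the cost is exactly the opacity noted above, since `C` is an unconstrained continuous vector.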
Emergent Language (EL): EL studies how agents develop structured, symbolic protocols (one‑hot, binary, etc.) through repeated interaction without predefined languages (Lazaridou & Baroni, 2020; Li et al., 2022). Two challenges of MARL‑based communication drive EL: (1) end‑to‑end learned protocols produce continuous, uninterpretable messages; (2) protocols are tightly coupled to specific environments and reward structures, making them brittle under changes in agents, tasks, or deployment conditions. EL therefore treats communication itself as a learning outcome, aiming for discrete, interpretable messages (Chaabouni et al., 2019; Kottur et al., 2017; Havrylov & Titov, 2017; Lazaridou et al., 2016; Lee et al., 2017; Eccles et al., 2019). Zero‑shot failures, where agents trained separately cannot understand each other's protocols, have been observed (Hu et al., 2020b). To mitigate this, environments and training regimes encouraging robust, generalizable, human‑aligned communication have been proposed (Bullard et al., 2021; 2020). Some works align emergent protocols with natural language (Lee et al., 2019; Lowe et al., 2020) or incorporate pretrained language models to inject linguistic priors (Lazaridou et al., 2020; Tucker et al., 2021).
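The discrete message step at the heart of many EL setups can be sketched with the Gumbel‑max trick: a sender maps its observation to logits over a small vocabulary and emits a one‑hot symbol, which a receiver embeds back into a continuous space. The weight matrices here are random stand‑ins for learned parameters; in practice, training passes gradients through the discretization via a Gumbel‑softmax or straight‑through relaxation (as in Havrylov & Titov, 2017), which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab_size, obs_dim = 5, 3
obs = rng.standard_normal(obs_dim)

# Sender: map an observation to logits over a small discrete vocabulary.
W_send = rng.standard_normal((obs_dim, vocab_size))
logits = obs @ W_send

# Gumbel-max trick: argmax(logits + Gumbel noise) samples a symbol from the
# softmax distribution over the vocabulary.
gumbel = -np.log(-np.log(rng.uniform(size=vocab_size)))
symbol = int(np.argmax(logits + gumbel))
message = np.eye(vocab_size)[symbol]             # discrete one-hot message

# Receiver: embed the symbol back into a continuous representation.
W_recv = rng.standard_normal((vocab_size, obs_dim))
decoded = message @ W_recv

print(symbol, message)
```

The one‑hot `message` is what makes the protocol auditable: unlike a continuous vector, each emitted symbol can be logged, counted, and mapped to the situations in which agents use it.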
Large Language Models (LLMs): LLMs provide strong language understanding, reasoning, and world knowledge from massive pre‑training. Compared with MARL or EL agents that must learn protocols from scratch, LLM‑based agents can exchange information directly in natural language, infer shared goals, and adapt to new tasks and teammates with minimal additional training (Yang et al., 2025). This makes LLMs attractive for settings requiring flexible coordination, zero‑shot generalization, and human interaction.
Communication in LLMs: Recent architectures employ direct message passing, chain‑of‑thought (CoT) interactions, hierarchical structures, or graph‑based exchanges (Zhang et al., 2023c; Du et al., 2023; Qian et al., 2023; Hong et al., 2023; Holt et al.; Wu et al., 2023b; Jiang et al., 2023; Chan et al., 2023; Qian et al., 2024b; Zhuge et al., 2024b). These frameworks enable agents to collaborate, debate, plan, and reason across diverse environments while offering scalability across tasks. The key distinction is that EL investigates how communication emerges from scratch, whereas LLM‑driven communication builds on existing natural‑language capabilities; insights from EL help explain how collaborative pressures shape structured, task‑relevant communication, informing hybrid designs.
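A round‑robin debate of the kind surveyed here (e.g., Du et al., 2023) reduces to a simple loop: each agent answers, then repeatedly revises its answer after reading its peers' messages. The sketch below assumes a hypothetical `query_llm()` wrapper around any chat‑completion API; a stub stands in for the model so the control flow is runnable on its own.

```python
# Minimal round-robin "debate" skeleton. query_llm() is a hypothetical wrapper
# around a chat-completion API; here a deterministic stub stands in for it.
def query_llm(prompt: str) -> str:
    return f"[model answer to: {prompt[:40]}...]"   # stub for illustration

def debate(question: str, n_agents: int = 2, n_rounds: int = 2) -> list:
    # Round 0: each agent answers the question independently.
    answers = [query_llm(question) for _ in range(n_agents)]
    for _ in range(n_rounds):
        new_answers = []
        for i in range(n_agents):
            # Each agent sees every peer answer except its own, then revises.
            peers = "\n".join(a for j, a in enumerate(answers) if j != i)
            prompt = (f"Question: {question}\n"
                      f"Other agents said:\n{peers}\n"
                      f"Revise your answer, noting agreements and errors.")
            new_answers.append(query_llm(prompt))
        answers = new_answers
    return answers

final = debate("Is communication necessary under partial observability?")
print(len(final))  # 2
```

The same skeleton generalizes to the hierarchical and graph‑based variants in the citations above by changing which peers each agent reads from, rather than the inner revision step.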
Five‑W Framework
The literature is organized by who communicates with whom, what is communicated, when communication occurs, why it is beneficial, and how it is motivated and operationalized. This structure reveals shared design principles and trade‑offs such as expressiveness versus bandwidth, interpretability versus performance, and task‑specific versus reusable protocols.
Open Challenges
Grounding emergent protocols in semantics and ensuring robustness under distribution shift.
Improving interpretability of continuous messages and bridging discrete emergent language with natural language.
Designing efficient communication under realistic bandwidth and latency constraints.
Providing theoretical guarantees (e.g., convergence, equilibrium properties) for learned communication in mixed‑motivation settings.
Developing benchmark suites that evaluate scalability, generalization, and human‑centric interaction.