Which Multi‑Agent Communication Protocol Wins? UIUC Introduces ProtocolBench at ICML 2026

The UIUC team presents ProtocolBench, a systematic benchmark that compares four multi‑agent communication protocols across four realistic scenarios and reveals distinct trade‑offs in latency, reliability, and security. The team also proposes ProtocolRouter, which automatically selects the most suitable protocol for each workload.

Machine Heart

Background

Multi‑agent systems are transitioning from research prototypes to production deployments, where dozens of agents collaborate on planning, retrieval, tool use, and answer synthesis. As agent networks grow from local “face‑to‑face” connections to LAN or Internet‑scale deployments, the communication protocol becomes a critical factor for performance, reliability, and security.

Why protocols matter

Beyond basic data transfer, protocol design influences multi‑hop collaboration overhead, streaming latency, fault tolerance, authentication, end‑to‑end encryption, and metadata protection. Protocol selection is therefore a system‑level design decision rather than an afterthought.

ProtocolBench design

ProtocolBench isolates the communication layer by fixing all non‑protocol variables—model, prompt, hardware, container image, workload, rate limit, and agent topology—so that only the protocol implementation changes. Four protocols are evaluated: A2A, ACP, ANP, and Agora. Four scenarios are used:

GAIA Document QA (planner‑driven multi‑hop task)

Safety Tech (security‑focused medical QA with TLS downgrade, replay attacks, etc.)

Streaming Queue (high‑throughput API service processing 1,000 MS‑MARCO records)

Fail‑Storm Recovery (fault‑recovery test on an 8‑agent ring network)

Each scenario measures metrics relevant to its workload, such as quality, success rate, average latency, total execution time, variance, and recovery ratio.
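To make the isolation idea concrete, here is a minimal Python sketch of a protocol sweep, assuming each run is described by a simple settings record. The names (`RunConfig`, `sweep`, `differing_fields`, `rate_limit_rps`) and the placeholder values are hypothetical and are not taken from the AgentProtocols repository; the point is only that every field except the protocol stays identical across runs.

```python
from dataclasses import dataclass, fields, replace

PROTOCOLS = ("A2A", "ACP", "ANP", "Agora")


@dataclass(frozen=True)
class RunConfig:
    """One benchmark run (hypothetical). Every field except `protocol` is pinned."""
    model: str
    prompt_template: str
    hardware: str
    container_image: str
    workload: str
    rate_limit_rps: float
    topology: str
    protocol: str


def sweep(base: RunConfig) -> list[RunConfig]:
    """Return one config per protocol, identical to `base` except for `protocol`."""
    return [replace(base, protocol=p) for p in PROTOCOLS]


def differing_fields(a: RunConfig, b: RunConfig) -> set[str]:
    """Names of the fields whose values differ between two configs."""
    return {f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)}


# Illustrative placeholder values, not the benchmark's actual settings.
base = RunConfig(
    model="placeholder-model",
    prompt_template="shared-prompt-v1",
    hardware="same-node",
    container_image="bench:pinned",
    workload="streaming-queue",
    rate_limit_rps=5.0,
    topology="star",
    protocol="A2A",
)
runs = sweep(base)
for run in runs:
    # Only the protocol may vary; any other difference would confound the comparison.
    assert differing_fields(base, run) <= {"protocol"}
print([run.protocol for run in runs])
```

A frozen dataclass is used so that a run's settings cannot be mutated mid‑sweep, and `dataclasses.replace` creates each variant by copying the pinned base.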

Protocol characteristics

A2A emphasizes structured agent‑to‑agent collaboration, suited for enterprise‑level orchestration and mission‑critical workloads. ACP follows a REST/async style, facilitating cross‑framework integration. ANP focuses on identity, secure routing, and end‑to‑end encryption, targeting privacy‑sensitive, cross‑boundary tasks. Agora adopts a decentralized, P2P workflow for dynamic, heterogeneous environments.

Benchmark results

A2A achieves the highest task utility in GAIA Document QA (Quality avg 2.51, Success avg 9.29) and excels at fault recovery (answer‑discovery retention ≈ 98.85%). Its lightweight HTTP + JSON‑RPC envelope and turn‑based collaboration suit planner‑driven workflows.

ACP delivers the lowest average end‑to‑end latency in Streaming Queue (9.66 s) and the shortest total runtime (40.28 min). A2A is close behind (9.70 s, 40.45 min), while ANP (11.36 s) and Agora (13.14 s) incur noticeably higher latencies, suggesting that lightweight REST‑style request‑response exchanges keep per‑request overhead low.
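One way to read the Streaming Queue numbers is as percentage overhead relative to the fastest protocol. The short Python sketch below uses only the average latencies reported in the article; the helper name `overhead_vs_fastest` is illustrative.

```python
# Average end-to-end latency in Streaming Queue, in seconds, as reported in the article.
AVG_LATENCY_S = {"ACP": 9.66, "A2A": 9.70, "ANP": 11.36, "Agora": 13.14}


def overhead_vs_fastest(latencies: dict[str, float]) -> dict[str, float]:
    """Percent extra latency of each protocol relative to the fastest one."""
    fastest = min(latencies.values())
    return {name: 100.0 * (value - fastest) / fastest for name, value in latencies.items()}


# Print protocols from fastest to slowest with their relative overhead.
for name, pct in sorted(overhead_vs_fastest(AVG_LATENCY_S).items(), key=lambda kv: kv[1]):
    print(f"{name:6s} +{pct:5.1f}%")
```

Relative to ACP, A2A adds roughly 0.4%, ANP roughly 17.6%, and Agora roughly 36%, which is why A2A is described as close while ANP and Agora are not.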

ANP and Agora dominate the Safety Tech scenario, each covering all five security dimensions: TLS transport, session‑hijack protection, E2E encryption, tunnel‑sniffing resistance, and metadata‑leakage prevention. A2A and ACP need additional security layers to provide these guarantees.

The overall conclusion is that no single protocol dominates all scenarios; each excels under specific constraints.

ProtocolRouter

ProtocolRouter is a constraint‑aware router that selects protocols based on hard constraints (e.g., mandatory encryption). When multiple protocols satisfy the constraints, performance priors from ProtocolBench break ties. Selections can be per‑scenario or per‑module, and heterogeneous protocol mixes are supported.
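The two‑stage rule (hard constraints first, performance priors as tie‑breaker) can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation: it assumes a single coarse capability flag derived from the Safety Tech results (ANP and Agora cover all five dimensions natively) and uses the Streaming Queue average latency as the only performance prior. `ProtocolProfile` and `route` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolProfile:
    name: str
    # Coarse capability flag (assumption): True for protocols reported as covering
    # all five Safety Tech dimensions natively (ANP and Agora).
    full_security_suite: bool
    # Performance prior: Streaming Queue average latency in seconds (lower is better).
    streaming_latency_s: float


PROFILES = [
    ProtocolProfile("A2A", False, 9.70),
    ProtocolProfile("ACP", False, 9.66),
    ProtocolProfile("ANP", True, 11.36),
    ProtocolProfile("Agora", True, 13.14),
]


def route(require_full_security: bool, profiles: list[ProtocolProfile] = PROFILES) -> str:
    """Filter by hard constraints, then break ties with the performance prior."""
    candidates = [p for p in profiles if p.full_security_suite or not require_full_security]
    if not candidates:
        raise ValueError("no protocol satisfies the hard constraints")
    return min(candidates, key=lambda p: p.streaming_latency_s).name


print(route(require_full_security=False))  # ACP: no constraint, fastest prior wins
print(route(require_full_security=True))   # ANP: only ANP and Agora qualify; ANP is faster
```

A fuller router would carry several constraint dimensions and scenario‑specific priors (for example, task success for GAIA‑like workloads), but the ordering stays the same: constraints prune, priors rank.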

When endpoints use different protocols, a stateless adapter performs envelope and field‑level translation without altering business semantics, preserving the security guarantees of the underlying protocols.
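A stateless adapter of this kind can be pictured as a pair of pure functions that rename envelope fields while copying the payload through untouched. The Python sketch below translates a JSON‑RPC 2.0 request (the envelope style the article attributes to A2A) into a REST‑style message and back. The REST message shape (`path`, `correlation_id`, `body`), the dotted‑method‑to‑path convention, and the method name `qa.answer` are all hypothetical; real A2A and ACP message formats differ.

```python
import copy
from typing import Any


def jsonrpc_to_rest(envelope: dict[str, Any]) -> dict[str, Any]:
    """Translate a JSON-RPC 2.0 request into a hypothetical REST-style message.

    Stateless: nothing is stored between calls. The business payload (`params`)
    is deep-copied through unchanged; only the envelope fields are renamed.
    """
    if envelope.get("jsonrpc") != "2.0" or "method" not in envelope:
        raise ValueError("not a JSON-RPC 2.0 request")
    return {
        # Hypothetical convention: dotted method names map to URL path segments.
        "path": "/" + envelope["method"].replace(".", "/"),
        "correlation_id": envelope.get("id"),
        "body": copy.deepcopy(envelope.get("params", {})),
    }


def rest_to_jsonrpc(message: dict[str, Any]) -> dict[str, Any]:
    """Inverse mapping, so a round trip preserves method, id, and payload."""
    return {
        "jsonrpc": "2.0",
        "id": message.get("correlation_id"),
        "method": message["path"].lstrip("/").replace("/", "."),
        "params": copy.deepcopy(message.get("body", {})),
    }


request = {"jsonrpc": "2.0", "id": 7, "method": "qa.answer", "params": {"question": "Who wrote the report?"}}
message = jsonrpc_to_rest(request)
print(message["path"])  # /qa/answer
assert rest_to_jsonrpc(message) == request
```

Because the functions never inspect or rewrite the payload, the round trip is lossless for requests that carry an `id` and use dotted method names; notifications without an `id` would need separate handling.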

ProtocolRouterBench

ProtocolRouterBench evaluates the router on 60 test scenarios (180 communication modules) across five difficulty levels. Two modes are compared:

Spec‑only: selection based solely on capability tables.

Spec+Perf: selection incorporates performance priors from ProtocolBench.

Spec‑only yields 53.5% scenario accuracy and 71.2% module accuracy; Spec+Perf raises these to 63.3% and 81.7% respectively and lifts macro‑F1 from 0.721 to 0.824. The results indicate that performance signals are essential for telling apart protocols that capability tables alone cannot separate, especially in complex settings.
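The gap between scenario accuracy and module accuracy is easier to see with the metrics written out. The Python sketch below assumes, as a labeled assumption, that a scenario counts as correct only when every one of its modules gets the gold protocol, and computes macro‑F1 as the unweighted mean of per‑protocol F1. The paper's exact definitions may differ, and the toy labels are illustrative, not ProtocolRouterBench data.

```python
from collections import defaultdict

Labels = list[list[str]]  # one inner list of protocol names per scenario


def module_accuracy(gold: Labels, pred: Labels) -> float:
    """Fraction of individual modules assigned the gold protocol."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


def scenario_accuracy(gold: Labels, pred: Labels) -> float:
    """Assumed all-or-nothing: a scenario counts only if every module is correct."""
    return sum(gs == ps for gs, ps in zip(gold, pred)) / len(gold)


def macro_f1(gold: Labels, pred: Labels) -> float:
    """Unweighted mean of per-protocol F1 = 2TP / (2TP + FP + FN)."""
    tp: dict[str, int] = defaultdict(int)
    fp: dict[str, int] = defaultdict(int)
    fn: dict[str, int] = defaultdict(int)
    for gs, ps in zip(gold, pred):
        for g, p in zip(gs, ps):
            if g == p:
                tp[g] += 1
            else:
                fp[p] += 1  # the predicted protocol was wrong for this module
                fn[g] += 1  # the gold protocol was missed for this module
    labels = set(tp) | set(fp) | set(fn)
    return sum(2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels) / len(labels)


# Toy labels for illustration only (not ProtocolRouterBench data).
gold = [["A2A", "ACP", "ANP"], ["ANP", "ANP", "Agora"]]
pred = [["A2A", "ACP", "ANP"], ["ANP", "A2A", "Agora"]]
print(scenario_accuracy(gold, pred))          # 0.5
print(round(module_accuracy(gold, pred), 3))  # 0.833
print(round(macro_f1(gold, pred), 3))         # 0.867
```

A single wrong module out of six costs only about 17 points of module accuracy but half of the scenario accuracy, which is consistent with scenario accuracy trailing module accuracy in both modes under this all‑or‑nothing assumption.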

End‑to‑End validation

Embedding ProtocolRouter back into the original ProtocolBench scenarios shows that the router can surpass the best single‑protocol baseline on several metrics, e.g., reducing Fail‑Storm recovery time from 8.00 s to 6.55 s (‑18.1%) and raising GAIA success average from 9.29 to 9.90. The router does not dominate every metric; its value lies in constrained, per‑scenario or per‑module protocol composition.
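As a quick check, the relative changes follow directly from the reported values (recovery time falls, success average rises):

```latex
\frac{6.55 - 8.00}{8.00} = -\frac{1.45}{8.00} \approx -18.1\%,
\qquad
\frac{9.90 - 9.29}{9.29} = \frac{0.61}{9.29} \approx +6.6\%
```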

Conclusion

ProtocolBench establishes a reproducible methodology for evaluating multi‑agent communication protocols, while ProtocolRouter demonstrates that automated, constraint‑driven protocol selection is a worthwhile engineering direction as LLM agents transition to production systems.

Paper: "Which LLM Multi‑Agent Protocol to Choose?" (arXiv:2510.17149) – https://arxiv.org/abs/2510.17149

Code repository: https://github.com/ulab-uiuc/AgentProtocols

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
