How GoS Gives Agents a Shared Belief State for True Multi-Agent Collaboration
The paper introduces Graph of States (GoS), a neural‑symbolic framework that equips multi‑agent systems with an explicit, maintainable belief state. This lets agents backtrack and drill down during long‑horizon abductive tasks such as medical diagnosis and distributed‑system fault analysis. In the reported experiments, GoS achieves higher Match and Relevant scores than existing baselines.
Recent advances in large language models have improved performance on tasks such as mathematics and coding, but real‑world scenarios such as medical diagnosis and fault troubleshooting require multiple agents to cooperate continuously in uncertain, dynamic environments. Existing multi‑agent reasoning approaches either chain agents in a simple pipeline or assume all evidence is available up front, which leads to four failure modes: evidence fabrication, context drift, backtrack failure, and premature stopping.
These failures stem from two structural defects: (1) hypotheses, evidence, and reasoning steps are mixed into unstructured natural‑language context without an explicit state representation; (2) there is no state‑control mechanism, so agents decide whether to backtrack, drill down, or terminate entirely at their own discretion.
The authors propose Graph of States (GoS), a neural‑symbolic framework that builds an explicit, maintainable, revertible, and convergent belief‑state space for abductive tasks. GoS consists of two layers. The upper cognitive layer maps a central agent and expert agents to real‑world roles (e.g., attending physician, radiology specialist, or AIOps operators). The lower symbolic layer maintains a causal graph and a state machine that together represent the belief state and guide the reasoning process.
In the symbolic layer, the causal graph records symptoms, evidence, hypotheses, and their support, refutation, and refinement relations. The state machine controls the current reasoning level, deciding whether to continue gathering evidence, drill down to finer‑grained hypotheses, or backtrack when conflicts arise.
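The causal graph described above can be pictured as a small typed graph. The following is a minimal sketch, not the authors' implementation: the class names (`NodeKind`, `Relation`, `CausalGraph`), the method names, and the example clinical labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    SYMPTOM = "symptom"
    EVIDENCE = "evidence"
    HYPOTHESIS = "hypothesis"


class Relation(Enum):
    SUPPORTS = "supports"
    REFUTES = "refutes"
    REFINES = "refines"


@dataclass
class CausalGraph:
    """Hypothetical belief-state container: typed nodes plus typed edges."""
    nodes: dict = field(default_factory=dict)  # node id -> NodeKind
    edges: list = field(default_factory=list)  # (source, Relation, target)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def link(self, source, relation, target):
        # Refuse edges whose endpoints were never registered.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before linking")
        self.edges.append((source, relation, target))

    def sources(self, target, relation):
        """Return every node connected to `target` by `relation`."""
        return [s for s, r, t in self.edges if t == target and r is relation]


# Made-up example labels, used only to exercise the structure.
graph = CausalGraph()
graph.add_node("chest pain", NodeKind.SYMPTOM)
graph.add_node("cardiac cause", NodeKind.HYPOTHESIS)
graph.add_node("myocardial infarction", NodeKind.HYPOTHESIS)
graph.add_node("elevated troponin", NodeKind.EVIDENCE)
graph.link("elevated troponin", Relation.SUPPORTS, "cardiac cause")
graph.link("myocardial infarction", Relation.REFINES, "cardiac cause")
```

Keeping relations as explicit typed edges, rather than as free text in the prompt, is what makes questions such as "which evidence supports this hypothesis?" answerable by a lookup instead of by re-reading the conversation.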
GoS also introduces a reasoning‑focus mechanism. At each step the system selects the hypothesis with the highest confidence and concentrates investigation budget and computational resources on that branch, turning potentially divergent exploration into a guided search.
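In its simplest reading, the focus selection described above is an argmax over hypothesis confidences. The sketch below assumes confidences are stored in a plain dictionary; the function name `select_focus` and the example hypotheses are hypothetical, and the paper may break ties or weight candidates differently.

```python
def select_focus(confidences):
    """Return the hypothesis with the highest confidence.

    `confidences` maps hypothesis name -> confidence score. On a tie, the
    hypothesis inserted first wins (an arbitrary choice for this sketch).
    """
    if not confidences:
        raise ValueError("no hypotheses to focus on")
    return max(confidences, key=confidences.get)


# Made-up scores for illustration.
focus = select_focus({"disk full": 0.2, "network partition": 0.7, "bad deploy": 0.1})
```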
The reasoning process forms a dual closed‑loop: the symbolic layer identifies the focus and issues an investigation command to the cognitive layer; the cognitive layer invokes tools, collects evidence, and returns analysis results; the symbolic layer updates the causal graph, recalibrates hypothesis confidences, and triggers the next state transition. This loop ensures that multi‑agent collaboration is constrained and continuously driven by the most valuable hypothesis.
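The loop above can be sketched as a short driver that alternates between the two layers. This is an illustrative toy under stated assumptions: `run_dual_loop`, the stubbed `toy_investigate` and `toy_recalibrate` functions, the fixed 0.2 confidence step, and the example hypotheses are all invented here. In the actual framework the cognitive layer calls agents and tools, and the recalibration rule is the paper's own.

```python
def run_dual_loop(confidences, investigate, recalibrate, rounds):
    """Alternate symbolic and cognitive steps for a fixed number of rounds."""
    confidences = dict(confidences)  # do not mutate the caller's dict
    focus_trace = []
    for _ in range(rounds):
        # Symbolic layer: pick the current focus hypothesis.
        focus = max(confidences, key=confidences.get)
        # Cognitive layer: gather evidence about the focus (stubbed here).
        finding = investigate(focus)
        # Symbolic layer: recalibrate confidences from the new finding.
        confidences = recalibrate(confidences, focus, finding)
        focus_trace.append(focus)
    return confidences, focus_trace


# Toy stand-ins: +1 means the evidence supports the hypothesis, -1 refutes it.
TOY_FINDINGS = {"disk full": -1, "network partition": +1}


def toy_investigate(hypothesis):
    return TOY_FINDINGS[hypothesis]


def toy_recalibrate(confidences, focus, finding):
    updated = dict(confidences)
    # Assumed update rule: shift by a fixed step and clamp to [0, 1].
    updated[focus] = min(1.0, max(0.0, updated[focus] + 0.2 * finding))
    return updated


final, trace = run_dual_loop(
    {"disk full": 0.5, "network partition": 0.4},
    toy_investigate,
    toy_recalibrate,
    rounds=3,
)
print(trace)
```

In this toy run the initial leader is refuted in the first round, so the focus moves to the other hypothesis for the remaining rounds, which is the kind of redirection the closed loop is meant to produce.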
State transitions are governed by clear rules. Backtracking occurs when an ancestor hypothesis loses its top‑confidence status after re‑evaluation, causing the system to prune downstream branches. Drill‑down proceeds only when the current top hypothesis enjoys a sufficient confidence advantage and enough supporting evidence, preventing premature deepening.
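These transition rules can be written as a small decision function. The sketch below is one plausible reading, not the paper's exact rule set: the `Action` enum, the `next_action` signature, and the default thresholds (`min_margin=0.2`, `min_support=2`) are assumptions chosen for illustration.

```python
from enum import Enum


class Action(Enum):
    GATHER = "gather_evidence"
    DRILL_DOWN = "drill_down"
    BACKTRACK = "backtrack"


def next_action(level_confidences, ancestor_still_top, support_count,
                min_margin=0.2, min_support=2):
    """Choose the next state transition at the current reasoning level.

    level_confidences: hypothesis -> confidence at the current level.
    ancestor_still_top: False if a re-evaluated ancestor lost its top rank.
    support_count: pieces of evidence supporting the top hypothesis.
    """
    # Backtrack first: downstream branches of a dethroned ancestor are pruned.
    if not ancestor_still_top:
        return Action.BACKTRACK
    ranked = sorted(level_confidences.values(), reverse=True)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    margin = ranked[0] - runner_up
    # Drill down only with a clear lead and enough supporting evidence.
    if margin >= min_margin and support_count >= min_support:
        return Action.DRILL_DOWN
    return Action.GATHER
```

For example, a top hypothesis at 0.7 against a runner-up at 0.3 with two supporting findings would drill down, while a 0.5 versus 0.45 split would keep gathering evidence regardless of how much support has accumulated.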
To evaluate GoS, the authors conduct experiments on two high‑risk real‑world abductive tasks. In the medical diagnosis setting (based on the DiagnosisArena benchmark), agents start with only a chief complaint and a basic exam, then dynamically request tests and evidence. GoS achieves 39.86% Match and 78.99% Relevant, outperforming all baselines at lower cost. The distributed‑system fault‑diagnosis setting covers 150 incidents from a production environment: agents start from an initial alert, query logs, metrics, and shell output, and must locate the root cause. GoS attains 70.67% Match (36.67 percentage points higher than the strongest baseline) and 88.00% Relevant, demonstrating superior fine‑grained root‑cause identification.
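The fault‑diagnosis figures can be unpacked with a little arithmetic. The implied baseline score follows directly from the stated gap; the incident counts are derived here under the assumption that Match and Relevant are simple fractions of the 150 incidents, which the summary does not state explicitly.

```python
incidents = 150
gos_match_pct = 70.67
gos_relevant_pct = 88.00
gain_over_best_baseline_pp = 36.67

# Implied Match score of the strongest baseline (percentage points subtract directly).
baseline_match_pct = round(gos_match_pct - gain_over_best_baseline_pp, 2)

# Assumed: each percentage is (number of incidents) / 150.
matched_incidents = round(incidents * gos_match_pct / 100)
relevant_incidents = round(incidents * gos_relevant_pct / 100)

print(baseline_match_pct, matched_incidents, relevant_incidents)
```

Under that assumption, the strongest baseline sits at 34.00% Match, and the GoS percentages correspond to 106 and 132 of the 150 incidents respectively.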
Comprehensive ablation studies show that removing any of the three core modules—reasoning focus, causal graph, or state machine—degrades performance, confirming that the gains arise from their coordinated interaction. Sensitivity analysis reveals predictable performance trends as the number of neural‑symbolic interaction rounds, retrieval budget, and state‑transition thresholds vary, indicating stability and controllability.
From a broader perspective, GoS advances long‑horizon reasoning and multi‑turn interaction by providing a general framework that can be combined with domain‑specific tools such as medical knowledge bases, retrieval‑augmented generation, or AIOps multimodal preprocessing. It offers a universal reasoning skeleton rather than a task‑specific agent.
Paper title: Graph of States: Solving Abductive Tasks with Large Language Models
Paper link: https://arxiv.org/pdf/2603.21250
Code repository: https://github.com/gaorch85/Graph-of-States
