From Chain‑of‑Thought to Graph‑of‑Thought: The Evolution of LLM Reasoning

This article examines how large language model reasoning has progressed from linear Chain‑of‑Thought prompting to parallel Tree‑of‑Thought and flexible Graph‑of‑Thought approaches, highlighting each method’s mechanism, strengths, limitations, computational costs, and the broader shift toward cognitive‑centric AI research.

Data Party THU
Data Party THU
Data Party THU
From Chain‑of‑Thought to Graph‑of‑Thought: The Evolution of LLM Reasoning

Background

Large language models (LLMs) have shifted from scaling parameters to improving reasoning methods. Google introduced Chain‑of‑Thought (CoT) in 2022, prompting models to generate intermediate reasoning steps (“show their work”), which yielded large gains on arithmetic, commonsense, and symbolic tasks. Subsequent work extended the idea to Tree‑of‑Thought (ToT) and Graph‑of‑Thought (GoT), moving from linear chains to branching trees and general graphs.

Chain‑of‑Thought (CoT)

Core mechanism: prepend a cue such as “Let’s think step‑by‑step” to the prompt so the model produces a sequence of sub‑steps before the final answer. Typical workflow:

Identify relevant information in the problem statement.

Perform the required computation or inference for each sub‑step.

Combine sub‑step results into the final answer.

Advantages: inexpensive (no model fine‑tuning), leverages pretrained reasoning patterns, and consistently improves performance on benchmarks like GSM‑8K, ARC‑Easy, and Symbolic Reasoning.

Limitation: error propagation—if an early sub‑step is incorrect, later steps inherit the mistake, making CoT fragile for tasks that require backtracking or exploring alternative solution paths.

Reference: https://arxiv.org/abs/2201.11903

CoT illustration
CoT illustration

Tree‑of‑Thought (ToT)

Proposed by Shunyu Yao et al. (2023). ToT augments CoT with a search‑like process that maintains multiple reasoning branches simultaneously. At each node the model generates k candidate thoughts, scores them with a heuristic (e.g., a value model or external verifier), and expands the most promising branch. The process can be formalized as a depth‑first or best‑first search over a tree of thoughts.

Key steps:

Initialize the root with the original problem.

For the current node, sample k continuations (thoughts).

Evaluate each continuation using a scoring function.

Select the top‑scoring continuation and push it onto the search stack.

Repeat until a termination condition (e.g., answer confidence or depth limit) is met.

Empirical results on the 24‑point game and combinatorial puzzles show ToT outperforms CoT by exploring alternative operation orders and backtracking from dead‑ends.

Trade‑off: computational cost grows roughly linearly with the branching factor k and depth, requiring more GPU memory and inference time.

Reference: https://arxiv.org/abs/2305.10601

ToT illustration
ToT illustration

Graph‑of‑Thought (GoT)

GoT generalizes the tree structure to an arbitrary directed graph. Nodes represent partial reasoning states; edges encode logical dependencies, merges, splits, or cycles. This allows the model to:

Merge two independent branches into a composite idea.

Split a node into multiple specialized sub‑thoughts.

Introduce cycles for iterative refinement (e.g., re‑evaluating a hypothesis).

Such flexibility is useful for tasks that require multi‑source integration, knowledge‑graph reasoning, or iterative design.

Current challenges include graph construction (how to decide when to add nodes or edges), pruning strategies to keep the graph tractable, and defining robust scoring functions for cyclic structures. GoT remains an early‑stage research direction.

Reference: https://arxiv.org/abs/2308.09687

GoT illustration
GoT illustration

Method Comparison & Practical Guidance

Computational cost and inference flexibility follow the ordering: CoT < ToT < GoT In production settings:

CoT : lightweight, suitable for high‑throughput services where latency and cost dominate.

ToT : moderate cost, offers better robustness and solution quality; appropriate for planning, game playing, or complex math where backtracking is valuable.

GoT : highest cost, best for research prototypes, multi‑modal reasoning, or domains requiring explicit graph manipulation.

Hybrid pipelines are common: a simple query is answered with CoT; if confidence is low, the system escalates to ToT; for extremely ambiguous or open‑ended problems, GoT is invoked.

Comparison chart
Comparison chart
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

reasoningchain-of-thoughtAI researchGraph-of-ThoughtTree-of-Thought
Data Party THU
Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.