How CogTree’s Dual‑System Architecture Boosts Small LLM Reasoning
The paper introduces CogTree, a dual‑system cognitive‑tree generation model that combines an Intuition System and a Reflection System to iteratively decompose and verify hypotheses, dramatically improving the accuracy of small 7B language models on complex logical and mathematical reasoning tasks.
Background
Deep learning has advanced rapidly across NLP, and large language models such as GPT‑3.5 have achieved breakthroughs in text generation, sentiment analysis, and dialogue. However, they still struggle with complex logical and mathematical reasoning, lack human‑like cognitive abilities, and incur high inference costs, especially when reasoning is enhanced through techniques that leave the model parameters unchanged.
Motivation
This work investigates whether lightweight language models (7B parameters) can handle complex reasoning tasks. It constructs a generative reasoning tree from two cooperating systems, which substantially improves answer accuracy on difficult math and logic problems.
Algorithm Overview
CogTree adopts two systems—an Intuition System and a Reflection System—implemented with separate LLMs. The Intuition System generates multiple decomposition hypotheses for the original query, while the Reflection System validates these hypotheses, selects the most promising ones, and guides further generation, iteratively building a reasoning tree.
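The generate-then-validate loop described above can be sketched as a small beam-style tree search. This is a minimal illustration rather than the paper's implementation: the names `propose`, `evaluate`, `is_solved`, `beam_width`, and `max_depth` are hypothetical, the two systems are passed in as plain callables standing in for LLM calls, and keeping only the top-scoring hypotheses is an assumed selection rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """One state in the reasoning tree (hypothetical structure)."""
    statement: str
    score: float = 0.0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def build_tree(
    query: str,
    propose: Callable[[str], List[str]],   # stand-in for the Intuition System
    evaluate: Callable[[str], float],      # stand-in for the Reflection System
    is_solved: Callable[[str], bool],      # assumed stopping test
    beam_width: int = 2,
    max_depth: int = 3,
) -> Optional[Node]:
    """Expand the tree level by level; return the first solved node, or None."""
    frontier = [Node(query)]
    for _ in range(max_depth):
        candidates: List[Node] = []
        for node in frontier:
            for hypothesis in propose(node.statement):
                child = Node(hypothesis, evaluate(hypothesis), parent=node)
                node.children.append(child)
                if is_solved(child.statement):
                    return child
                candidates.append(child)
        if not candidates:
            break
        # Keep only the hypotheses the evaluator scores highest.
        candidates.sort(key=lambda n: n.score, reverse=True)
        frontier = candidates[:beam_width]
    return None


def path_to(node: Node) -> List[str]:
    """Recover the reasoning chain from the root to the given node."""
    chain: List[str] = []
    current: Optional[Node] = node
    while current is not None:
        chain.append(current.statement)
        current = current.parent
    return list(reversed(chain))
```

In a real setup, `propose` and `evaluate` would wrap the two fine-tuned models; here they can be any functions, which makes the control flow easy to test in isolation.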
Intuition System
The Intuition System uses a decoder‑only model (e.g., GPT‑2‑XL or LLaMA‑7B) enhanced with in‑context examples. Given a query, it ranks the examples in a decomposition set by the cosine similarity between their representations and the query's, retrieves the k most similar ones as in‑context demonstrations, and then generates candidate decompositions.
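The retrieval step can be illustrated with a short, dependency-free sketch. It assumes the query and the stored examples have already been encoded as fixed-length vectors; how CogTree obtains those representations is not covered here, and the function names `cosine_similarity` and `retrieve_top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: Sequence[float],
    example_vecs: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k examples most similar to the query."""
    scored = [
        (i, cosine_similarity(query_vec, vec))
        for i, vec in enumerate(example_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The returned indices would then select which stored decompositions are placed in the prompt as demonstrations.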
Reflection System
The Reflection System, sharing the same architecture, evaluates the outputs of the Intuition System. It produces a score for the current state and an overall score for the entire reasoning chain, ensuring the plausibility of each step.
Training
Both systems are fine‑tuned with supervised learning. The Intuition System maximizes the likelihood of the generated text conditioned on its context, while the Reflection System is trained with a binary classification loss to label each state as acceptable or not.
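For readers who want the two objectives in symbols, one common way to write them is shown below. These are the textbook forms of a conditional language-modeling loss and a binary cross-entropy loss, given as an assumption; the paper's exact formulation may differ. The notation is introduced here for illustration: c is the context, y_1..y_T the target tokens, s a reasoning state, z in {0, 1} its acceptability label, and theta and phi the parameters of the two systems.

```latex
% Intuition System: negative log-likelihood of the target given the context
\mathcal{L}_{\text{intuition}}(\theta)
  = -\sum_{t=1}^{T} \log p_{\theta}\!\left(y_t \mid y_{<t},\, c\right)

% Reflection System: binary cross-entropy over state acceptability
\mathcal{L}_{\text{reflection}}(\phi)
  = -\Big[\, z \log q_{\phi}(s) + (1 - z) \log\!\big(1 - q_{\phi}(s)\big) \Big]
```

Minimizing the first loss is equivalent to maximizing the likelihood of the generated text; the second pushes q_phi(s) toward 1 for acceptable states and toward 0 otherwise.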
Evaluation
CogTree was tested on the EntailmentBank logical reasoning dataset and the GSM8K math dataset. Results show a significant increase in accuracy compared with baseline LLMs and other fine‑tuning methods.
Release
The CogTree code will be contributed to the EasyNLP framework (https://github.com/alibaba/EasyNLP) for the community.
References
Chengyu Wang, Minghui Qiu, Taolin Zhang, Tingting Liu, Lei Li, Jianing Wang, Ming Wang, Jun Huang, Wei Lin. EasyNLP: A Comprehensive and Easy‑to‑use Toolkit for Natural Language Processing. EMNLP 2022.
Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems. CoRR, abs/2110.14168, 2021.
Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Olivier Bousquet, Quoc Le, and Ed H. Chi. Least‑to‑most prompting enables complex reasoning in large language models. CoRR, abs/2205.10625, 2022.
Jonathan St B. T. Evans. Heuristic and analytic processes in reasoning. British Journal of Psychology, 75(4):451–468, 1984.
