Can Tiny Networks Beat Giant LLMs? Inside the Tiny Recursive Model (TRM) Breakthrough

A recent study from Samsung's SAIL Montreal lab shows that a 7‑million‑parameter, two‑layer Tiny Recursive Model (TRM) can surpass large language models on challenging reasoning benchmarks by using recursive self‑correction in place of sheer model scale, pointing to a more efficient path for AI reasoning.


Background

Researchers at Samsung SAIL Montreal introduced a new recursive reasoning architecture called the Tiny Recursive Model (TRM) in the paper "Less is More: Recursive Reasoning with Tiny Networks". The work investigates whether small networks can achieve strong reasoning performance compared to much larger language models.

Model Architecture

TRM packs just 7 M parameters into a two‑layer network. Its TRM‑MLP variant drops the self‑attention layer entirely in favor of an MLP, while the TRM‑Att variant retains attention. The authors argue that for fixed‑size inputs an MLP reduces over‑fitting, and that attention can be wasteful when the context length is short.

The core mechanism repeatedly updates an answer variable y and a latent thinking variable z in a recursive loop, allowing the model to self‑correct its reasoning across multiple steps.
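The update loop described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the shapes, initialization, and step count are assumptions, and the real TRM uses a trained two‑layer network rather than random weights.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                 # hidden width (illustrative only)
W1 = rng.normal(0, 0.1, (3 * D, D))    # layer 1: maps [x, y, z] -> hidden
W2 = rng.normal(0, 0.1, (D, 2 * D))    # layer 2: hidden -> updates for (y, z)

def tiny_net(x, y, z):
    """One pass of a toy two-layer network over input, answer, and latent state."""
    h = np.tanh(np.concatenate([x, y, z]) @ W1)
    out = h @ W2
    return out[:D], out[D:]            # (delta_y, delta_z)

def recursive_reason(x, n_steps=6):
    y = np.zeros(D)                    # current answer embedding
    z = np.zeros(D)                    # latent "thinking" state
    for _ in range(n_steps):           # self-correction loop: refine y and z together
        dy, dz = tiny_net(x, y, z)
        y, z = y + dy, z + dz
    return y

x = rng.normal(size=D)
y_final = recursive_reason(x)
```

The key idea the sketch captures is that depth comes from iterating the same small network, not from stacking more layers.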

[Figure: TRM architecture diagram]

Comparison with Prior Work

The design builds on the earlier Hierarchical Reasoning Model (HRM), which uses two networks at different frequency levels and relies on fixed‑point theory. TRM discards these theoretical assumptions, simplifying training.

Experiments show that TRM outperforms HRM while using about 74 % fewer parameters and half the forward passes per inference step.

Experimental Results

On the ARC‑AGI benchmark, TRM achieves 45 % accuracy on ARC‑AGI‑1 and 8 % on ARC‑AGI‑2, surpassing many large models such as Gemini 2.5 Pro and DeepSeek R1.

On Sudoku‑Extreme, a 5 M‑parameter TRM reaches 87.4 % accuracy, setting a new record. On Maze‑Hard, a 7 M‑parameter version attains 85.3 % accuracy, 10 points higher than HRM.

Training speed improves dramatically with little loss in accuracy, and the model eliminates the second forward pass of Adaptive Computational Time (ACT), using a simple binary stop‑criterion instead.
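A binary stop‑criterion of this kind can be illustrated as a single learned halting score checked after each recursion step. The names, threshold, and halting head below are assumptions for illustration, not the paper's exact formulation; the point is that one cheap check replaces ACT's second forward pass.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16
w_halt = rng.normal(0, 0.1, D)         # hypothetical halting head

def should_stop(z):
    """Binary stop criterion: halt when sigmoid(w_halt . z) exceeds 0.5."""
    p_halt = 1.0 / (1.0 + np.exp(-(z @ w_halt)))
    return p_halt > 0.5

def run_with_halting(step_fn, z0, max_steps=16):
    z, steps = z0, 0
    for steps in range(1, max_steps + 1):
        z = step_fn(z)                 # one recursion step
        if should_stop(z):             # single cheap check, no extra forward pass
            break
    return z, steps

z_final, n_steps = run_with_halting(lambda z: np.tanh(z + 0.1), np.zeros(D))
```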

[Figure: performance charts]

Insights and Implications

The authors conclude that depth achieved through recursion can replace model scaling, challenging the prevailing belief that larger models are inherently stronger. Two‑layer networks demonstrate better generalization than deeper ones, which tend to over‑fit on small datasets.

By applying an exponential moving average (EMA) to stabilize training, TRM converges consistently even on limited data, suggesting a viable “lightweight AI reasoning” route for edge devices and low‑resource scenarios.

Overall, the study proposes that small models, when equipped with recursive self‑correction, can exhibit complex reasoning behavior without the computational burden of massive architectures.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: MLP, LLM comparison, efficient-ai, recursive-reasoning, tiny-models
Written by Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.