Can Tiny Networks Beat Giant LLMs? Inside the Tiny Recursive Model (TRM) Breakthrough
A recent study from Samsung's SAIL Montreal lab shows that a 7‑million‑parameter, two‑layer Tiny Recursive Model can surpass large language models on challenging reasoning benchmarks by trading sheer parameter scale for recursive self‑correction, pointing to a more efficient path for AI reasoning.
Background
Researchers at Samsung SAIL Montreal introduced a new recursive reasoning architecture called the Tiny Recursive Model (TRM) in the paper "Less is More: Recursive Reasoning with Tiny Networks". The work investigates whether small networks can rival the reasoning performance of much larger language models.
Model Architecture
TRM uses only 7 M parameters organized in a two‑layer network. Its TRM‑MLP variant eliminates self‑attention entirely in favor of an MLP, while the TRM‑Att variant retains attention. The authors argue that for small, fixed‑size inputs, MLPs reduce over‑fitting and attention is wasteful when the context is short.
The core mechanism repeatedly updates an answer variable y and a latent thinking variable z in a recursive loop, allowing the model to self‑correct its reasoning across multiple steps.
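To make the loop concrete, here is a minimal PyTorch‑style sketch of that update rule; the module names, layer sizes, and step counts are illustrative assumptions, not the authors' released code:

```python
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    """Illustrative sketch of a TRM-style recursion loop."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # One tiny two-layer MLP refines the latent "thinking" state z.
        self.update_z = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )
        # A second tiny MLP revises the answer y from (y, z).
        self.update_y = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, x, y, z, n_latent: int = 6, n_cycles: int = 3):
        for _ in range(n_cycles):
            # Think: repeatedly refine z from the question x, the
            # current answer y, and the previous latent state z.
            for _ in range(n_latent):
                z = self.update_z(torch.cat([x, y, z], dim=-1))
            # Act: self-correct the answer using the refined latent.
            y = self.update_y(torch.cat([y, z], dim=-1))
        return y, z

# Toy usage with embeddings for question x, initial answer y, latent z.
model = TinyRecursiveSketch()
x, y, z = (torch.randn(1, 256) for _ in range(3))
y, z = model(x, y, z)
```

The key design choice is weight sharing: the same tiny networks are applied at every step, so effective depth grows with the number of recursions while the parameter count stays fixed.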
Comparison with Prior Work
The design builds on the earlier Hierarchical Reasoning Model (HRM), which runs two networks at different update frequencies and justifies its gradient approximation with fixed‑point theory. TRM discards these theoretical assumptions and simply backpropagates through the full recursion, simplifying training.
Experiments show that TRM outperforms HRM while using about 74 % fewer parameters (7 M vs. HRM's 27 M) and half the forward passes per inference step.
Experimental Results
On the ARC‑AGI benchmark, TRM achieves 45 % accuracy on ARC‑AGI‑1 and 8 % on ARC‑AGI‑2, surpassing many large models such as Gemini 2.5 Pro and DeepSeek R1.
On Sudoku‑Extreme, a 5 M‑parameter TRM reaches 87.4 % accuracy, setting a new record. On Maze‑Hard, a 7 M‑parameter version attains 85.3 % accuracy, 10 points higher than HRM.
Training speed also improves dramatically with little loss in accuracy: TRM eliminates the second forward pass that Adaptive Computation Time (ACT) required, replacing it with a simple binary stop criterion.
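A hedged sketch of what such a stop criterion can look like in PyTorch; the halting head, its input, and the threshold are assumptions for illustration, not the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stop criterion: a linear probe on the latent state,
# trained with binary cross-entropy to predict whether the current
# answer is already correct. No second forward pass is needed to
# decide when to halt.
halt_head = nn.Linear(256, 1)

def halting_loss(z: torch.Tensor, answer_is_correct: torch.Tensor):
    # answer_is_correct: float tensor of 0/1 targets per example.
    logits = halt_head(z).squeeze(-1)
    return F.binary_cross_entropy_with_logits(logits, answer_is_correct)

def should_stop(z: torch.Tensor, threshold: float = 0.5):
    # At inference, stop recursing once the head predicts "correct".
    return torch.sigmoid(halt_head(z)).squeeze(-1) > threshold
```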
Insights and Implications
The authors conclude that depth achieved through recursion can substitute for model scaling, challenging the prevailing belief that larger models are inherently stronger reasoners. In their experiments, two‑layer networks generalize better than deeper ones, which tend to over‑fit on small datasets.
By applying an exponential moving average (EMA) to model weights for stable training, TRM converges consistently on limited data, suggesting a viable "lightweight AI reasoning" route for edge devices and low‑resource scenarios.
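Weight EMA is a standard stabilizer for small-data training; a minimal sketch follows, assuming a typical decay value (the paper's exact setting may differ):

```python
import copy
import torch
import torch.nn as nn

class WeightEMA:
    """Standard exponential moving average over model weights."""

    def __init__(self, model: nn.Module, decay: float = 0.999):
        self.decay = decay
        # Frozen shadow copy of the model holds the averaged weights.
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: nn.Module):
        # shadow <- decay * shadow + (1 - decay) * current weights
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.mul_(self.decay).add_(p, alpha=1.0 - self.decay)
```

In typical use, `update(model)` is called after each optimizer step and evaluation runs on the smoothed `shadow` copy rather than the raw weights.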
Overall, the study proposes that small models, when equipped with recursive self‑correction, can exhibit complex reasoning behavior without the computational burden of massive architectures.