10M‑Parameter Model Solves ARC and Sudoku – Bengio Team Bets on Multi‑Trajectory Reasoning

A 10‑million‑parameter GRAM model from Bengio and collaborators at KAIST, Mila and NYU achieves 97% accuracy on Sudoku‑Extreme and competitive scores on ARC‑AGI tasks by replacing deterministic recursive updates with a probabilistic multi‑trajectory process. Extensive ablations show that both stochastic guidance and depth‑supervised training are essential to its performance.

Machine Learning Algorithms & Natural Language Processing

May 23, 2026

Overview

The paper introduces Generative Recursive Reasoning (GRAM), a model that turns traditional deterministic recursive inference into a probabilistic multi‑trajectory process. Despite having only 10M parameters, GRAM attains 97.0% accuracy on Sudoku‑Extreme, 52.0% on ARC‑AGI‑1 and 11.1% on ARC‑AGI‑2, surpassing deterministic recursive models such as HRM and TRM.

Model Architecture

GRAM splits the hidden state into a high‑level component h and a low‑level component l. The low‑level state l undergoes K deterministic updates while h is held fixed; h is then updated by adding a Gaussian perturbation whose mean guides the reasoning direction and whose variance controls exploration. The result is a probabilistic latent‑variable process that can be sampled in parallel to generate multiple candidate reasoning trajectories.
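To make the update schedule concrete, here is a minimal sketch of the recursion described above, in plain Python. It is not the authors' implementation: the functions `low_update` and `guidance_mean`, the vector size, and all constants are hypothetical stand-ins for learned networks. Only the schedule (K deterministic low‑level updates with h fixed, then a Gaussian perturbation of h) follows the description.

```python
import random

def low_update(l, h, x):
    # Hypothetical deterministic low-level update (stand-in for a learned network).
    return [0.5 * li + 0.25 * (hi + xi) for li, hi, xi in zip(l, h, x)]

def guidance_mean(h, l):
    # Hypothetical mean of the Gaussian perturbation: nudges h toward l.
    return [0.1 * (li - hi) for hi, li in zip(h, l)]

def sample_trajectory(x, steps, K, sigma, rng):
    h = [0.0] * len(x)  # high-level state
    l = [0.0] * len(x)  # low-level state
    for _ in range(steps):
        for _ in range(K):
            l = low_update(l, h, x)  # h stays fixed during these K updates
        mu = guidance_mean(h, l)
        # Gaussian perturbation of h: mean mu steers, sigma sets exploration.
        h = [hi + rng.gauss(m, sigma) for hi, m in zip(h, mu)]
    return h

def sample_parallel(x, n, steps=4, K=3, sigma=0.1, seed=0):
    # Trajectories are independent given x, so a real model could batch them;
    # this sketch simply loops.
    rng = random.Random(seed)
    return [sample_trajectory(x, steps, K, sigma, rng) for _ in range(n)]

if __name__ == "__main__":
    for traj in sample_parallel([1.0, -1.0], n=3):
        print(traj)
```

With `sigma=0.0` every trajectory in this toy collapses to the same deterministic path, which is the sense in which the variance controls exploration.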

Training Procedure

During training, GRAM employs a truncated‑gradient deep‑supervision mechanism: after K deterministic low‑level updates, a surrogate objective is optimized. This deep supervision improves gradient flow but makes training less efficient, a limitation the authors acknowledge.

Experimental Results

On benchmark tasks the model reports:

Sudoku‑Extreme: 97.0% accuracy (vs. 0% for large models DeepSeek‑R1, Claude 3.7 16k, o3‑mini‑high).

ARC‑AGI‑1: 52.0% accuracy; ARC‑AGI‑2: 11.1% accuracy.

N‑Queens: deterministic HRM/TRM achieve 80.70% / 72.90%; adding depth‑supervision and stochastic guidance (+DS+SG) reaches 100%; full GRAM reaches 99.69%.

Graph‑coloring (8‑node): conflicts reduced to 2.7 edges (vs. 3.3 for deterministic, 19.0 and 61.3 for autoregressive generators).

Unconditional Sudoku generation: 99.05% valid boards with 10.9 M parameters and 16 supervised steps, outperforming D3PM (55.1 M parameters, 1000 denoising steps, 91.33% validity).

Binary MNIST generation: increasing recursion steps from 8 to 256 lowers FID from 84.08 to 73.34 and improves IS.
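Two of the metrics above are simple constraint checks. The sketch below shows one plausible way to compute them, assuming that "conflicts" counts edges whose endpoints share a color and that a "valid" board is a completed 9x9 grid satisfying every row, column and box constraint. The article does not give the paper's exact definitions, so both function names and both definitions are assumptions.

```python
def count_conflicts(edges, colors):
    """Number of edges whose two endpoints were assigned the same color."""
    return sum(1 for u, v in edges if colors[u] == colors[v])

def is_valid_sudoku(board):
    """True if a completed 9x9 board has 1-9 exactly once per row, column and box."""
    if len(board) != 9 or any(len(row) != 9 for row in board):
        return False
    target = set(range(1, 10))
    rows = [set(row) for row in board]
    cols = [set(col) for col in zip(*board)]
    boxes = [
        {board[r + dr][c + dc] for dr in range(3) for dc in range(3)}
        for r in range(0, 9, 3)
        for c in range(0, 9, 3)
    ]
    return all(group == target for group in rows + cols + boxes)

if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    print(count_conflicts(triangle, [0, 0, 1]))
    # A standard shifted-row pattern that yields a valid completed board.
    board = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
    print(is_valid_sudoku(board))
```

Averaging `count_conflicts` over generated colorings, or `is_valid_sudoku` over generated boards, would give figures of the same kind as those reported, under the stated assumptions.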

Ablation Studies

Removing the guidance signal (mean set to zero) drops N‑Queens accuracy to 50.27%, and removing stochasticity (variance set to zero) drops it to 0.0%. The ablations indicate that the benefit comes not from random decoding or initialization alone, but from variational training that makes stochastic trajectories a learnable resource.

Depth‑supervision, hierarchical recursion, and stochastic guidance each contribute positively; their combined effect yields the best results.

Inference Extensions

GRAM supports width‑wise parallel sampling: with 16 iterations and N=20 parallel trajectories (16 × 20 = 320 trajectory‑iterations in total) it reaches 97.0% on Sudoku, whereas a deterministic TRM needs 320 sequential iterations to achieve only 90.5%. For multi‑solution tasks (e.g., N‑Queens) GRAM covers 90.3% of distinct valid solutions.
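Coverage of distinct valid solutions can be measured by sampling many trajectories, filtering with a constraint checker, and comparing against the full solution set. The sketch below assumes that definition of coverage, which the article does not spell out. The function `stand_in_sampler` is a hypothetical placeholder that draws random permutations; it is not the model.

```python
import random
from itertools import permutations

def is_valid_queens(cols):
    """cols[r] is the queen's column in row r; no shared columns or diagonals."""
    n = len(cols)
    return (
        len(set(cols)) == n
        and len({r + c for r, c in enumerate(cols)}) == n
        and len({r - c for r, c in enumerate(cols)}) == n
    )

def all_solutions(n):
    # Brute force over permutations; only practical for small n.
    return {p for p in permutations(range(n)) if is_valid_queens(p)}

def solution_coverage(samples, n):
    """Fraction of all distinct valid solutions that appear among the samples."""
    valid = all_solutions(n)
    found = {tuple(s) for s in samples} & valid
    return len(found) / len(valid)

def stand_in_sampler(n, num_samples, seed=0):
    # Placeholder for model trajectories: uniformly random permutations.
    rng = random.Random(seed)
    return [tuple(rng.sample(range(n), n)) for _ in range(num_samples)]

if __name__ == "__main__":
    samples = stand_in_sampler(6, num_samples=2000)
    print(solution_coverage(samples, 6))
```

Invalid and duplicate samples do not raise the score, so the metric rewards diversity among correct outputs rather than sample count.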

Data augmentation and inference‑time sampling are complementary: without augmentation, performance improves with more samples; with strong augmentation, marginal gains from additional samples diminish.

Limitations and Future Work

The evaluation is limited to controlled tasks (Sudoku, ARC‑AGI, N‑Queens, graph coloring, binary MNIST). The authors note that deep‑supervised sequential training hampers scalability to larger base models, indicating a need for more efficient training regimes.

Overall, GRAM demonstrates that converting deterministic recursive updates into a probabilistic multi‑trajectory process enhances exploration and constraint satisfaction in structured reasoning and multi‑solution generation tasks.
