Can 13 Parameters Match Full‑Scale Fine‑Tuning? TinyLoRA’s RL Breakthrough
TinyLoRA, a Meta‑proposed method that fine‑tunes Qwen2.5‑7B with only 13 trainable parameters (26 bytes), achieves 91% accuracy on GSM8K under reinforcement learning, revealing that ultra‑low‑parameter RL can rival full‑scale supervised fine‑tuning.
TinyLoRA Overview
Meta introduced TinyLoRA, a parameter‑efficient fine‑tuning method that adapts Qwen2.5‑7B with only 13 trainable parameters (26 bytes). In a reinforcement‑learning (RL) setting it reaches 91 % accuracy on the GSM8K math‑reasoning benchmark, outperforming supervised fine‑tuning (SFT), which requires orders of magnitude more parameters to reach comparable performance.
Background
LLMs typically acquire reasoning ability through RL after pre‑training. Conventional LoRA, even at rank = 1, still updates millions of parameters. The authors ask whether such a large parameter budget is necessary.
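For scale, here is a rough back‑of‑the‑envelope count of what rank‑1 LoRA already costs on a 7B model; the layer shapes below are the commonly published Qwen2.5‑7B dimensions, assumed for illustration rather than taken from the paper.

```python
# Rough rank-1 LoRA parameter count for a Qwen2.5-7B-like transformer.
# Layer shapes are assumed from the public model config, not from the paper.
hidden, intermediate, kv_dim, layers = 3584, 18944, 512, 28

def lora_rank1(d_in, d_out):
    # Rank-1 LoRA adds one d_in-vector and one d_out-vector per linear layer.
    return d_in + d_out

per_layer = (
    lora_rank1(hidden, hidden)          # q_proj
    + lora_rank1(hidden, kv_dim)        # k_proj (grouped-query attention)
    + lora_rank1(hidden, kv_dim)        # v_proj
    + lora_rank1(hidden, hidden)        # o_proj
    + lora_rank1(hidden, intermediate)  # gate_proj
    + lora_rank1(hidden, intermediate)  # up_proj
    + lora_rank1(intermediate, hidden)  # down_proj
)

print(f"{per_layer * layers:,} trainable parameters")  # ≈ 2.5 million at rank 1
```

Even the smallest conventional LoRA configuration is therefore roughly five orders of magnitude larger than TinyLoRA's 13‑parameter budget.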
TinyLoRA Core Innovation
The method modifies LoRA by introducing a single trainable vector v ∈ ℝᵏ (k is the trainable‑parameter budget, e.g. k = 13) that is shared across all layers via weight tying. The adapted weight matrix is W′ = W + U Σ (∑ᵢ vᵢ Pᵢ) Vᵀ, where U, Σ, Vᵀ come from a frozen truncated rank‑r SVD of the original weight W, each Pᵢ ∈ ℝʳˣʳ is a fixed random matrix, and ∑ᵢ sums over the k components of v (distinct from the singular‑value matrix Σ). Because the same v is reused for every module, the total number of trainable parameters can in principle be reduced to a single scalar.
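A minimal PyTorch‑style sketch of the construction described above; the module name, initialization, and rank below are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn as nn

class TinyLoRALinear(nn.Module):
    """Sketch of W' = W + U S (sum_i v_i P_i) Vt with a shared trainable v."""

    def __init__(self, weight: torch.Tensor, v: nn.Parameter, rank: int = 8):
        super().__init__()
        # Frozen truncated SVD of the pretrained weight (top `rank` components).
        U, S, Vt = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("W", weight)
        self.register_buffer("U", U[:, :rank])
        self.register_buffer("S", torch.diag(S[:rank]))
        self.register_buffer("Vt", Vt[:rank, :])
        # Fixed random r x r matrices, one per component of v (never trained).
        self.register_buffer("P", torch.randn(v.numel(), rank, rank))
        # The shared trainable vector, tied across every adapted module.
        self.v = v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mix = torch.einsum("k,krs->rs", self.v, self.P)       # sum_i v_i * P_i
        W_adapted = self.W + self.U @ self.S @ mix @ self.Vt
        return x @ W_adapted.T

# One 13-dimensional vector shared by all adapted layers => 13 trainable params.
shared_v = nn.Parameter(torch.zeros(13))
```

Because every `TinyLoRALinear` receives the same `shared_v` object, adding more adapted layers does not add trainable parameters; only the length of v (in the extreme, a single scalar) sets the budget.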
Empirical Results: RL vs. SFT Parameter Efficiency
GSM8K performance
13 parameters → 91 % accuracy (RL) vs. 83 % (SFT)
120 parameters → 95 % accuracy (RL) vs. 84 % (SFT)
1 000 parameters → near‑full‑fine‑tuning performance (RL) vs. >1 000 000 parameters required for SFT
Cross‑Model and Task Validation
Qwen2.5‑0.5B: ≈100 k parameters achieve 90 % accuracy.
Qwen2.5‑7B: ≈1 k parameters achieve the same level.
Fine‑tuning with 196 parameters yields an 87 % relative boost on high‑difficulty math benchmarks.
On the AIME‑24 set, 13 parameters increase accuracy from 3.3 % to 16.0 % (a 12.7‑point absolute gain, ≈4.8× relative).
Ablation Studies
FP32 precision outperforms BF16/FP16 when compared bit for bit (i.e., at equal trained‑bit budgets).
Tiled (depth‑wise) weight sharing surpasses structured (module‑type) sharing; a sketch of both schemes follows this list.
Full‑layer sharing with FP16 still reaches ≈70 % accuracy on GSM8K (baseline +10 %).
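The following sketch illustrates one plausible reading of these sharing granularities; the grouping logic, function name, and tile count are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

def build_shared_vectors(layer_names, scheme="tiled", k=13, n_tiles=4):
    """Map each adapted module to a (possibly shared) trainable vector.

    scheme="tiled":  consecutive depth-wise groups of layers share one vector.
    scheme="module": all q_proj share one vector, all k_proj another, etc.
    scheme="full":   a single vector is shared by every module.
    """
    vectors, assignment = {}, {}
    for idx, name in enumerate(layer_names):
        if scheme == "tiled":
            key = f"tile_{idx * n_tiles // len(layer_names)}"
        elif scheme == "module":
            key = name.split(".")[-1]            # e.g. "q_proj"
        else:
            key = "shared"
        vectors.setdefault(key, nn.Parameter(torch.zeros(k)))
        assignment[name] = vectors[key]
    return vectors, assignment

names = [f"layers.{i}.{m}" for i in range(28)
         for m in ("q_proj", "k_proj", "v_proj", "o_proj")]
vecs, _ = build_shared_vectors(names, scheme="tiled")
print(len(vecs), "vectors x 13 params")  # 4 tiles -> 52 trainable parameters
```

The trade‑off is between budget and flexibility: full sharing gives the smallest footprint, while tiled sharing spends a few more parameters to let different depths of the network adapt differently.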
Training Dynamics
Updates as small as 16 parameters still receive a meaningful reward signal.
Larger parameter counts produce longer response sequences, indicating deeper reasoning.
KL divergence between training and inference models is negligible, confirming effective weight merging.
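A hedged sketch of how such a merge check could be run; the Hugging Face‑style `.logits` access and the token‑level KL formulation are assumptions for illustration, and the paper may measure this differently.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def merged_vs_adapter_kl(adapter_model, merged_model, input_ids):
    """KL(adapter || merged) over the vocabulary, 'batchmean'-reduced.

    A value near zero means merging the TinyLoRA update into the base
    weights leaves the output distribution essentially unchanged.
    """
    logp_adapter = F.log_softmax(adapter_model(input_ids).logits, dim=-1)
    logp_merged = F.log_softmax(merged_model(input_ids).logits, dim=-1)
    return F.kl_div(logp_merged, logp_adapter,
                    log_target=True, reduction="batchmean").item()
```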
Model Architecture Comparison: Qwen vs. LLaMA
Qwen2.5 outperforms LLaMA‑3 in the ultra‑low‑parameter regime.
At equivalent performance, Qwen requires roughly one‑tenth of the parameters needed by LLaMA.
Even a single parameter can boost Qwen accuracy by ~5 % (to 82 %), while LLaMA shows almost no gain.
Why Does RL Work with So Few Parameters? (An Information‑Theoretic View)
RL provides sparse, low‑entropy binary reward signals, yielding a high signal‑to‑noise ratio. SFT must absorb high‑entropy full answers, requiring much larger capacity. Consequently, RL can adjust model behavior with dramatically fewer parameters.
Core Insight: the learning signal RL must absorb is roughly 100‑1000× smaller than SFT's (a single reward bit per episode versus a full high‑entropy answer), which is why ultra‑low‑parameter fine‑tuning works.
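As a rough back‑of‑the‑envelope illustration of that gap, the token count, per‑token surprisal, and reward balance below are assumed numbers chosen only to show the order of magnitude, not figures from the paper.

```python
import math

# SFT: the target is a full answer, e.g. ~200 tokens of worked solution,
# at an assumed ~2 bits of surprisal per token for a strong pretrained model.
sft_bits_per_example = 200 * 2.0

# RL with a binary correct/incorrect reward: at most 1 bit per episode
# (the entropy of a Bernoulli reward with assumed success rate p).
p_correct = 0.3
rl_bits_per_episode = -(p_correct * math.log2(p_correct)
                        + (1 - p_correct) * math.log2(1 - p_correct))

print(f"SFT  : ~{sft_bits_per_example:.0f} bits per example")
print(f"RL   : ~{rl_bits_per_episode:.2f} bits per episode")
print(f"ratio: ~{sft_bits_per_example / rl_bits_per_episode:.0f}x")  # ≈450x here
```

Under these assumed numbers the gap lands inside the 100‑1000× range quoted above, which is the intuition behind fitting the RL signal into a 13‑parameter (26‑byte) update.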
Reference
Original paper: https://arxiv.org/pdf/2602.04118 (Learning to Reason in 13 Parameters)