Machine Learning Algorithms & Natural Language Processing
Mar 17, 2026 · Artificial Intelligence

MIT Study Shows Adding Noise to Large Models Can Replace GRPO/PPO Tuning

A new MIT paper reveals that pretrained large models already contain many hidden expert submodels, and that a simple one‑step Gaussian perturbation (RandOpt) can locate and ensemble these experts to achieve performance comparable to or better than traditional GRPO/PPO tuning, especially as model size grows.

GRPO · PPO · RandOpt
9 min read
Mar 15, 2026 · Artificial Intelligence

Is RL Dead in LLM Post-Training? MIT’s RandOpt Challenges Traditional Methods

The MIT‑CSAIL paper introduces RandOpt, a single‑step, gradient‑free, fully parallel post‑training algorithm that adds Gaussian noise to pretrained LLM weights and ensembles the results, matching or surpassing PPO/GRPO performance by exploiting the dense "neural thickets" of expert submodels that emerge as models scale.
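The idea described above can be sketched in a few lines. This is a toy illustration only: the weight vector, reward function, noise scale, candidate count, and top-k ensembling rule below are all assumptions for the sake of a runnable example, not the paper's actual setup.

```python
import numpy as np

# Minimal sketch of the RandOpt idea as summarized above: apply one-step
# Gaussian perturbations to pretrained weights, score each copy with a
# reward function (gradient-free, fully parallelizable), and ensemble
# the best-scoring "hidden experts".

rng = np.random.default_rng(0)

def reward(weights: np.ndarray) -> float:
    """Stand-in reward: negative squared distance to a hypothetical optimum."""
    target = np.ones_like(weights)
    return -float(np.sum((weights - target) ** 2))

pretrained = np.zeros(8)     # toy pretrained weights
sigma = 0.5                  # perturbation scale (assumed hyperparameter)
n_candidates = 64            # perturbed copies; each can be scored in parallel
top_k = 8                    # experts kept for the ensemble

# Single gradient-free step: one Gaussian perturbation per candidate.
noise = rng.standard_normal((n_candidates, pretrained.size))
candidates = pretrained + sigma * noise
scores = np.array([reward(w) for w in candidates])

# Ensemble the top-k scoring candidates by weight averaging.
best = candidates[np.argsort(scores)[-top_k:]]
ensemble = best.mean(axis=0)

print(f"mean candidate reward: {scores.mean():.2f}")
print(f"ensemble reward:       {reward(ensemble):.2f}")
```

Because the toy reward is concave, the averaged ensemble scores at least as well as the average of its top-k members; the real paper's claim about scale and "neural thickets" is, of course, established empirically, not by this sketch.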

LLM · RandOpt · Scaling Law
12 min read