Machine Learning Algorithms & Natural Language Processing
Mar 17, 2026 · Artificial Intelligence
MIT Study Shows Adding Noise to Large Models Can Replace GRPO/PPO Tuning
A new MIT paper argues that pretrained large models already contain many hidden expert submodels, and that a simple one-step Gaussian perturbation method (RandOpt) can locate and ensemble these experts. The reported performance is comparable to or better than traditional GRPO/PPO tuning, particularly at larger model sizes.
Tags: GRPO, PPO, RandOpt
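To make the idea in the summary concrete, here is a minimal toy sketch of "perturb, select, ensemble". It is not the paper's implementation: the linear classifier, the function names (`randopt_sketch`, `ensemble_predict`), the parameters (`n_samples`, `top_k`, `sigma`), and the majority-vote ensembling are all assumptions chosen for illustration, and a real run would perturb the weights of a large pretrained model instead.

```python
import random
from collections import Counter

def predict(w, x):
    """Toy linear classifier: w holds one weight vector per class."""
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
    return scores.index(max(scores))

def accuracy(w, X, y):
    """Fraction of examples in (X, y) that w classifies correctly."""
    return sum(predict(w, x) == t for x, t in zip(X, y)) / len(y)

def randopt_sketch(w0, X_val, y_val, n_samples=64, top_k=5, sigma=0.5, seed=0):
    """Sample one-step Gaussian perturbations of w0 and keep the best top_k.

    Hypothetical helper: no gradient steps are taken, candidates are only
    sampled once around w0 and ranked on a small labeled set.
    """
    rng = random.Random(seed)
    candidates = [
        [[v + rng.gauss(0.0, sigma) for v in row] for row in w0]
        for _ in range(n_samples)
    ]
    ranked = sorted(candidates, key=lambda w: accuracy(w, X_val, y_val), reverse=True)
    return ranked[:top_k]

def ensemble_predict(experts, x):
    """Majority vote over the selected perturbed models (assumed ensembling rule)."""
    votes = Counter(predict(w, x) for w in experts)
    return votes.most_common(1)[0][0]

# Toy data: labels come from hidden weights; w0 is a noisy copy standing in
# for an imperfect "pretrained" model.
data_rng = random.Random(1)
true_w = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(40)]
y = [predict(true_w, x) for x in X]
w0 = [[v + data_rng.gauss(0, 0.8) for v in row] for row in true_w]

experts = randopt_sketch(w0, X, y)
y_hat = [ensemble_predict(experts, x) for x in X]
```

The sketch scores candidates on the same small set it later predicts on, which is only acceptable in a toy; any real comparison against GRPO/PPO tuning would need a held-out evaluation set.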
