Why the Best SFT Checkpoint May Hurt RL Performance: Adaptive Early‑Stop Loss (AESL) for LLM Cold‑Start

The paper shows that over‑optimizing supervised fine‑tuning (SFT) of large language models can reduce their potential for subsequent reinforcement learning (RL). It proposes an Adaptive Early‑Stop Loss (AESL) that balances accuracy and output diversity during cold‑start, and demonstrates across multiple LLMs that AESL consistently yields better RL results.

AI training · Adaptive Early‑Stop Loss · LLM

0 likes · 11 min read

DataFunTalk

Feb 21, 2021 · Artificial Intelligence

Intra‑Ensemble in Neural Networks

This paper proposes an intra‑ensemble strategy that trains multiple sub‑networks within a single neural network using random training operations, width‑depth variations, and parameter sharing. The resulting sub‑networks are diverse, and combining them improves performance to a level comparable to traditional ensembles while adding only marginal parameter overhead.

Architecture Search · Model Diversity · Parameter Sharing

0 likes · 9 min read