Tag

reward design


Architect
Mar 9, 2025 · Artificial Intelligence

Experiments with Reinforcement Learning Fine‑Tuning of a 0.5B Qwen Model on the KK Dataset

The author reports a series of reinforcement‑learning fine‑tuning experiments on a 0.5‑billion‑parameter Qwen instruct model using the KK dataset. The write‑up covers adjustments to the reward design, curriculum‑style scaling of the training data, the convergence problems observed, and hypotheses about why small models fail to develop long reasoning chains.

LLM fine-tuning · curriculum learning · reinforcement learning
11 min read