Architect
Mar 9, 2025 · Artificial Intelligence
Experiments with Reinforcement Learning Fine‑Tuning of a 0.5B Qwen Model on the KK Dataset
The author reports a series of reinforcement‑learning‑based fine‑tuning experiments on a 0.5‑billion‑parameter Qwen‑0.5VB instruct model using the KK dataset, detailing reward design adjustments, curriculum‑style data scaling, observed convergence issues, and hypotheses about why small models fail to develop long reasoning chains.
LLM fine-tuningcurriculum learningreinforcement learning
0 likes · 11 min read