Baobao Algorithm Notes
Nov 14, 2024 · Artificial Intelligence
How OpenCoder’s RefineCode Dataset Powers Next‑Gen Code LLMs
The OpenCoder technical report details how the RefineCode dataset was built, including its multi‑stage preprocessing, filtering, and sampling pipelines; the pre‑training and fine‑tuning schedules for the 1.5B and 8B models; and the autonomous data selection methods. Together, these let OpenCoder achieve performance comparable to Qwen2.5‑Coder.
Artificial Intelligence · AutoDS · Instruction Tuning
