Baobao Algorithm Notes
Nov 14, 2024 · Artificial Intelligence

How OpenCoder’s RefineCode Dataset Powers Next‑Gen Code LLMs

The OpenCoder technical report details the creation of the RefineCode dataset, its multi‑stage preprocessing, filtering, and sampling pipelines, the pre‑training and fine‑tuning schedules for 1.5B and 8B models, and the autonomous data selection methods that together achieve performance comparable to Qwen2.5‑Coder.

Artificial Intelligence · AutoDS · Instruction Tuning