Baobao Algorithm Notes
Nov 14, 2024 · Artificial Intelligence

How OpenCoder’s RefineCode Dataset Powers Next‑Gen Code LLMs

The OpenCoder technical report details the creation of the RefineCode dataset, its multi‑stage preprocessing, filtering, and sampling pipelines, the pre‑training and fine‑tuning schedules for 1.5B and 8B models, and the autonomous data selection methods that together achieve performance comparable to Qwen2.5‑Coder.

Artificial Intelligence · AutoDS · Instruction Tuning
0 likes · 18 min read
DataFunSummit
Sep 13, 2023 · Artificial Intelligence

Data Engineering, Automated Evaluation, and Knowledge Graph Integration in Large Model Development

This article presents a comprehensive overview of data engineering practices for large model training, reviews current model scales and pre‑training data sources, discusses automated evaluation techniques, and explores how knowledge graphs can be integrated throughout the model lifecycle to improve quality and applicability.

AI · automated evaluation · data engineering
0 likes · 29 min read
DataFunTalk
Aug 16, 2023 · Artificial Intelligence

Data Engineering, Automated Evaluation, and Knowledge Graph Integration in Large Model Development

This article presents a comprehensive overview of data engineering practices, pre‑training data composition, automated model evaluation techniques, and the synergistic use of knowledge graphs within large‑scale AI model research, highlighting pipelines, quality criteria, and practical case studies.

Knowledge Graph · automation evaluation · data engineering
0 likes · 29 min read