How InternLM 3.0 Achieves High Performance with Just 4 Trillion Training Tokens
Shanghai AI Laboratory's InternLM 3.0 upgrade demonstrates that a refined 4-trillion-token dataset can push a large-language model's performance beyond that of open-source peers trained on roughly 18 trillion tokens, cutting training cost by over 75% while unifying regular dialogue and deep reasoning in a single model.
Scaling‑law Regime and Data Efficiency
Large‑model performance follows a scaling‑law relationship where compute and data volume are primary drivers. Shanghai AI Laboratory introduces Intelligence Quality per Token (IQPT) as a metric that captures the ratio of average model performance to the amount of training data. Higher IQPT indicates that each token contributes more learning signal, implying that improving data quality can yield larger gains than merely increasing data size.
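The IQPT idea can be made concrete with a small sketch. The function name and the illustrative benchmark numbers below are assumptions for demonstration, not the lab's published formula or results; the point is only that dividing average benchmark performance by training-token count makes data efficiency directly comparable across models.

```python
# Hypothetical illustration of an IQPT-style ratio: average benchmark
# score divided by training tokens, expressed per trillion tokens so the
# numbers stay readable. Scores here are invented for illustration.

def iqpt(avg_benchmark_score: float, training_tokens: float) -> float:
    """Performance delivered per trillion training tokens."""
    return avg_benchmark_score / (training_tokens / 1e12)

# Illustrative comparison: a model scoring 70 on a refined 4T-token corpus
# versus one scoring 68 on an unrefined 18T-token corpus.
refined = iqpt(70.0, 4e12)    # 17.5 score points per trillion tokens
baseline = iqpt(68.0, 18e12)  # ~3.78 score points per trillion tokens
print(refined / baseline)     # the refined corpus yields a ~4.6x ratio
```

Under these made-up numbers the refined model ends up with an IQPT several times higher despite a similar raw score, which is the shape of the claim made for InternLM 3.0 versus Llama 3.1.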
InternLM 3.0 (InternLM3‑8B‑Instruct)
Released on 15 January, InternLM 3.0 is trained on a refined corpus of only 4 trillion tokens. Using this data-refinement pipeline, the model attains effectiveness comparable to open-source models trained on roughly 18 trillion tokens, saving more than 75% of training cost. IQPT measured on InternLM 3.0 is over four times higher than that of Llama 3.1, demonstrating a superior performance-per-token ratio.
Data‑Refinement Framework
Intelligent Data Processing : The raw corpus is partitioned into millions of domains. Autonomous agents perform large‑scale quality inspection, learn from error cases, and apply domain‑specific handling, enabling fine‑grained filtering without manual effort.
High‑Value Data Synthesis : A general model generates candidate data. Candidates are filtered through a tree‑search strategy and multi‑dimensional quality checks (e.g., logical consistency, factual correctness, diversity). The vetted data are then used to fine‑tune a specialist model.
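The two stages above can be sketched as a simple filter pipeline. Everything below is an illustrative stand-in: the `Candidate` class, the check names, the thresholds, and the scoring rules are invented for demonstration; the real pipeline uses autonomous agents and model-based, domain-specific checks rather than these toy heuristics.

```python
# Minimal sketch of domain-tagged candidates passing multi-dimensional
# quality checks before being kept for specialist fine-tuning. All
# thresholds and check functions here are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    text: str
    domain: str
    scores: dict = field(default_factory=dict)

def quality_checks(c: Candidate) -> Candidate:
    # Stand-ins for checks such as logical consistency, factual
    # correctness, and diversity; real checks would be model-driven.
    words = c.text.split()
    c.scores["length_ok"] = 1.0 if 20 <= len(c.text) <= 2000 else 0.0
    c.scores["non_repetitive"] = (
        1.0 if len(set(words)) / max(len(words), 1) > 0.5 else 0.0
    )
    return c

def refine(corpus: list[Candidate], threshold: float = 1.5) -> list[Candidate]:
    """Keep only candidates whose combined quality score clears the bar."""
    return [c for c in map(quality_checks, corpus)
            if sum(c.scores.values()) >= threshold]
```

Per-domain handling would plug in here by dispatching on `c.domain` to domain-specific check sets, which is the fine-grained filtering the article describes.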
Post‑Training Instruction Synthesis
A world‑knowledge‑tree‑driven pipeline creates tens of thousands of high‑quality instruction examples. Multi‑agent generation extracts real‑world user intents, classifies them into fine‑grained task categories, and synthesizes instruction‑response pairs. These examples are employed for further fine‑tuning, markedly improving conversational fluency.
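The knowledge-tree-driven generation can be pictured as a tree walk that emits categorized instruction stubs. The tree contents, category naming, and instruction template below are invented for illustration; the actual pipeline is multi-agent and synthesizes full instruction-response pairs from real user intents.

```python
# Sketch: traverse a (toy) world-knowledge tree, emitting one instruction
# stub per leaf topic, tagged with its fine-grained task category path.
# Tree contents and the template string are illustrative assumptions.
knowledge_tree = {
    "science": {"physics": ["Newton's laws"], "chemistry": ["acids and bases"]},
    "daily life": {"cooking": ["boiling eggs"]},
}

def synthesize(tree: dict, path: tuple = ()):
    for key, value in tree.items():
        if isinstance(value, dict):
            # Internal node: descend, extending the category path.
            yield from synthesize(value, path + (key,))
        else:
            # Leaf node: emit one instruction stub per topic.
            for topic in value:
                yield {
                    "category": "/".join(path + (key,)),
                    "instruction": f"Explain {topic} to a beginner.",
                }

examples = list(synthesize(knowledge_tree))
```

Scaling the same traversal over a large knowledge tree, with model-generated responses and quality filtering on top, yields the tens of thousands of fine-tuning examples the article mentions.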
Evaluation Methodology and Results
Using the open‑source OpenCompass benchmark suite, InternLM 3.0 was evaluated on more than ten authoritative test sets, including CMMLU, GPQA, mathematics, coding, instruction following, long‑text handling, and dialogue. The evaluation protocol follows the reproducible procedures documented by OpenCompass.
Across the majority of benchmarks, InternLM 3.0 outperforms peer open-source models of similar scale and approaches the performance of GPT-4o-mini. For example, on GPQA the model scores 3.2 points higher than Llama 3.1, while on CMMLU it exceeds the baseline by 4.5 percentage points of accuracy.
General‑Specialist Integration (通专融合)
InternLM 3.0 adopts a "general-specialist integration" architecture: a single model switches between a regular dialogue mode and a deep-thinking mode via the system prompt, eliminating the need for separate specialist models. The earlier InternThinker model excelled at deep reasoning but lacked conversational fluency; InternLM 3.0 resolves this trade-off by jointly training on both data streams and selecting the mode at inference time.
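Mode selection via the system prompt can be sketched at the message-construction level. The system-prompt strings below are placeholders: the official deep-thinking prompt shipped with InternLM3-8B-Instruct is documented in its model card, and the real prompt differs from these stand-ins.

```python
# Sketch of prompt-level mode switching: the same checkpoint serves both
# modes, and only the system message changes. The prompt texts here are
# placeholders, not the official InternLM3 system prompts.
def build_messages(user_query: str, deep_thinking: bool) -> list[dict]:
    system = (
        "You are an expert reasoner; think step by step before answering."
        if deep_thinking
        else "You are a helpful conversational assistant."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_query},
    ]

chat_msgs = build_messages("Recommend a weekend hike.", deep_thinking=False)
think_msgs = build_messages("Solve this arrow-maze puzzle.", deep_thinking=True)
```

In practice these message lists would be passed through the model's chat template (e.g. a tokenizer's `apply_chat_template`) before generation; the key point is that no model swap occurs between modes.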
Demo Scenarios
Solving an arrow‑maze path‑finding puzzle that requires spatial reasoning and algorithmic planning.
Playing classic number‑guessing games, demonstrating multi‑turn logical deduction.
Executing multi‑step web‑browsing tasks with more than 20 navigation steps, showcasing integrated browsing capability.
Release Artifacts
Model code, checkpoints, and training scripts are publicly available:
GitHub: https://github.com/InternLM/InternLM
HuggingFace: https://huggingface.co/internlm
ModelScope: https://www.modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct