Artificial Intelligence 9 min read

How InternLM 3.0 Achieves High Performance with Just 4 TB of Training Data

InternLM 3.0 (InternLM‑3) upgrades the Shusheng‑PuYu model by refining data to boost "thinking density", using only 4 TB of tokens to surpass peer open‑source models, cutting training cost by over 75% while merging ordinary dialogue with deep reasoning capabilities.

AIWalker

Jan 16, 2025

How InternLM 3.0 Achieves High Performance with Just 4 TB of Training Data

Background and Motivation

Under the scaling‑law regime, large language models face two bottlenecks: increasing compute requirements and the diminishing returns of merely expanding raw token counts, which now approach 20 T tokens for most open‑source models. Researchers at Shanghai AI Laboratory argue that improving data quality—measured as Intelligence Quality per Token (IQPT)—offers a larger performance gain than scaling data volume.

Key Innovation: Data‑Refinement Framework

The team defines IQPT as the ratio of average model performance to the amount of training data, providing a concrete metric for data “return on investment.” By applying this metric, InternLM 3.0 achieves an IQPT more than four times higher than leading open‑source models of comparable size.

The framework consists of two core components:

Intelligent data processing: the corpus is split into tens of millions of domains; an autonomous agent performs large‑scale quality inspection, learns from error cases, and applies domain‑specific refinements.

High‑value data synthesis: using a "general‑specialist fusion" approach, a general model iteratively generates synthetic data, which is then filtered and curated for a specialist model. The pipeline employs tree‑search strategies and multi‑dimensional quality checks to produce rich, reliable content.

Evaluation Methodology

Using the open‑source OpenCompass benchmark suite, the team evaluated InternLM 3.0 across more than ten authoritative test sets (e.g., CMMLU, GPQA) covering reasoning, mathematics, coding, instruction following, long‑text generation, dialogue, and overall performance. Results show InternLM 3.0 consistently outperforms same‑size open‑source baselines and approaches the performance of GPT‑4o‑mini.

Fusion of Deep Reasoning and Ordinary Dialogue

InternLM 3.0 is the first general‑purpose model to combine conventional conversational ability with deep‑thinking capacity. A system prompt toggles the model between the two modes, enabling a single model to handle both everyday chat and complex problem‑solving. The approach builds on the earlier InternThinker model, which excelled at long‑chain reasoning and self‑correction, surpassing the o1‑preview model on math competition benchmarks.

During fine‑tuning, the team also created a world‑knowledge‑tree‑driven instruction synthesis pipeline, generating hundreds of thousands of high‑quality instruction examples that dramatically improve dialogue experience.

Open‑Source Release and Ecosystem Integration

All model weights, training scripts, and evaluation pipelines are released on GitHub (https://github.com/InternLM/InternLM) and HuggingFace (https://huggingface.co/internlm). The model is compatible with ModelScope (https://www.modelscope.cn/models/Shanghai_AI_Laboratory/internlm3-8b-instruct) and can be deployed on various hardware platforms, including Ascend, Cambricon, and Muxi, thanks to collaborations with those vendors.

Demo Cases

Two illustrative demos showcase the model’s capabilities:

Arrow‑maze puzzle: InternLM 3.0 finds a viable path through a grid, demonstrating spatial reasoning that challenges even OpenAI’s o1 model.

Number‑guessing game: the model solves the classic guessing problem with ease, highlighting its logical deduction skills.

Additionally, the model can browse the web for multi‑step tasks, such as searching for second‑hand housing listings, performing over 20 sequential page interactions to produce a comprehensive recommendation.

Conclusion

By redefining data efficiency through the IQPT metric and coupling intelligent data processing with high‑value synthesis, InternLM 3.0 proves that substantial performance gains are achievable without massive token counts. The open‑source release invites the community to build on this paradigm, accelerating progress toward general artificial intelligence.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Large Language Model model evaluation Open-source AI scaling laws InternLM Data Efficiency

Written by

AIWalker

Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.