Ling‑2.5‑1T: Open‑Source 1‑Trillion‑Parameter Instant LLM with 1M‑Token Context
Ling‑2.5‑1T is an open‑source instant large language model with 1 trillion total parameters, 63 B active parameters, and a 1 M token context window. It features mixed‑linear attention, a composite correctness‑plus‑process reward for token efficiency, fine‑grained preference alignment, and leading benchmark performance across reasoning, instruction‑following, and agentic tasks.
Overview
Ling‑2.5‑1T is an open‑source instant LLM with 1 trillion total parameters (63 B active) and support for up to 1 M token context, built on a mixed‑linear attention architecture.
Key Technical Improvements
Parameter scale and context length: 1 T total parameters; pre‑training data expanded from 20 T to 29 T tokens, enabling a 1 M token context window.
Token efficiency: A composite “correctness + process redundancy” reward trains the model to match prior accuracy with roughly one quarter of the output tokens, about a fourfold efficiency gain.
Fine‑grained preference alignment: Bidirectional RL feedback and agent‑based constraint verification markedly improve creative‑writing and instruction‑following performance.
Agentic interaction: Trained in large‑scale, high‑fidelity environments; compatible with Claude Code, OpenCode, and OpenClaw; achieves leading scores on the BFCL‑V4 benchmark.
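The “correctness + process redundancy” reward above can be sketched as a simple scalar signal: full credit for a correct answer, minus a penalty for tokens spent beyond a target budget. The function name, budget, and penalty weight below are illustrative assumptions, not the released training objective.

```python
def composite_reward(correct: bool, n_tokens: int,
                     budget: int = 8192, alpha: float = 0.5) -> float:
    """Toy correctness-plus-redundancy reward (illustrative, not Ling's actual RL objective).

    correct:  whether the final answer is right.
    n_tokens: output tokens the model actually spent.
    budget:   target token budget; spending under it incurs no penalty.
    alpha:    weight of the redundancy penalty.
    """
    correctness = 1.0 if correct else 0.0
    # Penalize only the fraction of tokens spent beyond the budget.
    redundancy = max(0.0, (n_tokens - budget) / budget)
    return correctness - alpha * redundancy
```

Under a signal like this, two equally correct answers are ranked by brevity, which is one plausible way a model learns to reach the same accuracy with far fewer output tokens.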
Benchmark Results
The model was evaluated on knowledge, reasoning, agentic interaction, instruction following, and long‑text benchmarks. Compared with Ling‑1T and major instant models (DeepSeek V3.2, Kimi K2.5, GPT 5.2), Ling‑2.5‑1T shows clear advantages in complex reasoning and instruction‑following.
On the high‑difficulty AIME 2026 math benchmark, Ling‑2.5‑1T produces about 5,890 output tokens while approaching the performance of models that consume 15–23 k tokens.
Architecture Details
The Ling 2.5 architecture extends Ling 2.0 by introducing a mixed‑linear attention mechanism (MLA + Lightning Linear) at a 1:7 ratio: selected GQA layers are replaced by Lightning Linear Attention for higher throughput, while the remaining layers are converted to MLA with adapted QK‑Norm and Partial RoPE.
Activated parameters increase from 51 B to 63 B. Under the mixed‑linear design, inference speed surpasses both Ling‑1T and the 32 B‑activation Kimi K2, with the gap widening at longer generation lengths.
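A 1:7 MLA-to-linear ratio can be read as one full-attention layer per block of eight. The sketch below shows one way such an interleaving could be laid out; the function and the placement of the MLA layer within each block are assumptions for illustration, not the released model config.

```python
def attention_schedule(n_layers: int, linear_per_mla: int = 7) -> list[str]:
    """Sketch of a 1:7 mixed-linear layer layout (illustrative only).

    Every block of (linear_per_mla + 1) layers ends with one full MLA
    layer; the rest use Lightning-style linear attention.
    """
    block = linear_per_mla + 1  # 8 layers per block for a 1:7 ratio
    return ["MLA" if (i + 1) % block == 0 else "linear"
            for i in range(n_layers)]
```

For a 32‑layer stack this yields 4 MLA layers and 28 linear layers, so KV‑cache growth is dominated by the cheap linear layers, which is where the long‑generation throughput advantage comes from.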
Long‑Context Capability
Continual pre‑training on a 9 T high‑quality corpus expands world‑knowledge coverage. The context window is trained to 256 K tokens and extrapolated with YaRN to support up to 1 M tokens. Systematic long‑context benchmarks (RULER, MRCR) show Ling‑2.5‑1T outperforming MLA/DSA‑based instant models, though it still trails leading closed‑source APIs.
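The 256 K → 1 M extrapolation step can be illustrated with a simplified NTK‑by‑parts frequency ramp in the spirit of YaRN: RoPE dimensions that barely rotate over the original context are interpolated by the length ratio, fast‑rotating dimensions are left untouched, and the rest are blended. All constants below (`orig_len`, `beta_fast`, `beta_slow`, the 4× scale) are illustrative assumptions, not Ling‑2.5‑1T's actual rope configuration.

```python
import math

def yarn_inv_freqs(dim: int, base: float = 10000.0, orig_len: int = 4096,
                   scale: float = 4.0, beta_fast: float = 32.0,
                   beta_slow: float = 1.0) -> list[float]:
    """Simplified YaRN-style inverse-frequency rescaling (illustrative).

    For each RoPE dimension, count how many full rotations it completes
    over the original training length, then interpolate slow dimensions
    by `scale`, keep fast ones, and linearly blend in between.
    """
    inv = [base ** (-2 * i / dim) for i in range(dim // 2)]
    out = []
    for f in inv:
        rotations = orig_len * f / (2 * math.pi)
        if rotations <= beta_slow:        # barely rotates: interpolate fully
            out.append(f / scale)
        elif rotations >= beta_fast:      # rotates many times: keep as-is
            out.append(f)
        else:                             # linear ramp between the two regimes
            t = (rotations - beta_slow) / (beta_fast - beta_slow)
            out.append(f * (t + (1 - t) / scale))
    return out
```

The key property is that position-sensitive high-frequency dimensions keep their resolution while low-frequency dimensions are stretched to cover the longer window, which is why this family of methods extrapolates better than uniform position interpolation.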
Limitations and Future Work
Ling‑2.5‑1T achieves high‑throughput decoding and leading long‑context handling, but remains behind state‑of‑the‑art models in complex agentic tasks. Future versions will focus on long‑range task execution, further token‑efficiency improvements, and a better balance between efficiency and effectiveness.
Resources
Model weights and code are available on Hugging Face https://huggingface.co/inclusionAI/Ling-2.5-1T and ModelScope https://modelscope.cn/models/inclusionAI/Ling-2.5-1T. Additional chat and API services will be released soon.