Ling‑2.5‑1T: Open‑Source 1‑Trillion‑Parameter Instant LLM with 1M‑Token Context

Ling‑2.5‑1T is an open‑source instant large language model with 1 trillion total parameters, 63 B active weights, and a 1 M token context window, featuring mixed‑linear attention, a composite correctness‑plus‑process reward for token efficiency, fine‑grained alignment, and leading benchmark performance across reasoning, instruction‑following, and agentic tasks.


Overview

Ling‑2.5‑1T is an open‑source instant LLM with 1 trillion total parameters (63 B active) and support for up to 1 M token context, built on a mixed‑linear attention architecture.

Key Technical Improvements

Parameter scale and context length: 1 T total parameters, pre-training data expanded from 20 T to 29 T tokens, and the context window extended to 1 M tokens.

Token efficiency: A composite "correctness + process redundancy" reward roughly quadruples token efficiency, delivering comparable answers at about a quarter of the output-token cost of previous models (a minimal sketch follows this list).

Fine-grained preference alignment: Bidirectional RL feedback and agent-based constraint verification markedly improve performance on creative-writing and instruction-following tasks.

Agentic interaction: Trained in large-scale, high-fidelity environments; compatible with Claude Code, OpenCode, and OpenClaw; achieves leading scores on the BFCL-V4 benchmark.
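
To make the token-efficiency objective concrete, here is a minimal Python sketch of a composite "correctness + process redundancy" reward. The weighting, the redundancy measure, and the function names are illustrative assumptions; the release does not publish the exact formulation.

```python
# Hypothetical composite reward: correctness minus a penalty for redundant
# "process" tokens. Weights, budget, and names are illustrative assumptions,
# not the Ling-2.5-1T training recipe.

def composite_reward(
    is_correct: bool,
    output_tokens: int,
    reference_tokens: int,
    redundancy_weight: float = 0.2,
) -> float:
    """Reward the final answer, penalize tokens spent beyond a reference budget."""
    correctness = 1.0 if is_correct else 0.0
    # Only tokens beyond the budget are penalized, so a response that is
    # already concise gains nothing from cutting its reasoning further.
    overshoot = max(output_tokens - reference_tokens, 0) / max(reference_tokens, 1)
    return correctness - redundancy_weight * overshoot


# A correct answer at 3x the budget still scores positively, but less than
# a correct answer at or under budget.
print(composite_reward(True, 18_000, 6_000))  # 1.0 - 0.2 * 2.0 = 0.6
print(composite_reward(True, 5_890, 6_000))   # 1.0
```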

Benchmark Results

The model was evaluated on knowledge, reasoning, agentic interaction, instruction following, and long‑text benchmarks. Compared with Ling‑1T and major instant models (DeepSeek V3.2, Kimi K2.5, GPT 5.2), Ling‑2.5‑1T shows clear advantages in complex reasoning and instruction‑following.

On the AIME 2026 high-difficulty math benchmark, Ling-2.5-1T produces about 5,890 output tokens while approaching the performance of models that consume 15 K–23 K tokens.
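
As a back-of-the-envelope check, the quoted figures put the comparison models at roughly 2.5 to 4 times Ling-2.5-1T's output-token cost:

```python
# Token-cost ratio implied by the AIME 2026 figures quoted above.
ling_tokens = 5_890
peer_tokens = [15_000, 23_000]        # reported range for comparable models
ratios = [t / ling_tokens for t in peer_tokens]
print([f"{r:.1f}x" for r in ratios])  # ['2.5x', '3.9x']
```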

Architecture Details

The Ling 2.5 architecture extends Ling 2.0 by introducing a mixed-linear attention mechanism (MLA + Lightning Linear) in a 1:7 ratio. Selected GQA layers are replaced by Lightning Linear Attention for higher throughput, while the remaining layers adopt MLA with adapted QK-Norm and Partial RoPE.
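
The sketch below shows one way such an interleaving could be laid out, assuming the 1:7 ratio means one MLA layer per block of eight; the block size, layer count, and labels are assumptions for exposition, not the released configuration.

```python
# Illustrative attention-type schedule for a mixed-linear stack with a
# 1:7 MLA-to-linear ratio (one softmax/MLA layer per 8-layer block).
from typing import List

def build_attention_schedule(num_layers: int, mla_every: int = 8) -> List[str]:
    """Assign 'mla' to every `mla_every`-th layer and 'linear' to the rest."""
    schedule = []
    for layer_idx in range(num_layers):
        if (layer_idx + 1) % mla_every == 0:
            schedule.append("mla")     # softmax attention with QK-Norm / Partial RoPE
        else:
            schedule.append("linear")  # Lightning-style linear attention, O(1) KV state
    return schedule

print(build_attention_schedule(16))
# 2 'mla' layers and 14 'linear' layers -> a 1:7 ratio per 8-layer block
```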

Activation parameters increase from 51 B to 63 B. Despite this, under the mixed-linear design inference speed surpasses Ling-1T and even Kimi K2-class models with 32 B active parameters, especially at longer generation lengths.

Long‑Context Capability

Continual pre‑training on a 9 T high‑quality corpus expands world‑knowledge coverage. The context window is trained to 256 K tokens and extrapolated with YaRN to support up to 1 M tokens. Systematic long‑context benchmarks (RULER, MRCR) show Ling‑2.5‑1T outperforming MLA/DSA‑based instant models, though it still trails leading closed‑source APIs.
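
As a rough illustration of YaRN-style extrapolation at load time, the sketch below overrides a Hugging Face config's rope_scaling. The exact keys the Ling-2.5-1T repository accepts are an assumption (several model families expose a similar dict), so check the model card for the supported fields.

```python
# Sketch: extend a 256K trained window to 1M tokens via YaRN rope scaling.
# Key names follow a common Hugging Face convention and are assumed, not
# confirmed, for this particular repo.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("inclusionAI/Ling-2.5-1T", trust_remote_code=True)
config.max_position_embeddings = 1_048_576            # 1M-token target window
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                                     # 256K trained window * 4 = 1M
    "original_max_position_embeddings": 262_144,       # 256K training context
}
# Pass `config=config` to AutoModelForCausalLM.from_pretrained(...) to apply it.
```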

Limitations and Future Work

Ling‑2.5‑1T achieves high‑throughput decoding and leading long‑context handling, but remains behind state‑of‑the‑art models in complex agentic tasks. Future versions will focus on long‑range task execution, further token‑efficiency improvements, and a better balance between efficiency and effectiveness.

Resources

Model weights and code are available on Hugging Face (https://huggingface.co/inclusionAI/Ling-2.5-1T) and ModelScope (https://modelscope.cn/models/inclusionAI/Ling-2.5-1T). Additional chat and API services will be released soon.
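
A minimal load-and-generate sketch against the Hugging Face repository above. The flags used (trust_remote_code, device_map) are common conventions and an assumption about this particular checkpoint, which in practice requires a multi-GPU serving stack given its size.

```python
# Assumed loading pattern for the repo linked above; verify against the
# model card before running, and expect to need several GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/Ling-2.5-1T"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",
)

messages = [{"role": "user", "content": "Summarize the Ling-2.5-1T architecture in two sentences."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids.to(model.device), max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```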

Tags: Large Language Model, Benchmark, Token Efficiency, Agentic Interaction, Mixed Linear Attention