Ling-1T: The Trillion‑Parameter AI Model Redefining Efficient Reasoning

Ling-1T, a trillion-parameter flagship non-thinking model, combines about 50 billion active parameters per token, a 128K context window, Evo-CoT reasoning, and FP8 mixed-precision training to achieve state-of-the-art performance on complex reasoning, code generation, and cross-modal tasks. This article outlines its architecture, benchmarks, limitations, and future roadmap.


Ling-1T is the first flagship model of the Ling 2.0 series, featuring one trillion (1T) total parameters, about 50 billion active parameters per token, a 128K context window, and evolutionary chain-of-thought (Evo-CoT) training that substantially improves reasoning efficiency.
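To see how roughly 50 billion active parameters can fall out of a 1T total under the 1/32 MoE activation ratio discussed below, here is a back-of-envelope sketch; only the total size and the activation ratio come from the post, and the dense-component figure is a hypothetical value chosen for illustration.

```python
# Back-of-envelope: active parameters in a sparse MoE model.
# Only TOTAL_PARAMS (~1T) and the 1/32 activation ratio come from the post;
# DENSE_PARAMS is a hypothetical split chosen to illustrate the arithmetic.
TOTAL_PARAMS = 1_000e9          # ~1T total parameters
ACTIVATION_RATIO = 1 / 32       # fraction of routed-expert weights used per token
DENSE_PARAMS = 20e9             # hypothetical: attention, embeddings, always-on parts

routed = TOTAL_PARAMS - DENSE_PARAMS
active = DENSE_PARAMS + routed * ACTIVATION_RATIO
print(f"~{active / 1e9:.0f}B active per token")   # ~51B, near the stated ~50B
```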

The model achieves state-of-the-art (SOTA) results on a range of difficult reasoning benchmarks, including code generation, software development, competition mathematics, professional mathematics, and logical reasoning. It outperforms both large open-source models (e.g., DeepSeek-V3.1-Terminus, Kimi-K2-Instruct-0905) and closed-source APIs (e.g., gpt-5-main, Gemini-2.5-Pro).

Beyond pure reasoning, Ling‑1T excels in cross‑modal tasks such as visual‑frontend development, producing high‑quality code and designs for applications like a Three‑Body relationship map, a full‑stack Crane cloud platform, and an online tarot‑card prediction service.

The Ling 2.0 architecture is guided by the Ling Scaling Laws (https://arxiv.org/abs/2507.17702), which informed the choice of a 1/32 MoE activation ratio, multi-token-prediction (MTP) layers, and zero-mean, aux-loss-free expert routing with sigmoid scoring, enabling stable training of a trillion-parameter base.
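As a rough illustration of what sigmoid-scored, aux-loss-free routing can look like, here is a minimal PyTorch sketch modeled on publicly described bias-based load balancing; it is not Ling's actual implementation, and all function and argument names are assumptions.

```python
import torch

def route_tokens(hidden, gate_weight, bias, top_k=8):
    # Sigmoid scoring: each expert is scored independently (unlike softmax,
    # experts do not compete for a fixed probability mass).
    scores = torch.sigmoid(hidden @ gate_weight.T)      # [n_tokens, n_experts]
    # The bias steers only *which* experts are selected; the gate values
    # used for mixing remain the unbiased sigmoid scores.
    selected = torch.topk(scores + bias, top_k, dim=-1).indices
    gates = torch.gather(scores, -1, selected)
    gates = gates / gates.sum(dim=-1, keepdim=True)     # renormalize over top-k
    return selected, gates

def update_bias(bias, expert_load, step_size=1e-3):
    # Aux-loss-free balancing: nudge each expert's bias against its load
    # imbalance, then re-center so the bias vector stays zero-mean.
    bias = bias - step_size * torch.sign(expert_load - expert_load.mean())
    return bias - bias.mean()
```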

Training employs FP8 mixed precision (making Ling-1T the largest known FP8-trained base model), achieving an end-to-end speedup of more than 15% and a 40% pipeline acceleration through heterogeneous fine-grained pipeline scheduling, operator fusion, communication optimization, recomputation, checkpointing, and fine-grained monitoring.
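At its core, FP8 training relies on scaling activations and weights into FP8's narrow dynamic range before matrix multiplies. A minimal numerics sketch in PyTorch is below; it uses per-tensor dynamic scaling for clarity, whereas production FP8 kernels fuse these steps and Ling's exact recipe is not public.

```python
import torch

FP8 = torch.float8_e4m3fn  # 8-bit float: 4-bit exponent, 3-bit mantissa, max ~448

def to_fp8(x: torch.Tensor):
    # Per-tensor dynamic scaling: map the largest magnitude onto FP8's range.
    scale = torch.finfo(FP8).max / x.abs().max().clamp(min=1e-12)
    return (x * scale).to(FP8), scale

def fp8_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Quantize both operands, multiply, then undo the scales. Real kernels
    # run the multiply natively in FP8; upcasting here keeps the sketch
    # runnable on any device.
    qa, sa = to_fp8(a)
    qb, sb = to_fp8(b)
    return (qa.float() @ qb.float()) / (sa * sb)
```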

Mid‑training introduces high‑density reasoning data and a warmup‑stable‑merge (WSM) LR scheduler (https://arxiv.org/abs/2507.17634). Post‑training leverages Evo‑CoT to progressively activate reasoning ability while controlling inference cost, and a novel reinforcement‑learning approach called Linguistics‑Unit Policy Optimization (LPO) treats whole sentences as action units, improving training stability and generalization.
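The warmup-stable-merge idea can be sketched briefly: warm the learning rate up, hold it flat, and replace the usual decay phase by merging checkpoints saved along the stable stretch. The sketch below is illustrative only; the function names and hyperparameters are assumptions, not the paper's exact procedure.

```python
def wsm_lr(step: int, warmup_steps: int = 2000, peak_lr: float = 3e-4) -> float:
    # Warmup-stable: linear ramp, then a constant learning rate with no decay.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr

def merge_checkpoints(state_dicts: list[dict]) -> dict:
    # "Merge" stage: average weights from checkpoints taken during the
    # stable phase, standing in for the decayed-LR endpoint.
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}
```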

Comprehensive evaluations across knowledge, code, mathematics, reasoning, agent, and alignment benchmarks show Ling-1T to be the strongest open-source flagship non-thinking model available today, with particular strength on complex reasoning tasks.

Limitations include high inference cost due to the GQA‑based attention architecture, still‑nascent agent capabilities (multi‑turn interaction, long‑term memory, tool use), and occasional instruction‑following or identity‑recognition errors. Future work will address mixed‑attention designs, enhanced agent abilities, and reinforced identity alignment with safety fine‑tuning.

Resources, including model weights, demos, and code, are available on HuggingFace, ModelScope, and GitHub, with interactive chat interfaces for users both in China and internationally.

Tags: AI, LLM, Benchmark, FP8, Scaling, Inference
Written by AntTech

Technology is the core driver of Ant's future creation.
