Inside Ant’s Baoling: Balancing Efficiency and Reasoning in a 1‑Trillion‑Parameter Model
At the Ant Star Innovation Journey event, the Baoling team unveiled their roadmap for trillion‑parameter models, detailing the development of Ling‑1T, Ring‑1T and multimodal Ming series, the scaling‑law‑guided architecture, training innovations, evaluation methods, and open‑source releases that aim to advance efficient, high‑performance AI.
1. Baoling's Model Lineup and Design Philosophy
On October 27, at the "Ant Star·Innovation Journey" event in Beijing, the Ant Baoling team appeared for the first time together with more than 60 outstanding PhD students from top Chinese universities. They shared the evolution of the Baoling foundational large model, the research insights behind the trillion‑parameter models Ling‑1T and Ring‑1T, and the team’s talent programs.
The participants are candidates for Ant's elite talent programs "Ant Star" and "Plan A", drawn mainly from computer science, software engineering, artificial intelligence, and network security. "Ant Star" focuses on AI, privacy computing, and security technologies, while "Plan A" (launching in 2025) recruits world-class AI researchers.
The name “Baoling” comes from the lark bird, symbolizing trial and success. When the team embraced open source, they chose the international name “Ling” (the first four letters of “linguistic”) for the base and instruction models. Replacing the initial “L” with “R” yields the reasoning‑focused “Ring” series, and with “M” produces the multimodal “Ming” series.
Baoling's goal is AGI (artificial general intelligence) rather than chasing specific benchmark rankings. The architecture is a scalable, end-to-end intelligent foundation stack, comprising an application ecosystem layer, language and multimodal model layers, trillion-parameter model clusters, linear-attention engines, and supporting data and compute infrastructure.
To date, Baoling has open‑sourced more than 18 large models ranging from hundred‑billion to trillion‑parameter scales. Notable releases include:
Ling‑plus: the first MoE model achieving high performance on modest hardware.
Ling‑1.5 series
Ling 2.0 series, featuring Ling‑mini‑2.0 (mobile‑friendly), Ling‑flash‑2.0 (efficient 40B‑dense performance with 6.1B activations), and Ling‑1T (a trillion‑parameter flagship with superior inference accuracy and efficiency).
Ring series (thinking models), including Ring‑flash‑linear‑2.0 and Ring‑mini‑linear‑2.0, as well as the world's first open‑source trillion‑parameter thinking model, Ring‑1T.
Ming series (multimodal), with Ming‑lite‑omni‑1.0 and Ming‑1.5 achieving “see, hear, speak, draw” capabilities comparable to GPT‑4o; the newly open‑sourced Ming‑flash‑omni‑Preview is the largest open‑source full‑modal model.
LLaDA family, exploring diffusion‑based language models; the first open‑source diffusion language model and a small‑size LLaDA‑MoE variant.
2. Baoling Flagship Model Ling‑1T: Balancing Efficient Thinking and Precise Reasoning
Practical model intelligence is bounded by both capacity and latency. Scaling laws dictate that more parameters mean greater knowledge capacity, but inference latency must remain controllable for real-world deployment. Baoling therefore adopts an efficient sparse MoE architecture guided by a self-derived scaling law, aiming to raise the intelligence ceiling while sharply reducing inference cost.
Key technical details of Ling‑1T include:
Architecture designed from the start to support trillion‑scale models, with a scaling law that informs architecture and hyper‑parameter choices.
Three size variants (mini, flash, 1T); Ling‑1T uses 50 B activation parameters and FP8 mixed‑precision training, making it the largest known FP8‑trained base model.
Training on over 20 T high‑quality tokens; the second stage of pre‑training adds dense reasoning data, raising the proportion of reasoning tokens to >40 %.
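The activation figures above (roughly 50B activated out of a trillion total parameters) follow from top-k expert routing in a sparse MoE layer. The sketch below is illustrative only; the router design, expert count, and dimensions are hypothetical, not Ling-1T's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_moe_layer(x, expert_weights, router_weights, k=2):
    """Route each token to its top-k experts and mix their outputs.

    Only k of the n_experts expert networks run per token, which is why
    the activated parameter count can be a small fraction of the total
    parameter count in a sparse MoE model.
    """
    logits = x @ router_weights                      # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]        # indices of the top-k experts
    gate = np.take_along_axis(logits, top, axis=-1)
    gate = np.exp(gate - gate.max(axis=-1, keepdims=True))
    gate /= gate.sum(axis=-1, keepdims=True)         # softmax over selected experts only

    out = np.zeros_like(x)
    for t in range(x.shape[0]):                      # per-token dispatch
        for j in range(k):
            e = top[t, j]
            out[t] += gate[t, j] * (x[t] @ expert_weights[e])
    return out

d, n_experts, tokens = 8, 16, 4                      # toy sizes for illustration
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(d, n_experts))
y = top_k_moe_layer(rng.normal(size=(tokens, d)), experts, router, k=2)
print(y.shape)  # (4, 8); each token activated only 2 of the 16 experts
```

With k=2 of 16 experts per token, only 1/8 of the expert parameters participate in any forward pass; scaling the same ratio up is what lets a trillion-parameter model run with a far smaller activated footprint.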
Three major insights from Ling‑1T development:
Robust evaluation is essential. Evaluation is integrated throughout data collection, pre‑training, and model alignment, providing continuous feedback for data quality and model improvement. Evaluation methods evolve alongside model capabilities, shifting from simple leaderboard scores to more nuanced metrics.
SFT is the foundation for RL. A high‑quality base checkpoint enables effective reinforcement learning. Introducing chain‑of‑thought reasoning data during pre‑training "pre‑activates" reasoning ability, and SFT data must match the base model's capacity to unlock RL potential.
Co‑optimizing thinking and non‑thinking models. Ling‑1T achieves industry‑leading performance on benchmarks such as AIME 2025, surpassing Gemini 2.5 Pro while reducing token usage by 60 %.
During post‑training, Baoling introduced the Evo‑CoT approach, a mixed‑task RL curriculum, and a novel ApexEval metric that prioritizes maximum‑pooling over averaging to better capture the potential of large models.
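The exact ApexEval formulation is not spelled out here, but the max-pooling-over-averaging idea can be sketched as follows. The toy score matrix and both scoring rules are hypothetical illustrations, assuming multiple independent sampled attempts per problem.

```python
import numpy as np

# Rows: problems; columns: independent sampled attempts (1 = solved, 0 = not).
scores = np.array([
    [0, 0, 1, 0],   # solved only occasionally
    [1, 1, 1, 1],   # solved reliably
    [0, 1, 0, 0],   # solved occasionally
])

# Averaging over all attempts rewards consistency.
mean_score = scores.mean()

# Max-pooling per problem, then averaging across problems,
# credits a problem as solvable if ANY attempt succeeds --
# better at revealing a model's latent potential.
apex_score = scores.max(axis=1).mean()

print(mean_score)  # 0.5
print(apex_score)  # 1.0
```

Under averaging the model scores 0.5; under per-problem max-pooling it scores 1.0, since every problem is solved in at least one attempt. This is why a max-pooled metric surfaces capability that an averaged leaderboard score would hide.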
3. Trillion‑Parameter Thinking Model Ring‑1T: Three Technical Innovations
Ring‑1T leads open‑source models on math, code generation, medical, and creative‑writing benchmarks, even reaching silver‑medal level on IMO 2025.
The two core challenges for trillion‑scale thinking models are stability and efficiency. Baoling addresses them with three innovations:
ASystem: a high‑performance RL system (including the open‑source AReaL), offering fast memory management, weight swapping, and sandbox verification, enabling stable training of Ring‑1T.
IcePop: a method that masks gradients of tokens whose training‑inference discrepancy exceeds predefined bounds, preventing destabilizing updates during long‑context training.
C3PO++: a budget‑controlled sampling and training strategy that mitigates long‑tail rollout variance, stabilizing throughput and achieving up to 2.5× inference acceleration and 1.5× training speed‑up without sacrificing reward.
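IcePop's published formulation is not reproduced here; the following is a minimal sketch of the masking idea, measuring training-vs-inference discrepancy as a per-token log-probability gap. The bound values and token log-probs are made-up illustrations.

```python
import numpy as np

def icepop_mask(logp_train, logp_infer, lower=-2.0, upper=2.0):
    """IcePop-style token mask (illustrative sketch, hypothetical bounds).

    A token whose training-vs-inference log-prob gap falls outside
    [lower, upper] receives weight 0, so its gradient contribution is
    dropped instead of destabilizing the update.
    """
    delta = logp_train - logp_infer
    return ((delta >= lower) & (delta <= upper)).astype(float)

# Per-token log-probs under the training engine vs. the inference engine.
logp_train = np.array([-1.0, -0.5, -4.0, -0.2])
logp_infer = np.array([-1.1, -0.4, -0.5, -0.3])  # third token diverged sharply

mask = icepop_mask(logp_train, logp_infer)
print(mask)  # [1. 1. 0. 1.] -- the divergent token is excluded from the gradient
```

In a real RL pipeline the mask would multiply the per-token loss before backpropagation; here it simply shows which tokens survive the bound check.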
4. On‑Site Interaction
During the event, the Baoling team answered audience questions about the motivation for trillion‑parameter models, scaling‑law research, training experiences, and the exploration of diffusion architectures such as LLaDA. They emphasized open collaboration, the importance of multimodal research, and the need for new evaluation benchmarks for full‑modal AI.
For more information and to follow Baoling’s open‑source work, visit:
HuggingFace: https://huggingface.co/inclusionAI
GitHub: https://github.com/inclusionAI
ModelScope: https://modelscope.cn/organization/inclusionAI
