Breaking the Efficiency Wall: Ant Group’s Bailing Model Paves the Way to AGI
At CNCC 2025, Ant Group Vice President Zhou Jun outlined the Bailing large model’s five‑layer architecture, hybrid linear attention, the Ling Scaling Law, and novel training algorithms that sharply cut cost and latency, achieving state‑of‑the‑art performance on math and code benchmarks while promoting open‑source collaboration toward AGI.
Keynote Overview
On October 25, at the 22nd China Computer Conference (CNCC 2025), Zhou Jun, Vice President of Ant Group’s Platform Technology Business Group and head of the Bailing large model, delivered a keynote titled “Deconstructing the Path to General Intelligence with the Bailing Model”. He presented the model’s evolution and technical innovations, and called for collaborative innovation to tackle frontier research challenges.
Efficiency Challenges and Five‑Layer Architecture
Zhou described the “efficiency wall” facing the industry: high model costs, long inference latency, and complex multimodal understanding. To break this wall, Bailing adopts an end‑to‑end, scalable intelligent infrastructure composed of five layers: application ecosystem → language/multimodal large models → trillion‑parameter models trained on ten‑thousand‑GPU (“wan‑ka”) clusters → (hybrid) linear attention engine → data and compute infrastructure. Each layer aims to improve both performance and efficiency.
Ling Scaling Law and Sparse MoE
The team introduced the Ling Scaling Law, which leverages a highly sparse Mixture‑of‑Experts (MoE) architecture with fine‑grained expert routing and control strategies. This approach raises the intelligence ceiling while significantly reducing compute cost and inference delay, forming the core technical philosophy of Bailing’s model development.
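The keynote did not disclose implementation details, but the core idea of a highly sparse MoE is simple: a learned router activates only a few experts per token, so compute per token scales with the number of activated experts rather than the total parameter count. The sketch below is a minimal illustration in plain NumPy, not Bailing’s actual routing or control strategy; the top‑k softmax router and expert shapes are assumptions for clarity.

```python
import numpy as np

def top_k_moe(x, expert_weights, router_weights, k=2):
    """Sparse MoE forward pass for a single token (illustrative sketch).

    x:              (d,) token hidden state
    expert_weights: list of (d, d) matrices, one per expert
    router_weights: (n_experts, d) router projection
    k:              number of experts activated per token
    """
    logits = router_weights @ x                      # (n_experts,) routing scores
    top = np.argsort(logits)[-k:]                    # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                             # softmax over the selected experts only
    # Only k of n_experts matrices are ever multiplied: per-token compute
    # scales with k, not with the total expert count.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d))
y = top_k_moe(rng.standard_normal(d), experts, router, k=2)
print(y.shape)  # (8,)
```

This is why sparsity raises the capacity ceiling without a proportional cost increase: total parameters grow with `n_experts`, while per‑token FLOPs grow only with `k`.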
Core Technical Innovations
Three major innovations were highlighted:
Hybrid Linear Attention Architecture: combines the computational efficiency of linear attention with the precision of traditional attention.
Inference Efficiency and Accuracy Optimization: trains models with high‑quality data to endow intrinsic reasoning ability, and introduces an Evolutionary Chain of Thought (Evo CoT) mechanism to extend the Pareto frontier of accuracy versus average inference length.
Stability‑Focused Reinforcement Learning Algorithms: includes the LPO (Linguistics‑Unit Policy Optimization) algorithm, the IcePop algorithm, and the C3PO++ framework, which improve semantic coherence, mitigate distribution shift, and boost GPU utilization and training throughput.
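To make the first innovation concrete: linear attention replaces the softmax with a kernel feature map so that key–value statistics can be accumulated once and reused for every query, giving O(n) cost in sequence length instead of O(n²), while a minority of layers keep exact softmax attention for precision. The sketch below is a generic textbook‑style illustration (the elu+1 feature map and the interleaving idea are common in the literature), not Bailing’s actual architecture.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Exact attention: the (n, n) score matrix makes this O(n^2) in length n."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention with phi(x) = elu(x) + 1: O(n) in length n.

    The (d, d) summary phi(k).T @ v is computed once and reused for every
    query, so cost grows linearly with sequence length.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, always > 0
    kv = phi(k).T @ v                                    # (d, d) key-value summary
    z = phi(q) @ phi(k).sum(axis=0) + eps                # per-query normalizer
    return (phi(q) @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 6, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
# A hybrid stack would interleave the two, e.g. one exact softmax layer
# per several linear layers; the ratio here is purely hypothetical.
exact, fast = softmax_attention(q, k, v), linear_attention(q, k, v)
print(exact.shape, fast.shape)  # (6, 4) (6, 4)
```

The trade‑off the keynote alludes to is visible in the shapes alone: both variants map (n, d) inputs to (n, d) outputs, so layers of either kind can be mixed freely within one stack.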
Empirical Results
The trillion‑parameter non‑thinking model Ling‑1T outperformed Gemini 2.5‑Pro on the AIME 2025 benchmark while using about 60% fewer tokens per inference. The trillion‑parameter thinking model Ring‑1T achieved leading open‑source performance on code‑generation and logical‑reasoning benchmarks and reached silver‑medal level on IMO problems.
Future Directions and Open‑Source Efforts
Zhou emphasized that achieving AGI requires deep multimodal understanding and interaction across the visual, audio, and 3D domains. He cited Ming‑Flash‑Omni‑Preview, described as the largest known open‑source multimodal model, which excels on the 12‑task ContextASR benchmark.
To date, Ant Group’s Bailing has open‑sourced over 18 large models, spanning language models (Ling series), reasoning models (Ring series), multimodal models (Ming series), and exploratory diffusion models (LLaDA), covering scales from billions to trillions of parameters.
Engineering Culture and Collaboration
In a separate talk, Zhou highlighted open collaboration as the core gene of engineering culture, urging researchers to build on Bailing’s open‑source releases to collectively explore paths to AGI.
References
Towards Greater Leverage: Scaling Laws for Efficient Mixture‑of‑Experts Language Models. https://arxiv.org/abs/2507.17702
Every Attention Matters: An Efficient Hybrid Architecture for Long‑Context Reasoning. https://arxiv.org/abs/2510.19338
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation.
Every Step Evolves: Scaling Reinforcement Learning for Trillion‑Scale Thinking Model. https://arxiv.org/abs/2510.18855
Ming‑Flash‑Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation.