Breaking the Efficiency Wall: Ant Group’s Bailing Model Paves the Way to AGI
At CNCC 2025, Ant Group Vice President Zhou Jun outlined the Bailing large model’s five‑layer architecture, hybrid linear attention, the Ling Scaling Law, and novel training algorithms that sharply cut cost and latency, achieving state‑of‑the‑art performance on math and code benchmarks while promoting open‑source collaboration toward AGI.
Keynote Overview
On October 25, at the 22nd China Computer Conference (CNCC 2025), Zhou Jun, Vice President of Ant Group’s Platform Technology Business Group and head of the Bailing large model, delivered a keynote titled “Deconstructing the Path to General Intelligence with the Bailing Model”. He presented the model’s evolution and technical innovations, and called for collaborative innovation to tackle frontier research challenges.
Efficiency Challenges and Five‑Layer Architecture
Zhou described the “efficiency wall” facing the industry: high model costs, long inference latency, and complex multimodal understanding. To break this wall, Bailing adopts an end‑to‑end, scalable intelligent infrastructure composed of five layers: application ecosystem → language/multimodal large models → trillion‑parameter models trained on ten‑thousand‑GPU (“wan‑ka”) clusters → (hybrid) linear attention engine → data and compute infrastructure. Each layer aims to improve both performance and efficiency.
Ling Scaling Law and Sparse MoE
The team introduced the Ling Scaling Law, which leverages a highly sparse Mixture‑of‑Experts (MoE) architecture with fine‑grained expert routing and control strategies. This approach raises the intelligence ceiling while significantly reducing compute cost and inference delay, forming the core technical philosophy of Bailing’s model development.
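The keynote did not disclose implementation details, but the core idea of a highly sparse MoE is simple: a learned router activates only a few experts per token, so compute per token scales with the number of activated experts rather than the total parameter count. The sketch below is a minimal illustration in plain NumPy, not Bailing’s actual routing or control strategy; the top‑k softmax router and expert shapes are assumptions for clarity.

```python
import numpy as np

def top_k_moe(x, expert_weights, router_weights, k=2):
    """Sparse MoE forward pass for a single token (illustrative sketch).

    x:              (d,) token hidden state
    expert_weights: list of (d, d) matrices, one per expert
    router_weights: (n_experts, d) router projection
    k:              number of experts activated per token
    """
    logits = router_weights @ x                      # (n_experts,) routing scores
    top = np.argsort(logits)[-k:]                    # indices of the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                             # softmax over the selected experts only
    # Only k of n_experts matrices are ever multiplied: per-token compute
    # scales with k, not with the total expert count.
    return sum(g * (expert_weights[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d))
y = top_k_moe(rng.standard_normal(d), experts, router, k=2)
print(y.shape)  # (8,)
```

This is why sparsity raises the capacity ceiling without a proportional cost increase: total parameters grow with `n_experts`, while per‑token FLOPs grow only with `k`.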
Core Technical Innovations
Three major innovations were highlighted:
Hybrid Linear Attention Architecture: combines the computational efficiency of linear attention with the precision of traditional attention.
Inference Efficiency and Accuracy Optimization: trains models with high‑quality data to endow intrinsic reasoning ability, and introduces an Evolutionary Chain of Thought (Evo CoT) mechanism to extend the Pareto frontier of accuracy versus average inference length.
Stability‑Focused Reinforcement Learning Algorithms: includes the LPO (Linguistics‑Unit Policy Optimization) algorithm, the IcePop algorithm, and the C3PO++ framework, which improve semantic coherence, mitigate distribution shift, and boost GPU utilization and training throughput.
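To make the first innovation concrete: linear attention replaces the softmax with a kernel feature map so that key–value statistics can be accumulated once and reused for every query, giving O(n) cost in sequence length instead of O(n²), while a minority of layers keep exact softmax attention for precision. The sketch below is a generic textbook‑style illustration (the elu+1 feature map and the interleaving idea are common in the literature), not Bailing’s actual architecture.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Exact attention: the (n, n) score matrix makes this O(n^2) in length n."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention with phi(x) = elu(x) + 1: O(n) in length n.

    The (d, d) summary phi(k).T @ v is computed once and reused for every
    query, so cost grows linearly with sequence length.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, always > 0
    kv = phi(k).T @ v                                    # (d, d) key-value summary
    z = phi(q) @ phi(k).sum(axis=0) + eps                # per-query normalizer
    return (phi(q) @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 6, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
# A hybrid stack would interleave the two, e.g. one exact softmax layer
# per several linear layers; the ratio here is purely hypothetical.
exact, fast = softmax_attention(q, k, v), linear_attention(q, k, v)
print(exact.shape, fast.shape)  # (6, 4) (6, 4)
```

The trade‑off the keynote alludes to is visible in the shapes alone: both variants map (n, d) inputs to (n, d) outputs, so layers of either kind can be mixed freely within one stack.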
Empirical Results
The trillion‑parameter non‑thinking model Ling‑1T outperformed Gemini 2.5‑Pro on the AIME 2025 benchmark while using about 60% fewer tokens per inference. The trillion‑parameter thinking model Ring‑1T achieved leading open‑source performance on code‑generation and logical‑reasoning benchmarks and reached silver‑medal level on IMO problems.
Future Directions and Open‑Source Efforts
Zhou emphasized that achieving AGI requires deep multimodal understanding and interaction across the visual, audio, and 3D domains. He cited Ming‑Flash‑Omni‑Preview, described as the largest known open‑source multimodal model, which excels on the 12‑task ContextASR benchmark.
To date, Ant Group’s Bailing has open‑sourced over 18 large models, spanning language models (Ling series), reasoning models (Ring series), multimodal models (Ming series), and exploratory diffusion models (LLaDA), covering scales from billions to trillions of parameters.
Engineering Culture and Collaboration
In a separate talk, Zhou highlighted open collaboration as the core gene of engineering culture, urging researchers to build on Bailing’s open‑source releases to collectively explore paths to AGI.
References
Towards Greater Leverage: Scaling Laws for Efficient Mixture‑of‑Experts Language Models. https://arxiv.org/abs/2507.17702
Every Attention Matters: An Efficient Hybrid Architecture for Long‑Context Reasoning. https://arxiv.org/abs/2510.19338
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation.
Every Step Evolves: Scaling Reinforcement Learning for Trillion‑Scale Thinking Model. https://arxiv.org/abs/2510.18855
Ming‑Flash‑Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation.