How Meituan’s LongCat‑Flash‑Chat Beats Top LLMs with Zero‑Computation Experts

LongCat‑Flash‑Chat, Meituan’s newly open‑sourced 560B MoE model, outperforms leading LLMs on agent tool use and instruction following benchmarks, introduces zero‑computation experts and shortcut‑connected MoE for higher throughput, and demonstrates strong programming and reasoning abilities across diverse evaluation tasks.

Java Tech Enthusiast

LongCat‑Flash‑Chat is Meituan’s first open‑source large language model, released as a 560B Mixture‑of‑Experts (MoE) architecture that quickly gained attention in the global AI community.

On several benchmarks, including agent tool invocation and instruction following, it surpasses DeepSeek‑V3.1, Qwen3 MoE‑2507, and even the closed‑source Claude‑4 Sonnet. Its programming performance on TerminalBench is comparable to Claude‑4 Sonnet.

Zero‑Computation Experts Boost Throughput

The model combines a “Zero‑Computation Experts” design with a Shortcut‑connected MoE. For each token, between 18.6B and 31.3B of the 560B parameters are activated, depending on the token's contextual importance. The router can also send a token to identity experts, which return their input unchanged and therefore skip GEMM operations, reducing computation.
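The routing idea can be sketched in a few lines of plain Python. This is a minimal illustration, not LongCat's actual router: the function names (`make_ffn_expert`, `moe_forward`), the toy single-matrix expert, and the softmax gating over the selected experts are all assumptions made for the example. It only shows how an identity expert lets a token consume a routing slot without any matrix multiply.

```python
import math
import random


def make_ffn_expert(dim, seed):
    """Build a toy 'real' expert: one dim x dim matrix multiply (stand-in for the GEMMs)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w[i][j] * x[j] for j in range(dim)) for i in range(dim)]

    return expert


def moe_forward(x, router_logits, ffn_experts, num_zero_experts, top_k):
    """Route one token to its top_k experts.

    Indices [0, len(ffn_experts)) are FFN experts; the remaining
    num_zero_experts indices are zero-computation (identity) experts.
    Returns (output vector, number of FFN experts actually executed).
    """
    assert len(router_logits) == len(ffn_experts) + num_zero_experts
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:top_k]

    # Softmax over the chosen experts only (hypothetical gating choice).
    peak = max(router_logits[i] for i in chosen)
    weights = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(weights.values())

    out = [0.0] * len(x)
    ffn_calls = 0
    for i in chosen:
        gate = weights[i] / total
        if i < len(ffn_experts):
            y = ffn_experts[i](x)   # real expert: costs a matrix multiply
            ffn_calls += 1
        else:
            y = x                   # identity expert: no GEMM at all
        out = [o + gate * v for o, v in zip(out, y)]
    return out, ffn_calls
```

A token whose top-k slots all land on identity experts passes through unchanged with zero FFN calls, while a token routed mostly to FFN experts pays for each of them. That per-token difference is what makes the activated parameter count vary.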

Shortcut‑connected MoE overlaps dense‑FFN computation with MoE dispatch/combine communication, widening the compute‑communication overlap window and significantly increasing training and inference throughput.
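To see why the overlap matters, consider an idealized timing model of one layer. The functions and the millisecond figures below are hypothetical and chosen only for illustration; the model assumes the dense FFN can hide fully behind dispatch and combine communication, which is an upper bound rather than a measured result.

```python
def layer_time_sequential(dense_ms, dispatch_ms, expert_ms, combine_ms):
    """No overlap: every phase waits for the previous one."""
    return dense_ms + dispatch_ms + expert_ms + combine_ms


def layer_time_overlapped(dense_ms, dispatch_ms, expert_ms, combine_ms):
    """Idealized overlap: dense-FFN compute runs while tokens are in flight.

    Communication (dispatch + combine) and dense compute proceed in
    parallel, so only the longer of the two stays on the critical path.
    Expert compute is assumed not to overlap with anything.
    """
    return max(dense_ms, dispatch_ms + combine_ms) + expert_ms


# Hypothetical per-layer timings in milliseconds.
sequential = layer_time_sequential(4, 3, 5, 3)
overlapped = layer_time_overlapped(4, 3, 5, 3)
print(f"sequential: {sequential} ms, overlapped: {overlapped} ms")
```

With these made-up numbers the layer drops from 15 ms to 11 ms. The gain is largest when dense compute and communication take similar time, since then neither side leaves the other idle.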

Training and Scaling Strategies

LongCat combines hyperparameter transfer, model‑growth initialization, a multi‑pronged stability suite, and deterministic computation. A small proxy model is first used to predict good hyperparameters for the full model; then a 14‑layer model is stacked into a 28‑layer checkpoint that initializes the larger model and accelerates convergence.
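Model growth by layer stacking can be sketched as follows. This assumes the simplest possible scheme, repeating the whole small stack end to end; the article does not specify the exact stacking order, and `stack_layers` and the dictionary-based layer representation are hypothetical.

```python
import copy


def stack_layers(small_layers, factor=2):
    """Grow a model by repeating its layer stack `factor` times.

    Each copy is a deep copy, so the duplicated layers start from the
    same weights but can diverge once training continues.
    """
    grown = []
    for _ in range(factor):
        grown.extend(copy.deepcopy(small_layers))
    return grown


# Toy 14-layer "checkpoint": each layer is just a dict of named weights.
small = [{"layer_id": i, "weight": [0.1 * i, 0.2 * i]} for i in range(14)]
large = stack_layers(small, factor=2)
print(len(large))  # 28
```

Layer 14 of the grown model starts as a copy of layer 0, layer 15 as a copy of layer 1, and so on, which gives the larger model a trained starting point instead of random initialization.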

Pre‑training on 20 T tokens, followed by a second stage that expands the context window to 128k and a final stage using multi‑agent synthesis for complex tool‑use tasks, equips the model with advanced agentic behavior.

Inference Optimizations

A multi‑step overlapping scheduler pre‑schedules future steps, interleaving CPU scheduling with GPU computation, enabling substantial throughput gains for a 560B model.
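The benefit of interleaving CPU scheduling with GPU computation can be estimated with a simple pipeline model. This is a sketch under stated assumptions: it schedules only one step ahead, treats per-step CPU and GPU times as constant, and uses hypothetical function names and timings. The scheduler described above pre-schedules several future steps, which this toy model does not capture.

```python
def decode_time_sequential(steps, cpu_ms, gpu_ms):
    """CPU schedules a step, then the GPU runs it; nothing overlaps."""
    return steps * (cpu_ms + gpu_ms)


def decode_time_overlapped(steps, cpu_ms, gpu_ms):
    """CPU schedules step n+1 while the GPU executes step n.

    Only the first scheduling pass is exposed. After that, each step
    costs whichever of the two stages is slower.
    """
    if steps == 0:
        return 0
    return cpu_ms + (steps - 1) * max(cpu_ms, gpu_ms) + gpu_ms


# Hypothetical timings: 2 ms of CPU scheduling, 10 ms of GPU compute per step.
print(decode_time_sequential(4, 2, 10))  # 48
print(decode_time_overlapped(4, 2, 10))  # 42
```

When GPU steps are short, as in fast decoding, the CPU scheduling share is proportionally larger, so hiding it matters more.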

Evaluation Results

On a range of authoritative benchmarks, LongCat‑Flash ranks in the top tier, matching or exceeding DeepSeek‑V3.1 on non‑thinking tasks. It achieves higher single‑GPU throughput and per‑user speed across different context lengths.

Training on tens of thousands of accelerator cards processed over 20 T tokens in 30 days with 98.48% availability. For inference, H800 GPUs generate over 100 tokens/s at a cost of about $0.7 per million output tokens.
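The headline figures can be turned into rates with some quick arithmetic. The sketch below takes the stated numbers at face value (20 T tokens, 30 days, 100 tokens/s, $0.7 per million output tokens). The helper names are hypothetical, and because "over 100 tokens/s" is a lower bound, the results are rough estimates.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def training_tokens_per_second(total_tokens, days):
    """Average cluster-wide training throughput."""
    return total_tokens / (days * SECONDS_PER_DAY)


def stream_cost_per_hour(tokens_per_second, usd_per_million_tokens):
    """Cost of one continuous output stream at the quoted token price."""
    tokens_per_hour = tokens_per_second * 3600
    return tokens_per_hour / 1_000_000 * usd_per_million_tokens


train_rate = training_tokens_per_second(20e12, 30)
hourly = stream_cost_per_hour(100, 0.7)
print(f"{train_rate:,.0f} tokens/s during training")
print(f"${hourly:.3f} per hour for one 100 tokens/s stream")
```

This works out to roughly 7.7 million training tokens per second across the cluster, and about $0.25 per hour for a single stream running continuously at 100 tokens/s.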

Real‑World Performance

LongCat correctly solved a full set of challenging mathematics problems from a national exam, performed precise geometric reasoning, and generated accurate SVG code for a Calvin‑cycle illustration.

In a “Misguided Attention” benchmark, it identified a trick question about Schrödinger’s cat, correctly stating that a dead cat cannot be alive, while other models fell for the trap.

Meituan’s AI Strategy

Meituan’s AI roadmap includes three pillars: AI at Work (boosting employee efficiency), AI in Products (creating native AI applications), and Building LLM (continuous investment in large‑model research). The LongCat model powers internal AI tools such as AI programming assistants, smart meetings, and document helpers, with API usage growing from 10% to 68% year‑over‑year.

Meituan’s 2024 R&D spending of ¥21.1 billion ranks just behind Huawei, Tencent, and Alibaba, supporting both core business AI and broader innovations like autonomous delivery and robotics.

For more information, the model can be accessed at https://longcat.chat, with code and checkpoints on Hugging Face and GitHub.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
