LongCat-Flash-Chat: 560B MoE Model with 27B Active Params Sets New Benchmarks
LongCat-Flash-Chat, an open‑source 560‑billion‑parameter Mixture‑of‑Experts model that activates only 18.6B–31.3B parameters per token, delivers state‑of‑the‑art performance on general, agentic, coding, and instruction‑following benchmarks while offering fast inference and efficient deployment options.
Technical Highlights
LongCat-Flash adopts an innovative Mixture‑of‑Experts (MoE) architecture called Zero‑Computation Experts. The model contains 560B total parameters, but each token activates only 18.6B–31.3B (average 27B), achieving on‑demand compute allocation. A PID controller fine‑tunes the expert bias during training to keep the average activation around 27B.
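The control loop can be pictured as a standard PID feedback controller. The following Python sketch is purely illustrative, not the released training code: the class name, gains, sign convention, and measurement interface are all assumptions. It nudges a router bias so the measured activation tracks the 27B target.

# Illustrative PID-style control of expert routing bias (hypothetical code).
class ActivationController:
    def __init__(self, target_active_b=27.0, kp=0.01, ki=0.001, kd=0.0):
        self.target = target_active_b      # target activated params (billions)
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measured_active_b):
        """Return a bias delta for the expert-router logits."""
        error = self.target - measured_active_b
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

controller = ActivationController()
expert_bias = 0.0
for step_active in [31.3, 29.0, 27.8, 26.5]:   # measured activation per step
    # A negative delta would steer more tokens toward zero-computation
    # experts, reducing activation (sign convention assumed here).
    expert_bias += controller.update(step_active)
    print(f"measured={step_active:.1f}B  bias={expert_bias:+.4f}")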
The model introduces cross‑layer channels that parallelize MoE communication and computation, dramatically improving training and inference efficiency. Combined with custom low‑level optimizations, LongCat‑Flash completed training within 30 days and achieves inference speeds of over 100 tokens/s per user on H800 GPUs.
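The scheduling idea, running expert dispatch communication while other computation proceeds, can be illustrated with a toy Python sketch. This is not how a GPU implementation works (real systems overlap CUDA compute streams with NCCL communication); threads and sleeps merely stand in for the two overlapping activities.

# Toy sketch: overlap "communication" with "computation" instead of serializing.
import time
from concurrent.futures import ThreadPoolExecutor

def all_to_all_dispatch():
    time.sleep(0.10)          # stand-in for routing tokens across GPUs
    return "routed tokens"

def other_path_compute():
    time.sleep(0.10)          # stand-in for attention / dense-path compute
    return "dense output"

start = time.time()
with ThreadPoolExecutor(max_workers=1) as pool:
    comm = pool.submit(all_to_all_dispatch)   # communication in flight...
    dense = other_path_compute()              # ...while compute proceeds
    routed = comm.result()
print(f"overlapped wall time: {time.time() - start:.2f}s (vs 0.20s serial)")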
Training incorporates hyper‑parameter transfer and model stacking, along with multiple strategies to ensure stability, resulting in a smooth and efficient training process.
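Model stacking generally means initializing a deeper model from a trained shallower one by duplicating its layers before continuing training. A minimal sketch of that idea follows; it is a generic illustration of the technique, not the team's actual recipe.

# Generic layer-stacking initialization (illustrative only).
import copy

def stack_layers(shallow_layers, growth_factor=2):
    """Seed a deeper model by repeating the trained shallow layers."""
    deep_layers = []
    for _ in range(growth_factor):
        deep_layers.extend(copy.deepcopy(layer) for layer in shallow_layers)
    return deep_layers

# e.g. a trained 14-layer model seeds a 28-layer model for further training
deep = stack_layers(list(range(14)))   # ints as placeholder "layers"
assert len(deep) == 28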
For agentic capabilities, the LongCat team built a dedicated agentic evaluation set and optimized the entire training pipeline, including diverse, high‑quality trajectory data generated by multiple agents, yielding superior agent performance.
Performance Evaluation
General Knowledge: Achieves 86.50 on ArenaHard‑V2 (2nd place), 89.71 on MMLU, and 90.44 on CEval, rivaling leading Chinese models despite activating far fewer parameters.
Agentic Tool Use: Outperforms larger models on τ2‑Bench and ranks first on VitaBench with a score of 24.30 in complex scenarios.
Programming: Scores 39.51 on TerminalBench (2nd place) and 60.4 on SWE‑Bench‑Verified, demonstrating strong command‑line and software‑engineering abilities.
Instruction Following: Leads with 89.65 on IFEval and posts top results on COLLIE (57.10) and Meeseeks‑zh (43.03), showing strong compliance with complex multilingual instructions.
Model Deployment
Two efficient deployment solutions are provided based on SGLang and vLLM, enabling quick setup and experimentation.
python3 -m sglang.launch_server \
--model meituan-longcat/LongCat-Flash-Chat-FP8 \
--trust-remote-code \
--attention-backend flashinfer \
--enable-ep-moe \
--tp 8

For detailed deployment instructions, refer to the LongCat‑Flash‑Chat repository.
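For vLLM, a corresponding launch command could look like the sketch below; these are standard vLLM flags, but consult the repository for the exact recommended configuration.

vllm serve meituan-longcat/LongCat-Flash-Chat-FP8 \
  --trust-remote-code \
  --tensor-parallel-size 8

Both servers expose an OpenAI‑compatible API, so a minimal smoke test might look like the following Python snippet. The port assumes SGLang's default of 30000; adjust the base URL and model name to your launch settings.

# Minimal client sketch against a locally served OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="none")
resp = client.chat.completions.create(
    model="meituan-longcat/LongCat-Flash-Chat-FP8",
    messages=[{"role": "user", "content": "Hello, LongCat!"}],
)
print(resp.choices[0].message.content)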
Visit https://longcat.ai/ to start a conversation with LongCat‑Flash‑Chat.
Open Access
The code and model weights are released under the MIT License, allowing users to leverage model outputs, perform model distillation, and train derivative models.
Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
GitHub: https://github.com/meituan-longcat/LongCat-Flash-Chat
Meituan Technology Team
Over 10,000 engineers power China's leading lifestyle-services e‑commerce platform, supporting hundreds of millions of consumers and millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
