LongCat-Flash-Chat: 560B MoE Model with 27B Active Params Sets New Benchmarks
LongCat-Flash-Chat, an open‑source 560‑billion‑parameter Mixture‑of‑Experts model that activates only 18.6B–31.3B parameters per token, delivers state‑of‑the‑art performance on general, agentic, coding, and instruction‑following benchmarks while offering fast inference and efficient deployment options.
Technical Highlights
LongCat-Flash adopts an innovative Mixture‑of‑Experts (MoE) architecture called Zero‑Computation Experts. The model contains 560B total parameters, but each token activates only 18.6B–31.3B (average 27B), achieving on‑demand compute allocation. A PID controller fine‑tunes the expert bias during training to keep the average activation around 27B.
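The control loop can be pictured as a standard PID feedback controller. The following Python sketch is purely illustrative, not the released training code: the class name, gains, sign convention, and measurement interface are all assumptions. It nudges a router bias so the measured activation tracks the 27B target.

# Illustrative PID-style control of expert routing bias (hypothetical code).
class ActivationController:
    def __init__(self, target_active_b=27.0, kp=0.01, ki=0.001, kd=0.0):
        self.target = target_active_b      # target activated params (billions)
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measured_active_b):
        """Return a bias delta for the expert-router logits."""
        error = self.target - measured_active_b
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

controller = ActivationController()
expert_bias = 0.0
for step_active in [31.3, 29.0, 27.8, 26.5]:   # measured activation per step
    # A negative delta would steer more tokens toward zero-computation
    # experts, reducing activation (sign convention assumed here).
    expert_bias += controller.update(step_active)
    print(f"measured={step_active:.1f}B  bias={expert_bias:+.4f}")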
The model introduces cross‑layer channels that parallelize MoE communication and computation, dramatically improving training and inference efficiency. Combined with custom low‑level optimizations, LongCat‑Flash completed training within 30 days and achieves inference speeds of over 100 tokens/s per user on H800 GPUs.
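The scheduling idea, running expert dispatch communication while other computation proceeds, can be illustrated with a toy Python sketch. This is not how a GPU implementation works (real systems overlap CUDA compute streams with NCCL communication); threads and sleeps merely stand in for the two overlapping activities.

# Toy sketch: overlap "communication" with "computation" instead of serializing.
import time
from concurrent.futures import ThreadPoolExecutor

def all_to_all_dispatch():
    time.sleep(0.10)          # stand-in for routing tokens across GPUs
    return "routed tokens"

def other_path_compute():
    time.sleep(0.10)          # stand-in for attention / dense-path compute
    return "dense output"

start = time.time()
with ThreadPoolExecutor(max_workers=1) as pool:
    comm = pool.submit(all_to_all_dispatch)   # communication in flight...
    dense = other_path_compute()              # ...while compute proceeds
    routed = comm.result()
print(f"overlapped wall time: {time.time() - start:.2f}s (vs 0.20s serial)")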
Training incorporates hyper‑parameter transfer and model stacking, along with multiple strategies to ensure stability, resulting in a smooth and efficient training process.
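Model stacking generally means initializing a deeper model from a trained shallower one by duplicating its layers before continuing training. A minimal sketch of that idea follows; it is a generic illustration of the technique, not the team's actual recipe.

# Generic layer-stacking initialization (illustrative only).
import copy

def stack_layers(shallow_layers, growth_factor=2):
    """Seed a deeper model by repeating the trained shallow layers."""
    deep_layers = []
    for _ in range(growth_factor):
        deep_layers.extend(copy.deepcopy(layer) for layer in shallow_layers)
    return deep_layers

# e.g. a trained 14-layer model seeds a 28-layer model for further training
deep = stack_layers(list(range(14)))   # ints as placeholder "layers"
assert len(deep) == 28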
For agentic capabilities, the LongCat team built a dedicated agentic evaluation set and optimized the entire training pipeline, including diverse, high‑quality trajectory data generated by multiple agents, yielding superior agent performance.
Performance Evaluation
General Knowledge: Achieves 86.50 on ArenaHard‑V2 (2nd place), 89.71 on MMLU, and 90.44 on CEval, rivaling leading Chinese models despite activating far fewer parameters.
Agentic Tool Use: Outperforms larger models on τ2‑Bench and ranks first on VitaBench with a score of 24.30 in complex scenarios.
Programming: Scores 39.51 on TerminalBench (2nd place) and 60.4 on SWE‑Bench‑Verified, demonstrating strong command‑line and software‑engineering abilities.
Instruction Following: Leads with 89.65 on IFEval and posts top results on COLLIE (57.10) and Meeseeks‑zh (43.03), showing strong compliance with complex multilingual instructions.
Model Deployment
Two efficient deployment solutions are provided based on SGLang and vLLM, enabling quick setup and experimentation.
python3 -m sglang.launch_server \
--model meituan-longcat/LongCat-Flash-Chat-FP8 \
--trust-remote-code \
--attention-backend flashinfer \
--enable-ep-moe \
--tp 8

For detailed deployment instructions, refer to the LongCat‑Flash‑Chat repository.
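For vLLM, a corresponding launch command could look like the sketch below; these are standard vLLM flags, but consult the repository for the exact recommended configuration.

vllm serve meituan-longcat/LongCat-Flash-Chat-FP8 \
  --trust-remote-code \
  --tensor-parallel-size 8

Both servers expose an OpenAI‑compatible API, so a minimal smoke test might look like the following Python snippet. The port assumes SGLang's default of 30000; adjust the base URL and model name to your launch settings.

# Minimal client sketch against a locally served OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="none")
resp = client.chat.completions.create(
    model="meituan-longcat/LongCat-Flash-Chat-FP8",
    messages=[{"role": "user", "content": "Hello, LongCat!"}],
)
print(resp.choices[0].message.content)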
Visit https://longcat.ai/ to start a conversation with LongCat‑Flash‑Chat.
Open Access
The code and model weights are released under the MIT License, allowing users to leverage model outputs, perform model distillation, and train derivative models.
Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
GitHub: https://github.com/meituan-longcat/LongCat-Flash-Chat
Meituan Technology Team
Over 10,000 engineers power China's leading lifestyle-services e‑commerce platform, supporting hundreds of millions of consumers and millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
