Meituan Technology Team
Sep 11, 2025 · Artificial Intelligence
How LongCat-Flash Achieves Ultra-Fast, Low-Cost AI Agent Inference with SGLang
LongCat-Flash, an open‑source Mixture‑of‑Experts model released by Meituan, combines model‑system co‑design, PD (prefill-decode) disaggregation, SBO scheduling, and large‑scale expert parallelism within the SGLang framework to lower latency, raise throughput, and cut inference cost for AI agents. Detailed deployment instructions are included.
LongCat-Flash · Mixture of Experts · Model Inference
15 min read
