LongCat-Flash-Thinking: The New SOTA Open-Source LLM for Deep Reasoning and Tool Use
Meituan’s LongCat team unveiled LongCat-Flash-Thinking, an open‑source large language model that combines deep logical reasoning with tool‑calling capabilities, achieving state‑of‑the‑art performance across logic, mathematics, code, and agentic tasks, and introducing novel training frameworks such as domain‑parallel RL and DORA.
Today the Meituan LongCat team officially released LongCat-Flash-Thinking, a high‑efficiency inference model that retains the extreme speed of LongCat-Flash-Chat while offering stronger, more professional reasoning capabilities. It achieves state‑of‑the‑art performance among open‑source models in logic, mathematics, code, and agentic tasks.
Domain‑Parallel Reinforcement Learning Training
To address stability issues in mixed‑domain reinforcement learning, a domain‑parallel scheme decouples optimization for STEM, code, and agentic tasks. Training each domain in parallel and then fusing the resulting experts balances the model's capabilities and achieves Pareto‑optimal overall performance.
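The post does not spell out how the per‑domain experts are fused; one common approach is parameter‑space averaging of the domain checkpoints ("model souping"). The sketch below illustrates that idea on toy checkpoints (plain floats standing in for tensors); the function name and weighting scheme are illustrative assumptions, not LongCat's actual recipe.

```python
def fuse_checkpoints(checkpoints, weights=None):
    """Weighted average of per-domain checkpoints (here dicts of floats).

    A real implementation would average parameter tensors; uniform
    weights are the default, but per-domain weights can trade off
    STEM / code / agentic capability.
    """
    n = len(checkpoints)
    if weights is None:
        weights = [1.0 / n] * n
    return {
        name: sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
        for name in checkpoints[0]
    }

# toy per-domain experts sharing one parameter
stem_ckpt = {"layer.w": 1.0}
code_ckpt = {"layer.w": 3.0}
agent_ckpt = {"layer.w": 5.0}
print(fuse_checkpoints([stem_ckpt, code_ckpt, agent_ckpt]))  # {'layer.w': 3.0}
```

In practice, fusion weights would themselves be tuned against held‑out benchmarks to find the Pareto frontier across domains.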
Asynchronous Elastic Colocation System (DORA)
The DORA system underpins training, featuring elastic colocation scheduling and a multi‑version asynchronous pipeline. It delivers up to a three‑fold speed‑up over synchronous RL frameworks while preserving per‑sample policy consistency, and supports efficient KV‑cache reuse on clusters with thousands of GPUs.
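To make "per‑sample policy consistency" concrete: in an asynchronous pipeline, rollout workers may lag behind the trainer, so each sample records which policy version generated it end‑to‑end, and the trainer bounds how stale an accepted sample may be. The toy classes below sketch that bookkeeping; all names and the staleness rule are illustrative assumptions, not DORA's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class Rollout:
    policy_version: int      # version of the policy that generated the WHOLE sample
    tokens: list = field(default_factory=list)

class AsyncTrainer:
    """Toy asynchronous trainer that filters stale rollouts.

    Each rollout was produced end-to-end by a single policy version
    (per-sample consistency); the trainer only bounds how far behind
    the current version that generator may be.
    """
    def __init__(self, max_staleness=2):
        self.version = 0
        self.max_staleness = max_staleness

    def step(self, batch):
        usable = [
            r for r in batch
            if self.version - r.policy_version <= self.max_staleness
        ]
        self.version += 1  # simulate one policy update
        return usable

trainer = AsyncTrainer(max_staleness=1)
trainer.version = 3
kept = trainer.step([Rollout(1), Rollout(2), Rollout(3)])
print([r.policy_version for r in kept])  # [2, 3] — the version-1 rollout is too stale
```

The elastic‑colocation side (sharing the same GPUs between rollout generation and training) is an orthogonal scheduling concern not modeled here.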
Agentic Reasoning Framework
A novel “dual‑path reasoning” framework enables the model to autonomously select optimal query samples and integrate tool usage (e.g., code executors, APIs) into its reasoning process. In AIME25 tests, the model achieved 90% accuracy while reducing token usage by 64.5% (from 19,653 to 6,965 tokens).
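The core loop of tool‑integrated reasoning can be sketched generically: at each step the model either emits more reasoning, issues a tool call whose result is appended to the context, or produces a final answer. The dispatcher, action schema, and toy calculator below are illustrative assumptions, not LongCat's actual interface.

```python
def run_agent(model_step, tools, prompt, max_steps=8):
    """Alternate between model decisions and tool execution.

    model_step(context) returns either
      {"type": "tool", "name": ..., "args": ...}  -> execute, feed result back
      {"type": "final", "answer": ...}            -> stop
    """
    context = [prompt]
    for _ in range(max_steps):
        action = model_step(context)
        if action["type"] == "final":
            return action["answer"]
        result = tools[action["name"]](action["args"])
        context.append(f"[tool:{action['name']}] {result}")
    return None  # step budget exhausted

# toy "model": calls the calculator once, then answers with its result
def toy_model(context):
    if not any(line.startswith("[tool:calc]") for line in context):
        return {"type": "tool", "name": "calc", "args": "2**10"}
    return {"type": "final", "answer": context[-1].split()[-1]}

tools = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}
print(run_agent(toy_model, tools, "What is 2**10?"))  # 1024
```

In the "dual‑path" setting, the same loop would also let the model decide per query whether tools are worth invoking at all, which is where the reported token savings come from.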
Formal Reasoning Framework
To overcome limitations of open‑source LLMs in formal proof tasks, a new data‑synthesis pipeline based on an expert‑iteration framework with an integrated Lean4 server generates rigorously verified proofs, substantially enhancing formal reasoning reliability.
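Expert iteration for formal proving follows a simple pattern: sample candidate proofs, keep only those the checker accepts, and train the prover on the verified pool before the next round. The sketch below stubs out the Lean4 server with a trivial predicate; the function names and the stub verifier are assumptions for illustration only.

```python
def expert_iteration(statements, sample_proofs, verify, rounds=2):
    """Grow a pool of verifier-checked (statement, proof) pairs.

    sample_proofs(stmt) yields candidate proofs; verify(stmt, proof)
    is the formal checker (a Lean4 server in the described pipeline,
    stubbed in the demo below).
    """
    verified = []
    for _ in range(rounds):
        for stmt in statements:
            for proof in sample_proofs(stmt):
                if verify(stmt, proof) and (stmt, proof) not in verified:
                    verified.append((stmt, proof))
        # a real pipeline would fine-tune the prover on `verified` here,
        # so later rounds sample better candidates
    return verified

# demo with a stand-in checker that rejects incomplete proofs
stmts = ["a + b = b + a"]
candidates = lambda s: ["sorry", "Nat.add_comm a b"]
check = lambda s, p: p != "sorry"
print(expert_iteration(stmts, candidates, check))
# [('a + b = b + a', 'Nat.add_comm a b')]
```

Because every kept pair passed the checker, the resulting training data is rigorously verified by construction, which is what makes the synthesized proofs trustworthy.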
Performance Highlights
General Reasoning: Scores 50.3 on ARC‑AGI, surpassing top closed‑source models such as OpenAI o3 and Gemini 2.5 Pro.
Mathematics: Breakthrough results on HMMT and AIME benchmarks, matching or exceeding leading models like Qwen3‑235B‑A22B‑Thinking.
Code: Achieves 79.4 on LiveCodeBench, comparable to GPT‑5, and 40.7 on OJBench, close to Gemini 2.5 Pro.
Agentic Ability: Sets open‑source SOTA on τ2‑Bench with 74.0 points and excels on SWE‑Bench, BFCL V3, and VitaBench.
ATP Formal Reasoning: Attains 67.6% pass@1 on MiniF2F‑test, leading all evaluated models.
LongCat-Flash-Thinking is fully open‑source on Hugging Face and GitHub:
Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Flash-Thinking
GitHub: https://github.com/meituan-longcat/LongCat-Flash-Thinking
Meituan Technology Team
Over 10,000 engineers power China's leading lifestyle‑services e‑commerce platform, supporting hundreds of millions of consumers and millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
