LongCat-Flash-Thinking: The New SOTA Open-Source LLM for Deep Reasoning and Tool Use
Meituan’s LongCat team unveiled LongCat-Flash-Thinking, an open‑source large language model that combines deep logical reasoning with tool‑calling capabilities, achieving state‑of‑the‑art performance across logic, mathematics, code, and agentic tasks, and introducing novel training frameworks such as domain‑parallel RL and DORA.
Today the Meituan LongCat team officially released LongCat-Flash-Thinking, a high‑efficiency inference model that retains the extreme speed of LongCat-Flash-Chat while offering stronger, more professional reasoning capabilities. It achieves state‑of‑the‑art performance among open‑source models in logic, mathematics, code, and agentic tasks.
Domain‑Parallel Reinforcement Learning Training
To address stability issues in mixed‑domain reinforcement learning, a domain‑parallel scheme decouples optimization for STEM, code, and agentic tasks. Training each domain in parallel and then fusing the resulting experts balances the model's capabilities and achieves Pareto‑optimal overall performance.
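The post does not spell out how the per‑domain experts are fused; one common approach is parameter‑space averaging of the domain checkpoints ("model souping"). The sketch below illustrates that idea on toy checkpoints (plain floats standing in for tensors); the function name and weighting scheme are illustrative assumptions, not LongCat's actual recipe.

```python
def fuse_checkpoints(checkpoints, weights=None):
    """Weighted average of per-domain checkpoints (here dicts of floats).

    A real implementation would average parameter tensors; uniform
    weights are the default, but per-domain weights can trade off
    STEM / code / agentic capability.
    """
    n = len(checkpoints)
    if weights is None:
        weights = [1.0 / n] * n
    return {
        name: sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
        for name in checkpoints[0]
    }

# toy per-domain experts sharing one parameter
stem_ckpt = {"layer.w": 1.0}
code_ckpt = {"layer.w": 3.0}
agent_ckpt = {"layer.w": 5.0}
print(fuse_checkpoints([stem_ckpt, code_ckpt, agent_ckpt]))  # {'layer.w': 3.0}
```

In practice, fusion weights would themselves be tuned against held‑out benchmarks to find the Pareto frontier across domains.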
Asynchronous Elastic Colocation System (DORA)
The DORA system underpins training, featuring elastic colocation scheduling and a multi‑version asynchronous pipeline. It delivers up to a three‑fold speed‑up over synchronous RL frameworks while preserving per‑sample policy consistency, and supports efficient KV‑cache reuse on clusters with thousands of GPUs.
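To make "per‑sample policy consistency" concrete: in an asynchronous pipeline, rollout workers may lag behind the trainer, so each sample records which policy version generated it end‑to‑end, and the trainer bounds how stale an accepted sample may be. The toy classes below sketch that bookkeeping; all names and the staleness rule are illustrative assumptions, not DORA's actual interface.

```python
from dataclasses import dataclass, field

@dataclass
class Rollout:
    policy_version: int      # version of the policy that generated the WHOLE sample
    tokens: list = field(default_factory=list)

class AsyncTrainer:
    """Toy asynchronous trainer that filters stale rollouts.

    Each rollout was produced end-to-end by a single policy version
    (per-sample consistency); the trainer only bounds how far behind
    the current version that generator may be.
    """
    def __init__(self, max_staleness=2):
        self.version = 0
        self.max_staleness = max_staleness

    def step(self, batch):
        usable = [
            r for r in batch
            if self.version - r.policy_version <= self.max_staleness
        ]
        self.version += 1  # simulate one policy update
        return usable

trainer = AsyncTrainer(max_staleness=1)
trainer.version = 3
kept = trainer.step([Rollout(1), Rollout(2), Rollout(3)])
print([r.policy_version for r in kept])  # [2, 3] — the version-1 rollout is too stale
```

The elastic‑colocation side (sharing the same GPUs between rollout generation and training) is an orthogonal scheduling concern not modeled here.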
Agentic Reasoning Framework
A novel “dual‑path reasoning” framework enables the model to autonomously select optimal query samples and integrate tool usage (e.g., code executors, APIs) into its reasoning process. In AIME25 tests, the model achieved 90% accuracy while reducing token usage by 64.5% (from 19,653 to 6,965 tokens).
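The core loop of tool‑integrated reasoning can be sketched generically: at each step the model either emits more reasoning, issues a tool call whose result is appended to the context, or produces a final answer. The dispatcher, action schema, and toy calculator below are illustrative assumptions, not LongCat's actual interface.

```python
def run_agent(model_step, tools, prompt, max_steps=8):
    """Alternate between model decisions and tool execution.

    model_step(context) returns either
      {"type": "tool", "name": ..., "args": ...}  -> execute, feed result back
      {"type": "final", "answer": ...}            -> stop
    """
    context = [prompt]
    for _ in range(max_steps):
        action = model_step(context)
        if action["type"] == "final":
            return action["answer"]
        result = tools[action["name"]](action["args"])
        context.append(f"[tool:{action['name']}] {result}")
    return None  # step budget exhausted

# toy "model": calls the calculator once, then answers with its result
def toy_model(context):
    if not any(line.startswith("[tool:calc]") for line in context):
        return {"type": "tool", "name": "calc", "args": "2**10"}
    return {"type": "final", "answer": context[-1].split()[-1]}

tools = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}
print(run_agent(toy_model, tools, "What is 2**10?"))  # 1024
```

In the "dual‑path" setting, the same loop would also let the model decide per query whether tools are worth invoking at all, which is where the reported token savings come from.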
Formal Reasoning Framework
To overcome limitations of open‑source LLMs in formal proof tasks, a new data‑synthesis pipeline based on an expert‑iteration framework with an integrated Lean4 server generates rigorously verified proofs, substantially enhancing formal reasoning reliability.
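Expert iteration for formal proving follows a simple pattern: sample candidate proofs, keep only those the checker accepts, and train the prover on the verified pool before the next round. The sketch below stubs out the Lean4 server with a trivial predicate; the function names and the stub verifier are assumptions for illustration only.

```python
def expert_iteration(statements, sample_proofs, verify, rounds=2):
    """Grow a pool of verifier-checked (statement, proof) pairs.

    sample_proofs(stmt) yields candidate proofs; verify(stmt, proof)
    is the formal checker (a Lean4 server in the described pipeline,
    stubbed in the demo below).
    """
    verified = []
    for _ in range(rounds):
        for stmt in statements:
            for proof in sample_proofs(stmt):
                if verify(stmt, proof) and (stmt, proof) not in verified:
                    verified.append((stmt, proof))
        # a real pipeline would fine-tune the prover on `verified` here,
        # so later rounds sample better candidates
    return verified

# demo with a stand-in checker that rejects incomplete proofs
stmts = ["a + b = b + a"]
candidates = lambda s: ["sorry", "Nat.add_comm a b"]
check = lambda s, p: p != "sorry"
print(expert_iteration(stmts, candidates, check))
# [('a + b = b + a', 'Nat.add_comm a b')]
```

Because every kept pair passed the checker, the resulting training data is rigorously verified by construction, which is what makes the synthesized proofs trustworthy.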
Performance Highlights
General Reasoning: Scores 50.3 on ARC‑AGI, surpassing top closed‑source models such as OpenAI o3 and Gemini 2.5 Pro.
Mathematics: Breakthrough results on HMMT and AIME benchmarks, matching or exceeding leading models like Qwen3‑235B‑A22B‑Thinking.
Code: Achieves 79.4 on LiveCodeBench, comparable to GPT‑5, and 40.7 on OJBench, close to Gemini 2.5 Pro.
Agentic Ability: Sets open‑source SOTA on τ2‑Bench with 74.0 points and excels on SWE‑Bench, BFCL V3, and VitaBench.
ATP Formal Reasoning: Attains 67.6% pass@1 on MiniF2F‑test, leading all evaluated models.
LongCat-Flash-Thinking is fully open‑source on Hugging Face and GitHub:
Hugging Face: https://huggingface.co/meituan-longcat/LongCat-Flash-Thinking
GitHub: https://github.com/meituan-longcat/LongCat-Flash-Thinking
Meituan Technology Team
Over 10,000 engineers power China's leading lifestyle‑services e‑commerce platform, supporting hundreds of millions of consumers and millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
