How LongCat‑Flash‑Thinking‑2601 Achieves Real‑World Generalization for Agents

LongCat‑Flash‑Thinking‑2601, a 560‑billion‑parameter MoE model, combines environment expansion, multi‑environment RL, systematic noise training, a heavy‑thinking reasoning mode, and Zigzag sparse attention to deliver strong benchmark performance and robust real‑world agent capabilities.

Meituan Technology Team

Model Overview

LongCat-Flash-Thinking-2601 is a 560‑billion‑parameter Mixture‑of‑Experts (MoE) model that achieves state‑of‑the‑art performance on agent benchmarks such as BrowseComp, τ²‑Bench, and VitaBench. The model incorporates three core innovations: environment expansion, multi‑environment reinforcement‑learning (RL) training, and systematic noise‑robust training.

Training Paradigm: Dual Expansion + Noise Training

The training pipeline is built on three pillars:

Environment expansion – construction of a large‑scale arena covering more than 20 domains and tens of thousands of scenarios.

Reinforcement‑learning expansion – efficient, stable RL across millions of heterogeneous environments.

Noise‑robust training – systematic injection of real‑world disturbances (tool failures, incomplete results, ambiguous instructions) using a curriculum‑learning schedule (a minimal sketch follows this list).
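As a rough illustration, the sketch below shows one way such a curriculum could be scheduled in Python: the injection rate for the three disturbance types ramps up linearly as training progresses. All names, rates, and noise payloads here are illustrative assumptions, not the released implementation.

```python
import random

# Hypothetical curriculum for noise injection; the disturbance types mirror
# the ones named above, but rates and payloads are illustrative assumptions.
NOISE_TYPES = ["tool_failure", "truncated_result", "ambiguous_instruction"]

def noise_probability(step: int, total_steps: int,
                      start: float = 0.05, end: float = 0.30) -> float:
    """Linearly ramp the injection rate as training progresses."""
    progress = min(step / total_steps, 1.0)
    return start + (end - start) * progress

def maybe_perturb(tool_result: str, step: int, total_steps: int) -> str:
    """Return the tool result, or a simulated real-world disturbance."""
    if random.random() >= noise_probability(step, total_steps):
        return tool_result  # clean rollout
    noise = random.choice(NOISE_TYPES)
    if noise == "tool_failure":
        return "ERROR: tool call failed (simulated outage)"
    if noise == "truncated_result":
        return tool_result[: max(1, len(tool_result) // 2)]  # incomplete result
    return "NOTE: the instruction was ambiguous; please clarify."
```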

Automated Environment Generation

An end‑to‑end generator creates a full‑stack environment graph from a concise domain definition. Each generated environment contains:

≈60 tools with complex dependency graphs.

Corresponding database schemas, tool‑call interfaces and validation logic.

The generation follows a “solvable‑path‑first” strategy (sketched in code after this list):

Seed sampling of a long tool‑call chain as an anchor.

BFS‑style controlled expansion that guarantees logical consistency of dependencies.

Dynamic addition of new “golden” tool chains based on environment complexity.

Minimum‑scale guarantee (≥20 tools) to avoid under‑specified environments.
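As a concrete, heavily simplified illustration of that strategy, the sketch below grows a tool‑dependency graph outward from a guaranteed‑solvable seed chain. `sample_seed_chain`, `expand_environment`, and the branching factors are hypothetical stand‑ins, not the released generator’s API.

```python
import random
from collections import deque

MIN_TOOLS = 20  # the minimum-scale guarantee from the list above

def sample_seed_chain(length: int) -> list[str]:
    """Seed sampling: anchor the environment on one long, solvable tool chain."""
    return [f"tool_{i}" for i in range(length)]

def expand_environment(seed: list[str], target_size: int) -> dict[str, list[str]]:
    """BFS-style controlled expansion: every new tool depends only on tools
    that already exist, so the dependency graph stays logically consistent."""
    deps: dict[str, list[str]] = {seed[0]: []}
    for prev, cur in zip(seed, seed[1:]):
        deps[cur] = [prev]  # the "golden" chain itself
    frontier = deque(seed)
    next_id = len(seed)
    while len(deps) < max(target_size, MIN_TOOLS) and frontier:
        parent = frontier.popleft()
        for _ in range(random.randint(1, 3)):  # branch off the solvable path
            name = f"tool_{next_id}"
            next_id += 1
            deps[name] = [parent]  # dependency on an already-reachable tool
            frontier.append(name)
    return deps

env = expand_environment(sample_seed_chain(8), target_size=60)
assert len(env) >= MIN_TOOLS  # under-specified environments are rejected
```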

Asynchronous Training System DORA

DORA enables scalable training of the 560 B MoE model on the massive environment suite. Key components:

Multi‑version model‑parallel exploration: rollouts from different model versions are streamed into a shared sample queue as soon as they finish, removing the need to wait for all rollouts (a minimal sketch follows this list).

Distributed rollout architecture: a lightweight Rollout Manager plus multiple Rollout Controllers provide data‑parallel environment interaction.

Flexible environment deployment: extensions to PyTorch RPC allow remote function calls and object instantiation on any idle CPU.

Prefill‑Decode (PD) decoupling: prefill and decode stages run on separate device groups, preventing long‑context prefill from blocking decoding.

KV‑cache exchange: chunk‑level KV‑cache aggregation and asynchronous transfer reduce bandwidth pressure and eliminate out‑of‑memory issues.
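The core non‑blocking pattern can be sketched with a plain thread‑safe queue, as below. This is a deliberately simplified stand‑in for DORA’s distributed design; the Rollout Manager, Rollout Controllers, and RPC‑based environment deployment are not modeled.

```python
import queue
import threading

# Finished rollouts are streamed into a shared queue the moment they complete.
sample_queue: "queue.Queue[dict]" = queue.Queue()

def rollout_worker(model_version: int, num_episodes: int) -> None:
    for episode in range(num_episodes):
        trajectory = {"version": model_version, "episode": episode}  # stand-in
        sample_queue.put(trajectory)  # no barrier: push immediately

workers = [threading.Thread(target=rollout_worker, args=(v, 4)) for v in range(3)]
for w in workers:
    w.start()

# The trainer consumes whatever is ready; in a real system, off-policy
# corrections keyed on trajectory["version"] would be applied here.
for _ in range(12):
    item = sample_queue.get()
    print("training on rollout from model version", item["version"])

for w in workers:
    w.join()
```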

DORA also implements a two‑level resource‑balancing scheme (global difficulty‑based allocation and batch‑level domain diversity) that yields 2–4× higher throughput than synchronous training and supports stable training for thousands of steps on million‑scale heterogeneous environments.

Heavy‑Thinking (Re‑thinking) Mode

The model can operate in a “Heavy‑Thinking” mode that expands both reasoning width and depth. The process consists of:

Parallel generation of multiple reasoning paths (width expansion).

Evaluation by a dedicated summarizer model, which selects the most promising path.

Integration of intermediate results via reinforcement learning to refine the final answer (depth expansion).

This dual expansion improves performance on long‑chain reasoning, tool‑integration, and full‑agent scenarios, especially when additional inference budget is allocated.
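In pseudocode terms, the width/depth expansion might look like the sketch below, where `generate` and `summarizer_score` are hypothetical callables standing in for the model and the summarizer. The real pipeline integrates intermediate results through RL rather than the single extra refinement pass shown here.

```python
from typing import Callable

def heavy_thinking(prompt: str,
                   generate: Callable[..., str],
                   summarizer_score: Callable[[str], float],
                   width: int = 8) -> str:
    # Width: sample several independent reasoning paths (concurrently, in practice).
    candidates = [generate(prompt, seed=i) for i in range(width)]
    # A dedicated summarizer scores each path and keeps the most promising one.
    best = max(candidates, key=summarizer_score)
    # Depth: fold the selected intermediate result back in for refinement;
    # this extra pass stands in for the RL-integrated step described above.
    refine = f"{prompt}\n\nDraft reasoning:\n{best}\n\nRefine into a final answer:"
    return generate(refine, seed=0)
```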

Zigzag Sparse Attention

To handle ultra‑long sequences, LongCat introduces Zigzag Attention, a hybrid sparse pattern that combines Multi‑Head Latent Attention (MLA) and Streaming Sparse Attention (SSA). Each query token attends to:

A local window of the most recent W tokens (short‑range dependencies).

A set of global anchor tokens at the sequence start (long‑range memory).

The mechanism is inserted during mid‑training, converting a dense transformer into a sparse variant with negligible overhead. The resulting model supports up to 1 million tokens of context.
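For intuition, the SSA component’s attention pattern can be written down directly: each query keeps a causal local window plus a few global anchors. The sketch below is an assumption‑based illustration of that mask only; the full Zigzag design and its MLA integration are more involved.

```python
import torch

def zigzag_ssa_mask(seq_len: int, window: int, num_anchors: int) -> torch.Tensor:
    """Boolean mask: True where a query may attend to a key."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions (rows)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions (columns)
    causal = k <= q                          # never attend to the future
    local = (q - k) < window                 # the most recent W tokens
    anchors = k < num_anchors                # global anchors at sequence start
    return causal & (local | anchors)

mask = zigzag_ssa_mask(seq_len=16, window=4, num_anchors=2)
# Each query now touches O(window + num_anchors) keys instead of O(seq_len),
# which is what makes million-token contexts tractable.
```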

Repository:

https://huggingface.co/meituan-longcat/LongCat-Flash-Thinking-ZigZag

Benchmark Results

LongCat‑Flash‑Thinking‑2601 achieves top‑tier open‑source performance on BrowseComp, τ²‑Bench and VitaBench, and shows strong generalization on unseen tool combinations and noisy test sets, confirming the effectiveness of the noise‑training pipeline.

Release Resources

Code, model weights and the environment generation system are publicly released:

GitHub:

https://github.com/meituan-longcat/LongCat-Flash-Thinking-2601

Hugging Face:

https://huggingface.co/meituan-longcat/LongCat-Flash-Thinking-2601

ModelScope:

https://www.modelscope.cn/models/meituan-longcat/LongCat-Flash-Thinking-2601
Written by

Meituan Technology Team

Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform, supporting hundreds of millions of consumers and millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.
